Commit Graph

10733 Commits

Author SHA1 Message Date
Simon Pilgrim d65ddca83f [ValueTracking] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)
Follow-up to D72573 - as detailed in https://blog.regehr.org/archives/1709, we don't make use of the known leading/trailing zero bits of shifted values in cases where we don't know the shift amount.

Stop ValueTracking returning zero for poison shift patterns and use the KnownBits shift helpers directly.

Extend KnownBits::shl to combine the known bits of all possible shift results when both the min and max shift amounts are in range.
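For example (a hand-written sketch of the idea, not taken from the patch's tests):

```
define i32 @shl_keeps_trailing_zeros(i32 %a, i32 %n) {
  %x = mul i32 %a, 16   ; %x has at least 4 trailing zero bits
  %s = shl i32 %x, %n   ; shifting left can only add trailing zeros, so %s
                        ; keeps those 4 known zero bits even though %n is unknown
  ret i32 %s
}
```

Analogously, lshr preserves the known leading zero bits of its first operand.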

Differential Revision: https://reviews.llvm.org/D90479
2021-02-24 12:15:45 +00:00
Ta-Wei Tu 98c6110d9b [LoopNest] Use `getUniqueSuccessor()` instead when checking empty blocks
Blocks that contain only a single branch instruction to the next block can be skipped in analyzing the loop-nest structure.
This is currently done by `getSingleSuccessor()`.
However, the branch instruction might have multiple targets that all happen to be the same.
In this case, the block should still be considered as empty and skipped.

An example is `test/Transforms/LoopInterchange/update-condbranch-duplicate-successors.ll` (the LIT test for this patch is modified from it as well).

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97286
2021-02-24 09:53:12 +08:00
Fangrui Song ef312951fd collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128
And delete the SmallPtrSetImpl overload.

While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97257
2021-02-23 16:09:06 -08:00
Simon Pilgrim 1020d16156 [InstSimplify] Handle nsw shl -> poison patterns
Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.
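The underlying LangRef rule: an nsw shl produces poison if it shifts out bits that disagree with the resultant sign bit. A hand-written illustration (the exact patterns the patch recognises may differ):

```
define i8 @nsw_shl_poison() {
  ; 64 << 2 shifts a set bit out past a clear resultant sign bit,
  ; so with nsw the result is poison
  %r = shl nsw i8 64, 2
  ret i8 %r
}
```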

Differential Revision: https://reviews.llvm.org/D97305
2021-02-23 18:26:56 +00:00
Simon Pilgrim 18b9fc48f1 [InstructionSimplify] SimplifyShift - rename shift amount KnownBits. NFCI.
As suggested on D97305.
2021-02-23 18:12:59 +00:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs.
veccall. Without invalid InstructionCosts, the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing, many backends end up getting it wrong.

Instead of trying to work with those two values separately, this removes the
VF parameter, widening the RetTy/ArgTys by VF when called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

For most backends this looks like it will be an improvement (or they were not
using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Arthur Eubanks 468fa037b2 Only verify LazyCallGraph under expensive checks
These verify calls are causing a lot of slowdown on some files, up to 8x.
The LazyCallGraph infra has been tested a lot over the years, so I'm fairly confident that we don't always need to run the verifies.

These verifies took >90% of total time in one of the compilations I looked at.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D97225
2021-02-22 20:18:59 -08:00
Kazu Hirata 896d0e1a2a [Analysis] Use range-based for loops (NFC) 2021-02-22 20:17:18 -08:00
Kazu Hirata 4ed47858ab [llvm] Use llvm::drop_begin (NFC) 2021-02-22 20:17:16 -08:00
Kazu Hirata 871affc5e7 [Analysis] Use ListSeparator (NFC) 2021-02-22 20:17:15 -08:00
Craig Topper 89440df64a [ValueTracking] Improve ComputeNumSignBits for SRem.
The result will have the same sign as the dividend unless the
result is 0. The magnitude of the result will always be less
than or equal to the dividend. So the result will have at least
as many sign bits as the dividend.

Previously we would do this if the divisor was a positive constant,
but that isn't required.
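A hand-written illustration:

```
define i32 @srem_sign_bits(i16 %x, i32 %m) {
  %d = sext i16 %x to i32   ; %d has at least 17 sign bits
  %r = srem i32 %d, %m      ; |%r| <= |%d| and %r is 0 or has the sign of %d,
                            ; so %r also has at least 17 sign bits
  ret i32 %r
}
```

Note that %m is not required to be a positive constant.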

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97170
2021-02-22 14:36:25 -08:00
Simon Pilgrim 476ff0327b [InstSimplify] Cleanup out-of-range shift amount handling.
Use APInt::uge() directly instead of getLimitedValue().

Use KnownBits::getMinValue() to make the bounds check more obvious.
2021-02-22 17:00:49 +00:00
Kazu Hirata 047fc3bf20 [Analysis] Use ListSeparator (NFC) 2021-02-21 19:58:04 -08:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
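A minimal hand-written example of what "available value" means here:

```
define i32 @forward(i32* %p) {
  store i32 42, i32* %p
  %v = load i32, i32* %p   ; 42 is the available loaded value; InstCombine
                           ; can forward it without any clobber walk
  ret i32 %v
}
```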
2021-02-21 18:42:56 +01:00
Nikita Popov 7c706aa0d8 [Loads] Extract helper frunction for available load/store (NFC)
This contains the logic for extracting an available load/store
from a given instruction, to be reused in a following patch.
2021-02-21 18:24:58 +01:00
Juneyoung Lee aacf7878bc [ValueTracking] Improve impliesPoison
This patch improves ValueTracking's impliesPoison(V1, V2) to do this reasoning:

```
  %res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul      = extractvalue { i64, i1 } %res, 0

  ; If %mul is poison, %overflow is also poison, and vice versa.
```

This improvement leads to supporting this optimization under `-instcombine-unsafe-select-transform=0`:

```
define i1 @test2_logical(i64 %a, i64 %b, i64* %ptr) {
; CHECK-LABEL: @test2_logical(
; CHECK-NEXT:    [[MUL:%.*]] = mul i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    [[TMP1:%.*]] = icmp ne i64 [[A]], 0
; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne i64 [[B]], 0
; CHECK-NEXT:    [[OVERFLOW_1:%.*]] = and i1 [[TMP1]], [[TMP2]]
; CHECK-NEXT:    [[NEG:%.*]] = sub i64 0, [[MUL]]
; CHECK-NEXT:    store i64 [[NEG]], i64* [[PTR:%.*]], align 8
; CHECK-NEXT:    ret i1 [[OVERFLOW_1]]
;

  %res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul = extractvalue { i64, i1 } %res, 0
  %cmp = icmp ne i64 %mul, 0
  %overflow.1 = select i1 %overflow, i1 true, i1 %cmp
  %neg = sub i64 0, %mul
  store i64 %neg, i64* %ptr, align 8
  ret i1 %overflow.1
}
```

Previously, this didn't happen because the flag prevented `select i1 %overflow, i1 true, i1 %cmp` from being folded to `or i1 %overflow, %cmp`.
Note that the select -> or conversion happens only when `impliesPoison(%cmp, %overflow)` returns true.
This improvement allows `impliesPoison` to do the reasoning.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96929
2021-02-20 13:22:34 +09:00
Philip Reames b13e942224 [ValueTracking] Add a two argument form of safeCtxI [NFC]
The existing implementation was relying on order of evaluation to achieve a particular result. This became really confusing when trying to change the handling of arguments in a later patch.
2021-02-19 14:52:51 -08:00
Sanjay Patel 5b250a27ec [Analysis][LoopVectorize] do not form reductions of pointers
This is a fix for https://llvm.org/PR49215 either before/after
we make a verifier enhancement for vector reductions with D96904.

I'm not sure what the current thinking is for pointer math/logic
in IR. We allow icmp on pointer values. Therefore, we match min/max
patterns, so without this patch, the vectorizer could form a vector
reduction from that sequence.
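The matched pattern is roughly (hand-written sketch):

```
  %cmp = icmp ult i8* %p, %q
  %min = select i1 %cmp, i8* %p, i8* %q   ; umin, but on pointer values
```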

But the LangRef definitions for min/max and vector reduction
intrinsics do not allow pointer types:
https://llvm.org/docs/LangRef.html#llvm-smax-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-umax-intrinsic

So we would crash/assert at some point - either in IR verification,
in the cost model, or in codegen. If we do want to allow this kind
of transform, we will need to update the LangRef and all of those
parts of the compiler.

Differential Revision: https://reviews.llvm.org/D97047
2021-02-19 14:01:57 -05:00
Philip Reames 4a5edea193 [SCEV] Use both known bits and sign bits when computing range of SCEV unknowns
When computing a range for a SCEVUnknown, today we use computeKnownBits for unsigned ranges, and computeNumSignBits for signed ranges. This means we miss opportunities to improve range results.

One common missed pattern is that we have a signed range of a value which CKB can determine is positive, but CNSB doesn't convey that information. The current range includes the negative part, and is thus double the size.

Per the removed comment, the original concern which delayed using both (after some code merging years back) was a compile time concern. CTMark results (provided by Nikita, thanks!) showed a geomean impact of about 0.1%. This doesn't seem large enough to avoid higher quality results.
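A hand-written example of the missed pattern:

```
  %x = and i32 %a, 255   ; computeKnownBits proves the top 24 bits are zero
```

computeNumSignBits alone gives the signed range [-256, 255]; the known bits prove %x is non-negative, halving that to [0, 255].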

Differential Revision: https://reviews.llvm.org/D96534
2021-02-19 08:29:12 -08:00
Nikita Popov 2f17ed294f [DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993
2021-02-19 12:35:40 +01:00
Nikita Popov 370addb996 [IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992
2021-02-19 11:56:01 +01:00
Nikita Popov 1d9f4903c6 [BasicAA] Add simple depth limit to avoid stack overflow (PR49151)
This is a simpler variant of D96647. It just adds a straightforward
depth limit with a high cutoff, without introducing complex logic
for BatchAA consistency. It accepts that we may cache a sub-optimal
result if the depth limit is hit.

Eventually this should be more fully addressed by D96647 or similar,
but in the meantime this avoids stack overflows in a cheap way.

Differential Revision: https://reviews.llvm.org/D96996
2021-02-19 11:05:42 +01:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in the sample loader pass. Currently,
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundant, which can harm performance and can cause
excessive compile time in some extreme cases.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings a 0.1~0.2% performance improvement on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.
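For example (hand-written):

```
exit:
  ; single-entry (LCSSA) phi: %p.lcssa is always just %p,
  ; so it can be stripped like a no-op cast
  %p.lcssa = phi i8* [ %p, %loop ]
```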

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Kazu Hirata e54579307b [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-02-17 23:58:44 -08:00
William S. Moses 40862b1a74 [SROA] Propagate correct TBAA/TBAA Struct offsets
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.

Differential Revision: https://reviews.llvm.org/D95826
2021-02-17 11:59:00 -05:00
Kazu Hirata df35a183d7 [SCEV] Use ListSeparator (NFC) 2021-02-16 23:23:05 -08:00
Mircea Trofin 33481c9997 [mlgo] Fetch models from path / URL
Allow custom location for pre-trained models used when AOT-compiling
policies.

Differential Revision: https://reviews.llvm.org/D96796
2021-02-16 22:47:14 -08:00
Kerry McLaughlin ba1e150d03 [SVE] Add support for scalable vectorization of loops with int/fast FP reductions
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Reviewed By: david-arm, fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D95245
2021-02-16 13:50:06 +00:00
Sameer Sahasrabuddhe 11bf7da64a [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager
The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis"
since there is no conflict with LegacyDivergenceAnalysis. In the
legacy PM, this analysis can only be used through the legacy DA
serving as a wrapper. It is now made available as a pass in the new
PM, and has no relation with the legacy DA.

The new DA currently cannot handle irreducible control flow; its
presence can cause the analysis to run indefinitely. The analysis is
now modified to detect this and report all instructions in the
function as divergent. This is super conservative, but allows the
analysis to be used without hanging the compiler.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D96615
2021-02-16 10:26:45 +05:30
Sanjay Patel 378941f611 [ValueTracking] add scan limit for assumes
In the motivating example from https://llvm.org/PR49171 and
reduced test here, we would unroll and clone assumes so much
that compile-time effectively became infinite while analyzing
all of those assumes.
2021-02-15 15:24:20 -05:00
Kerry McLaughlin 5fe1593438 [LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax
Currently, setting the `no-nans-fp-math` attribute to true will allow
loops with fmin/fmax to vectorize, though we should be requiring that
`no-signed-zeros-fp-math` is also set.
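A hand-written sketch of why: the vectorizer matches the select-based min pattern

```
  %cmp = fcmp olt float %x, %y
  %min = select i1 %cmp, float %x, float %y
```

For %x = +0.0 and %y = -0.0 the compare is false and the select returns -0.0. Reordering the reduction can swap the operands and flip which zero is produced, which is only acceptable when signed zeros don't matter.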

This patch adds the check for no-signed-zeros at the function level and includes
tests to make sure we don't vectorize functions with only one of the two
attributes set.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96604
2021-02-15 13:47:05 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds a new intrinsic experimental.vector.reverse that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vector types.
The fixed-width case relies on shufflevector to maintain existing behaviour.
The scalable case uses the new ISD node VECTOR_REVERSE.
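Assuming the usual intrinsic type mangling, the calls look like:

```
%r1 = call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> %v)
%r2 = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> %sv)
```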

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Sjoerd Meijer 357237e93e Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit effc3b0799, with the build
problem fixed.
2021-02-15 11:33:00 +00:00
Sjoerd Meijer effc3b0799 Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit cd6de0e8de.
2021-02-15 11:01:23 +00:00
Sjoerd Meijer cd6de0e8de [TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode
This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into
getPreferredAddressingMode() so that we have one interface to steer LSR in
generating the preferred addressing mode.

Differential Revision: https://reviews.llvm.org/D96600
2021-02-15 10:44:15 +00:00
Nikita Popov f197cf2126 [BasicAA] Merge aliasGEP code paths
At this point, we can treat the case of GEP/GEP aliasing and
GEP/non-GEP aliasing in essentially the same way. The only
differences are that we need to do an additional negative GEP base
check, and that we perform a bailout on unknown sizes for the
GEP/non-GEP case (the latter exists only to limit compile-time).

This change is not quite NFC due to the peculiar effect that
the DecomposedGEP for V2 can actually be non-trivial even if V2
is not a GEP. The reason for this is that getUnderlyingObject()
can look through LCSSA phi nodes, while stripPointerCasts() doesn't.
This can lead to slightly better results if single-entry phi nodes
occur inside a loop, where looking through the phi node via aliasPhi()
would subject it to phi cycle equivalence restrictions. It would
probably make sense to adjust pointer cast stripping (for AA) to
handle this case, and ensure consistent results.
2021-02-14 19:35:36 +01:00
Kazu Hirata 910e2d1e57 [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Kazu Hirata 6ac12e4b34 [Analysis] Use ListSeparator (NFC) 2021-02-14 08:36:14 -08:00
Nikita Popov 53ae96d4bb [BasicAA] Avoid duplicate query for GEPs with identical offsets (NFCI)
For two GEPs with identical offsets, we currently first perform
a base address query without size information, and then if it is
MayAlias, perform another with size information. This is pointless,
as the latter query should produce strictly better results.

This was not quite true historically due to the way that NoAlias
assumptions were handled, but that issue has since been resolved.
2021-02-14 17:18:28 +01:00
Nikita Popov 728803ed74 [BasicAA] Use index difference to detect GEPs with identical indexes
We currently detect GEPs that have exactly the same indexes by
comparing the Offsets and VarIndices. However, the latter implicitly
performs equality comparisons between two values, which is not
generally legal inside BasicAA, due to the possibility of comparisons
across phi cycles.

I believe that in this particular instance this actually ends up being
unproblematic, at least I wasn't able to come up with any cases that
could result in an incorrect root query result.

In the interest of being defensive, compute GetIndexDifference earlier
(which knows how to handle phi cycles properly) and use the result of
that to determine whether the offsets are identical.
2021-02-14 17:11:03 +01:00
aqjune 5f3c99085d [ValueTracking] Dereferenced pointers are noundef
This is a follow-up of D95238's LangRef update.
This patch updates `programUndefinedIfUndefOrPoison(V)` to return true if
`V` is used by any memory-accessing instruction.
Interestingly, this affected many tests in Attributors, mainly about adding noundefs.
The tests are updated using llvm/utils/update_test_checks.py. I checked that the diffs
are about updating noundefs.
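For example (hand-written):

```
define i8 @f(i8* %p) {
  ; dereferencing an undef/poison pointer is UB, so
  ; programUndefinedIfUndefOrPoison(%p) now returns true and
  ; %p can be marked noundef
  %v = load i8, i8* %p
  ret i8 %v
}
```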

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96642
2021-02-14 22:50:48 +09:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
InstCombine will convert nonnull and alignment assumptions that use a boolean condition
to assumptions that use operand bundles when knowledge retention is enabled.
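A sketch of the two forms for a nonnull assumption:

```
; condition-based form
%cond = icmp ne i8* %p, null
call void @llvm.assume(i1 %cond)

; operand-bundle form emitted when knowledge retention is enabled
call void @llvm.assume(i1 true) ["nonnull"(i8* %p)]
```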

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
Nikita Popov 20cb6c7ceb [AA] Add option for tracing AA queries (NFC)
Add an -aa-trace debug option that can be used to print AA queries,
including any recursive queries and their results.
2021-02-12 21:42:49 +01:00
Nikita Popov 191e469ede [AA] Move Depth member from AAResults to AAQI (NFC)
Rather than storing the query depth in AAResults, store it in AAQI.
This makes more sense, as it is a property of the query. This
sidesteps the issue of D94363, fixing slightly inaccurate AA
statistics. Additionally, I plan to use the Depth from BasicAA in
the future, where fetching it from AAResults would be unreliable.

This change is not quite as straightforward as it seems, because
we need to preserve the depth when creating a new AAQI for recursive
queries across phis. I'm adding a new method for this, as we may
need to preserve additional information here in the future.
2021-02-12 21:42:36 +01:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Kerry McLaughlin fea06efe7c [SVE][LoopVectorize] Support for vectorization of loops with function calls
Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs
and adds some simple tests for loops containing a function for which
there is a vectorized variant available.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96356
2021-02-12 13:47:43 +00:00
Sanjay Patel 79b1b4a581 [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
David Sherwood 01b87444cb [NFC][Analysis] Change struct VecDesc to use ElementCount
This patch changes the VecDesc struct to use ElementCount
instead of an unsigned VF value, in preparation for
future work that adds support for vectorized versions of
math functions using scalable vectors. Since all I'm doing
in this patch is switching the type I believe it's a
non-functional change. I changed getWidestVF to now return
both the widest fixed-width and scalable VF values, but
currently the widest scalable value will be zero.

Differential Revision: https://reviews.llvm.org/D96011
2021-02-12 11:07:58 +00:00
David Sherwood 9700228abc [Analysis] Change VFABI::mangleTLIVectorName to use ElementCount
Adds support for mangling TLI vector names for scalable vectors.

Differential Revision: https://reviews.llvm.org/D96338
2021-02-12 09:38:12 +00:00
Philip Reames 8ef4b961a3 [knownbits] Preserve known bits for small shift recurrences
The motivation for this is that I'm looking at an example that uses shifts as induction variables. There are lots of other omissions, but one of the first I noticed is that we can't compute tight known bits. (This indirectly causes SCEV's range analysis to produce very poor results as well.)
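For example (hand-written sketch of such an induction variable):

```
loop:
  %iv = phi i32 [ 8, %entry ], [ %iv.next, %loop ]
  ; lshr can only add leading zeros, so every value of %iv
  ; keeps the top 28 known zero bits of the start value 8
  %iv.next = lshr i32 %iv, 1
```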

Differential Revision: https://reviews.llvm.org/D96440
2021-02-11 17:56:36 -08:00
Stanislav Mekhanoshin 8151c1b442 Move implementation of isAssumeLikeIntrinsic into IntrinsicInst
This is to remove the dependency on ValueTracking in a future patch.

Differential Revision: https://reviews.llvm.org/D96079
2021-02-11 11:41:34 -08:00
Michael Kruse 606aa622b2 Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache"
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).

It caused indeterminism in the output, such that e.g. the
polly-x86_64-linux buildbot failed occasionally.
2021-02-11 12:17:38 -06:00
Sander de Smalen 41500836b0 NFC: Migrate CodeMetrics to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96030
2021-02-11 11:08:41 +00:00
Sander de Smalen 703130fb01 [TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount
This will be needed in the loop-vectorizer where the minimum VF
requested may be a scalable VF. getMinimumVF now takes an additional
operand 'IsScalableVF' that indicates whether a scalable VF is required.

Reviewed By: kparzysz, rampitec

Differential Revision: https://reviews.llvm.org/D96020
2021-02-11 09:08:48 +00:00
Philip Reames 9bf3cfa77b [SCEV] Add a missing AssumptionCache parameter
The AssumptionCache mechanism is used to feed assumes into known bits computations.  Most places in SCEV passed it in, but one place appears to have been missed.

Spotted via inspection, don't have a test case which actually exercises this, but it seemed like an obvious fixit.
2021-02-10 12:08:55 -08:00
Tyker 5652e192fc Revert "[InstCombine] convert assumes to operand bundles"
This reverts commit 5eb2e994f9.
2021-02-10 01:32:00 +01:00
Tyker 5eb2e994f9 [InstCombine] convert assumes to operand bundles
InstCombine will convert nonnull and alignment assumptions that use a boolean condition
to assumptions that use operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-09 19:33:53 +01:00
Andrew Litteken 56615a2654 [IROutliner] Adding instruction strings to IRSimilarityPrinting diagnostics.
When doing some recent debugging of the IROutliner using the similarity pass, just having the basic block and function isn't really enough to get all the information. This adds the first and last instruction to the output of the IRSimilarityPrinting pass to give better information to a user.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D94304
2021-02-09 12:11:47 -06:00
Sanjay Patel 0be0a1237c [ValueTracking] improve analysis for "C << X" and "C >> X"
This is based on the example/comments in:
https://llvm.org/PR48984

I tried just lifting the restriction in computeKnownBitsFromShiftOperator()
as suggested in the bug report, but that doesn't catch all of the cases
shown here. I didn't step through to see exactly why that happened. But it
seems like a reasonable compromise to cheaply check the special-case of
shifting a constant.
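For example (hand-written):

```
define i32 @lshr_const(i32 %x) {
  ; 4 >> %x can only be 4, 2, 1 or 0, so the top 29 bits are known zero
  %r = lshr i32 4, %x
  ret i32 %r
}
```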

There's a slight regression on a cmp transform as noted, but this is likely
the more important/common pattern, so we can fix that icmp pattern later if
needed.

Differential Revision: https://reviews.llvm.org/D95959
2021-02-09 12:38:06 -05:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when building trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Jinsong Ji 9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f.

This exposes a failure in the test-suite build on PowerPC;
reverting to unblock the buildbot first.
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
David Sherwood 3bbaece5a0 [Analysis] Remove unused functions from TargetLibraryInfo
A simple clean-up to remove dead code.

Differential Revision: https://reviews.llvm.org/D95934
2021-02-08 09:50:36 +00:00
Kazu Hirata 28d3132089 [Analysis] Use range-based for loops (NFC) 2021-02-06 11:17:10 -08:00
Johannes Doerfert b7d870eae7 [AssumptionCache] Avoid dangling llvm.assume calls in the cache
PR49043 exposed a problem when it comes to RAUW of llvm.assume calls. While
D96106 would fix it for GVNSink, it seems to be a more general concern. To
avoid future problems, this patch moves away from the vector-of-weak-references
model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.

Fixes PR49043.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96168
2021-02-06 12:18:39 -06:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
David Green 502a67dd7f [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing, many backends end up getting it wrong.

Instead of trying to work with those two values separately, this removes the
VF parameter, widening the RetTy/ArgTys by VF when called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

For most backends this looks like it will be an improvement (or they were not
using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-05 09:34:24 +00:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Nikita Popov be9889b350 [MemorySSA] Don't treat lifetime.end as NoAlias
MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.

I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.

This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.

Differential Revision: https://reviews.llvm.org/D95763
2021-02-04 20:58:28 +01:00
Gil Rapaport d475030dc2 [SCEV] Apply loop guards to divisibility tests
Extend applyLoopGuards() to take into account conditions/assumes proving some
value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the
loop unroller and the loop vectorizer identify more loops as not requiring
remainder loops.

Differential Revision: https://reviews.llvm.org/D95521
2021-02-02 08:09:39 +02:00
Kazu Hirata 7a37d981d9 [llvm] Use pop_back_val (NFC) 2021-02-01 20:55:05 -08:00
Fangrui Song 0fa61304d2 [LoopVectorize] Relax a FCmpInst assert to dyn_cast after D95690
The instruction may be `icmp eq i32`. Noticed in an internal Halide+wasm JIT test.
2021-02-01 19:28:45 -08:00
Sanjay Patel bbed5f2f8a [LoopVectorize] improve IR fast-math-flags propagation in reductions
This is another step (see D95452) towards correcting fast-math-flags
bugs in vector reductions.

There are multiple bugs visible in the test diffs, and this is still
not working as it should. We still use function attributes (rather
than FMF) to drive part of the logic, but we are not checking for
the correct FP function attributes.

Note that FMF may not be propagated optimally on selects (example
in https://llvm.org/PR35607 ). That's why I'm proposing to union the
FMF of a fcmp+select pair and avoid regressions on existing vectorizer
tests.

Differential Revision: https://reviews.llvm.org/D95690
2021-02-01 16:21:36 -05:00
Philip Reames 2a53d9a6e7 [Loads] Plumb through TLI argument [NFC]
This is a (rather delayed) follow up to commit 0129cd5.  This commit is entirely NFC, the semantic change to leverage the new information will be submitted separate with a test case.
2021-02-01 11:45:30 -08:00
Florian Hahn f1e8136115 [SCEV] Bail out if URem operand cannot be zero-extended.
In some cases, LHS is larger than the target expression type. Bail out
in that case for now, to avoid crashing.
2021-02-01 13:50:54 +00:00
Kazu Hirata 6cedffc0ad [MustExecute] Use ListSeparator (NFC) 2021-01-28 22:21:16 -08:00
Max Kazantsev 8a4ad8849f [SCEV] Do not cache comparison result upon reached max depth as "equivalence". PR48725
We use `EquivalenceClasses` to cache the notion that two SCEVs are equivalent,
to save time in situations when `A` is equivalent to `B` and `B` is equivalent to `C`,
making the check "is `A` equivalent to `C`?" cheaper.

We also return `0` in the comparator when we reach max analysis depth to save
compile time. After doing this, we also cache them as being equivalent.

Now, imagine the following situation:
- `A` is proved equivalent to `B`;
- `C` is proved equivalent to `D`;
- Comparison of `A` against `D` is proved non-zero;
- Comparison of `B` against `C` reaches max depth (and gets cached as equivalence).

Now, before the invocation of compare(`B`, `C`), `A` and `D` belonged
to different equivalence classes, and their comparison returned non-zero.
After the invocation of compare(`B`, `C`), the equivalence classes get merged
and `A`, `B`, `C` and `D` all fall into the same equivalence class. So the comparator
will change its behavior for the pair `A` and `D`, with weird consequences following.
This comparator is finally used in `std::stable_sort`, and this behavior change
makes it crash (looks like it's causing a memory corruption).

Solution: this patch changes `CompareSCEVComplexity` to return `None`
when the max depth is reached. So in this case, we do not cache these SCEVs
(and their parents in the tree) as being equivalent.

Differential Revision: https://reviews.llvm.org/D94654
Reviewed By: lebedev.ri
2021-01-29 12:08:34 +07:00
Fangrui Song 54fb3ca96e [ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags
Imported functions and variables get the visibility from the module supplying the
definition. However, non-imported definitions do not get the visibility from:

* (ELF) the most constraining visibility among all modules
* (Mach-O) the visibility of the prevailing definition

This patch

* adds visibility bits to GlobalValueSummary::GVFlags
* computes the result visibility and propagates it to all definitions

Protected/hidden can imply dso_local which can enable some optimizations (this
is stronger than GVFlags::DSOLocal because the implied dso_local can be
leveraged for ELF -shared while default visibility dso_local has to be cleared
for ELF -shared).

Note: we don't have summaries for declarations, so for ELF if a declaration has
the most constraining visibility, the result visibility may not be that one.

Differential Revision: https://reviews.llvm.org/D92900
2021-01-27 10:43:51 -08:00
Mindong Chen 00fcc03687 [SCEV] Fix incorrect loop exit count analysis.
In computeLoadConstantCompareExitLimit, the addrec used to compute the
exit count should be from the loop which the exiting block belongs to.

Reviewed by: mkazantsev

Differential Revision: https://reviews.llvm.org/D92367
2021-01-27 19:36:05 +08:00
Kazu Hirata 657f5b9743 [MemorySSA] Use ListSeparator (NFC) 2021-01-26 20:00:18 -08:00
modimo ce7f9cdb50 [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks
This change leverages the replay functionality added in D83743 for the SampleProfile inliner so that it can also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only.

The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file, where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics, by separating those changes from the inlining decisions themselves. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining.

In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places.

Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay.

Testing:
ninja check-llvm
newly added test correctly replays CGSCC inlining decisions

Reviewed By: mtrofin, wenlei

Differential Revision: https://reviews.llvm.org/D94334
2021-01-25 15:38:57 -08:00
Kazu Hirata 16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Nikita Popov 5d12b976b0 [ValueTracking] Don't assume readonly function will return
This is similar to D94106, but for the
isGuaranteedToTransferExecutionToSuccessor() helper. We should not
assume that readonly functions will return, as this is only true for
mustprogress functions (in which case we already infer willreturn).
As with the DCE change, for now continue assuming that readonly
intrinsics will return, as not all target intrinsics have been
annotated yet.

Differential Revision: https://reviews.llvm.org/D95288
2021-01-24 10:40:21 +01:00
Kazu Hirata a3254904b2 [Analysis] Use llvm::append_range (NFC) 2021-01-22 23:25:01 -08:00
Shimin Cui 99a0aa07e9 [Analysis] Support AIX vec_malloc routines
This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines, which are only available on AIX, manage memory that is 16-byte aligned.

Differential Revision: https://reviews.llvm.org/D94710
2021-01-22 16:03:01 -05:00
Arthur Eubanks 6699029b67 [NewPM][opt] Run the "default" AA pipeline by default
We tend to assume that the AA pipeline defaults to the "default" AA
pipeline, so it's confusing when it's empty instead.

PR48779

Initially reverted due to BasicAA running analyses in an unspecified
order (multiple function calls as parameters), fixed by fetching
analyses before the call to construct BasicAA.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D95117
2021-01-21 21:08:54 -08:00
Arthur Eubanks a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Kazu Hirata cfa241680f [llvm] Don't include StringSwitch.h where unnecessary (NFC) 2021-01-21 19:59:48 -08:00
David Green 39db5753f9 [LV][ARM] Inloop reduction cost modelling
This adds cost modelling for the inloop vectorization added in
745bf6cf44. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to works OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:

  %sa = sext <16 x i8> A to <16 x i32>
  %sb = sext <16 x i8> B to <16 x i32>
  %m = mul <16 x i32> %sa, %sb
  %r = vecreduce.add(%m)
  ->
  R = VMLADAV A, B

There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 cases are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.

Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in-loop and outer-loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.

This patch attempts to improve the cost modelling of in-loop reductions
by:
 - Adding some pattern matching in the loop vectorizer cost model to
   match extended reduction patterns that are optionally extended and/or
   MLA patterns. This marks the cost of the reduction instruction correctly
   and the sext/zext/mul leading up to it as free, which is otherwise
   difficult to tell and may get a very high cost. (In the long run this
   can hopefully be replaced by vplan producing a single node and costing
   it correctly, but that is not yet something that vplan can do).
 - getExtendedAddReductionCost is added to query the cost of these
   extended reduction patterns.
 - Expanded the ARM costs to account for these expanded sizes, which is a
   fairly simple change in itself.
 - Some minor alterations to allow inloop reduction larger than the highest
   vector width and i64 MVE reductions.
 - An extra InLoopReductionImmediateChains map was added to the vectorizer
   for it to efficiently detect which instructions are reductions in the
   cost model.
 - The tests have some updates to show what I believe is optimal
   vectorization and where we are now.

Put together this can greatly improve performance for reduction loops
under MVE.

Differential Revision: https://reviews.llvm.org/D93476
2021-01-21 21:03:41 +00:00
Kazu Hirata 8f5da41c4d [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-20 21:35:52 -08:00
Mircea Trofin ccec2cf1d9 Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor"
This reverts commit d97f776be5.

The original problem was due to build failures in shared lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking
this.
2021-01-20 13:33:43 -08:00
Mircea Trofin 95ce32c787 [NFC] Move ImportedFunctionsInliningStatistics to Analysis
This is related to D94982. We want to call these APIs from the Analysis
component, so we can't leave them under Transforms.

Differential Revision: https://reviews.llvm.org/D95079
2021-01-20 13:18:03 -08:00
Mircea Trofin d97f776be5 Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor"
This reverts commit e8aec763a5.
2021-01-20 11:19:34 -08:00
Mircea Trofin e8aec763a5 [NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor
When using 2 InlinePass instances in the same CGSCC - one for other
mandatory inlinings, the other for the heuristic-driven ones - the order
in which the ImportedFunctionStats would be output-ed would depend on
the destruction order of the inline passes, which is not deterministic.

This patch moves the ImportedFunctionStats responsibility to the
InlineAdvisor to address this problem.

Differential Revision: https://reviews.llvm.org/D94982
2021-01-20 11:07:36 -08:00
Kazu Hirata b023cdeacc [llvm] Use llvm::all_of (NFC) 2021-01-19 20:19:17 -08:00
Juneyoung Lee 4479c0c2c0 Allow nonnull/align attribute to accept poison
Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison.
To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` accept poison, and return poison if the input pointer is null.
This makes many transformations like below legal:

```
%p = gep inbounds %x, 1 ; %p is a non-null pointer or poison
call void @f(%p)        ; instcombine converts this to call void @f(nonnull %p)
```

Instead, this semantics makes propagation of `nonnull` to the caller illegal.
The reason is that passing poison to `nonnull` does not immediately raise UB anymore, so such a program is still well defined, if the callee does not use the argument.
Having the `noundef` attribute there re-allows this.

```
define void @f(i8* %p) {       ; functionattr cannot mark %p nonnull here anymore
  call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
  ret void
}
```

Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90529
2021-01-20 11:31:23 +09:00
Jeroen Dobbelaere 121cac01e8 [noalias.decl] Look through llvm.experimental.noalias.scope.decl
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93042
2021-01-19 20:09:42 +01:00
Nikita Popov 051ec9f5f4 [ValueTracking] Strengthen impliesPoison reasoning
Split impliesPoison into two recursive walks, one over V, the
other over ValAssumedPoison. This allows us to reason about poison
implications in a number of additional cases that are important
in practice. This is a generalized form of D94859, which handles
the cmp to cmp implication in particular.

Differential Revision: https://reviews.llvm.org/D94866
2021-01-19 18:04:23 +01:00
Florian Hahn 3747b69b53
[LoopRotate] Calls not lowered to calls should not block rotation.
83daa49758 made loop-rotate more conservative in the presence of
function calls in the prepare-for-lto stage. The code did not properly
account for calls that are not actual function calls, like calls to
intrinsics. This patch updates the code to ensure only calls that are
lowered to actual calls are considered inline candidates.
2021-01-19 14:37:36 +00:00
Florian Hahn 83daa49758
[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.

The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.

This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.

I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.

From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory and has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.

There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have been
vectorized earlier (by scalarizing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).

There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.

I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.

Reviewed By: sanwou01

Differential Revision: https://reviews.llvm.org/D94232
2021-01-19 10:15:29 +00:00
Juneyoung Lee 0441df94ad [InstCombine,InstSimplify] Optimize select followed by and/or/xor
This patch adds `A & (A && B)` -> `A && B`  (similarly for or + logical or)

Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`.

Alive2 proof:
merge_and: https://alive2.llvm.org/ce/z/teMR97
merge_or: https://alive2.llvm.org/ce/z/b4yZUp
xor_and: https://alive2.llvm.org/ce/z/_-TXHi
xor_or: https://alive2.llvm.org/ce/z/2uYx_a
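
As a hypothetical IR illustration of the first fold (names invented):

```
%land = select i1 %a, i1 %b, i1 false   ; %a && %b in select form
%r = and i1 %a, %land                   ; simplifies to %land
```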

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94861
2021-01-19 09:14:17 +09:00
Kazu Hirata 23b0ab2acb [llvm] Use the default value of drop_begin (NFC) 2021-01-18 10:16:36 -08:00
Nikita Popov 4229b87ed3 [ValueTracking] Fix isSafeToSpeculativelyExecute for sdiv (PR48778)
The != -1 check does not work correctly for all bitwidths. Use
isAllOnesValue() instead.
2021-01-17 20:06:17 +01:00
Nikita Popov a13c0f62c3 [InstSimplify] Fold x*C1/C2 <= x (PR48744)
We can fold x*C1/C2 <= x to true if C1 <= C2. This is valid even
if the multiplication is not nuw: https://alive2.llvm.org/ce/z/vULors

The multiplication or division can be replaced by shifts. We don't
handle the case where both are shifts, as that should get folded
away by InstCombine.
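
A hypothetical unsigned instance (constants chosen for illustration):

```
%m = mul i8 %x, 3
%d = udiv i8 %m, 5
%c = icmp ule i8 %d, %x   ; folds to true because 3 <= 5
```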
2021-01-17 16:02:55 +01:00
Nikita Popov 0b84afa5fc Reapply [BasicAA] Handle recursive queries more efficiently
There are no changes relative to the original commit. However, an issue
this exposed in BasicAA assumption tracking has been fixed in the
previous commit.

-----

An alias query currently works out roughly like this:

 * Look up location pair in cache.
 * Perform BasicAA logic (including cache lookup and insertion...)
 * Perform a recursive query using BestAAResults.
   * Look up location pair in cache (and thus do not recurse into BasicAA)
   * Query all the other AA providers.
 * Query all the other AA providers.

This is a lot of unnecessary work, all ultimately caused by the
BestAAResults query at the end of aliasCheck(). The reason we perform
it, is that aliasCheck() is getting called recursively, and we of
course want those recursive queries to also make use of other AA
providers, not just BasicAA. We can solve this by making the recursive
queries directly use BestAAResults (which will check both BasicAA
and other providers), rather than recursing into aliasCheck().

There are some tradeoffs:

 * We can no longer pass through the precomputed underlying object
   to aliasCheck(). This is not a major concern, because nowadays
   getUnderlyingObject() is quite cheap.
 * Results from other AA providers are no longer cached inside
   BasicAA. The way this worked was already a bit iffy, in that a
   result could be cached, but if it was MayAlias, we'd still end
   up re-querying other providers anyway. If we want to cache
   non-BasicAA results, we should do that in a more principled manner.

In any case, despite those tradeoffs, this works out to be a decent
compile-time improvement. I think it also simplifies the mental model
of how BasicAA works. It took me quite a while to fully understand
how these things interact.

Differential Revision: https://reviews.llvm.org/D90094
2021-01-17 10:34:35 +01:00
Nikita Popov b1c2f1282a [BasicAA] Move assumption tracking into AAQI
D91936 placed the tracking for the assumptions into BasicAA.
However, when recursing over phis, we may use fresh AAQI instances.
In this case AssumptionBasedResults from an inner AAQI can result
in a removal of an element from the outer AAQI.

To avoid this, move the tracking into AAQI. This generally makes
more sense, as the NoAlias assumptions themselves are also stored
in AAQI.

The test case only produces an assertion failure with D90094
reapplied. I think the issue exists independently of that change
as well, but I wasn't able to come up with a reproducer.
2021-01-17 10:34:35 +01:00
Dávid Bolvanský bfd75bdf3f [NFC] Removed extra text in comments 2021-01-16 22:48:56 +01:00
Dávid Bolvanský 63bedc80da [InstSimplify] Handle commutativity for 'and' and 'outer or' for (~A & B) | ~(A | B) --> ~A
Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D94870
2021-01-16 19:42:50 +01:00
Kazu Hirata 2082b10d10 [llvm] Use *::empty (NFC) 2021-01-16 09:40:55 -08:00
Kazu Hirata 19aacdb715 [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-16 09:40:53 -08:00
Dávid Bolvanský bdd4dda58b [InstSimplify] Update comments, remove redundant tests 2021-01-16 16:31:23 +01:00
Dávid Bolvanský a4e2a5145a [InstSimplify] Add (~A & B) | ~(A | B) --> ~A 2021-01-16 15:43:34 +01:00
Mircea Trofin e8049dc3c8 [NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.

Differential Revision: https://reviews.llvm.org/D94825
2021-01-15 17:59:38 -08:00
Reid Kleckner 64db296e5a Revert "[BasicAA] Handle recursive queries more efficiently"
This reverts commit a3904cc77f.
It causes the compiler to crash while building Harfbuzz for ARM in
Chromium, reduced reproducer forthcoming:
https://crbug.com/1167305
2021-01-15 12:29:57 -08:00
Kazu Hirata 2efcbe24a7 [llvm] Use llvm::drop_begin (NFC) 2021-01-14 20:30:33 -08:00
Nikita Popov a3904cc77f [BasicAA] Handle recursive queries more efficiently
An alias query currently works out roughly like this:

 * Look up location pair in cache.
 * Perform BasicAA logic (including cache lookup and insertion...)
 * Perform a recursive query using BestAAResults.
   * Look up location pair in cache (and thus do not recurse into BasicAA)
   * Query all the other AA providers.
 * Query all the other AA providers.

This is a lot of unnecessary work, all ultimately caused by the
BestAAResults query at the end of aliasCheck(). The reason we perform
it, is that aliasCheck() is getting called recursively, and we of
course want those recursive queries to also make use of other AA
providers, not just BasicAA. We can solve this by making the recursive
queries directly use BestAAResults (which will check both BasicAA
and other providers), rather than recursing into aliasCheck().

There are some tradeoffs:

 * We can no longer pass through the precomputed underlying object
   to aliasCheck(). This is not a major concern, because nowadays
   getUnderlyingObject() is quite cheap.
 * Results from other AA providers are no longer cached inside
   BasicAA. The way this worked was already a bit iffy, in that a
   result could be cached, but if it was MayAlias, we'd still end
   up re-querying other providers anyway. If we want to cache
   non-BasicAA results, we should do that in a more principled manner.

In any case, despite those tradeoffs, this works out to be a decent
compile-time improvement. I think it also simplifies the mental model
of how BasicAA works. It took me quite a while to fully understand
how these things interact.

Differential Revision: https://reviews.llvm.org/D90094
2021-01-14 20:32:41 +01:00
Jay Foad 517196e569 [Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC.
Differential Revision: https://reviews.llvm.org/D94588
2021-01-14 14:02:43 +00:00
Kazu Hirata 5c1c39e8d8 [llvm] Use *Set::contains (NFC) 2021-01-13 19:14:41 -08:00
Markus Lavin f8cece1863 [ValueTracking] Fix one s/dyn_cast/dyn_cast_or_null/
Handle if Constant::getAggregateElement() returns nullptr in
canCreateUndefOrPoison().

Differential Revision: https://reviews.llvm.org/D94494
2021-01-13 13:39:53 +01:00
Kazu Hirata 8a20e2b3d3 [llvm] Use Optional::getValueOr (NFC) 2021-01-12 21:43:50 -08:00
Kazu Hirata 12fc9ca3a4 [llvm] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-01-12 21:43:46 -08:00
modimo 2a49b7c64a [Inliner] Change inline remark format and update ReplayInlineAdvisor to use it
This change modifies the source location formatting from:
LineNumber.Discriminator
to:
LineNumber:ColumnNumber.Discriminator

The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff.

The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy.

Testing:
ninja check-llvm

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D94333
2021-01-12 13:43:48 -08:00
Nikita Popov 7ecad2e4ce [InstSimplify] Don't fold gep p, -p to null
This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=44403.
Folding gep p, q-p to q is only legal if p and q have the same
provenance. This fold should probably be guarded by something like
getUnderlyingObject(p) == getUnderlyingObject(q).

This patch is a partial fix that removes the special handling for
gep p, 0-p, which will fold to a null pointer, which would certainly
not pass an underlying object check (unless p is also null, in which
case this would fold trivially anyway). Folding to a null pointer
is particularly problematic due to the special handling it receives
in many places, making end-to-end miscompiles more likely.
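
A hypothetical sketch of the pattern that is no longer folded to null:

```
%i = ptrtoint i8* %p to i64
%n = sub i64 0, %i
%q = getelementptr i8, i8* %p, i64 %n   ; previously folded to null
```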

Differential Revision: https://reviews.llvm.org/D93820
2021-01-12 20:24:23 +01:00
Bjorn Pettersson 675be65106 Require chained analyses in BasicAA and AAResults to be transitive
This patch fixes a bug that could result in miscompiles (at least
in an OOT target). The problem could be seen by adding checks that
the DominatorTree used in BasicAliasAnalysis and ValueTracking was
valid (e.g. by adding DT->verify() call before every DT dereference
and then running all tests in test/CodeGen).

The problem was that the LegacyPassManager calculated "last user"
incorrectly for passes such as the DominatorTree when not telling
the pass manager that there was a transitive dependency between
the different analyses. And then it could happen that an incorrect
dominator tree was used when doing alias analysis (which was a pretty
serious bug as the alias analysis result could be invalid).

Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94138
2021-01-11 11:50:07 +01:00
Kazu Hirata e3d3dbd339 [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-01-10 09:24:57 -08:00
Kazu Hirata 1d10a1d5b1 [MemorySSA] Remove unused dominatesUse (NFC)
The function was introduced without a use on Feb 2, 2016 in commit
e1100f533f.
2021-01-10 09:24:55 -08:00
Nikita Popov 1ecae1e62a [ConstantFold] Fold fptoi.sat intrinsics
The APFloat::convertToInteger() API already implements the desired
saturation semantics.
2021-01-10 17:37:27 +01:00
Florian Hahn c701f85c45
[STLExtras] Use return type from operator* of the wrapped iter.
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.

Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, a helper that slightly
simplifies iterating over ranges while elements are removed.

Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.

This patch also updates a few places to use make use of
make_early_inc_range.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D93992
2021-01-10 14:41:13 +00:00
Kazu Hirata 6a6e382161 [llvm] Drop unnecessary make_range (NFC) 2021-01-09 09:25:00 -08:00
Kazu Hirata b7c5e0b02c [Target, Transforms] Use *Set::contains (NFC) 2021-01-08 18:39:54 -08:00
David Green e185b1dd7b [ConstProp] Constant propagation for get.active.lane.mask intrinsics
Similar to the Arm VCTP intrinsics, if the operands of an
active.lane.mask are both known, the constant lane mask can be
calculated. This can come up after unrolling the loops.

Differential Revision: https://reviews.llvm.org/D94103
2021-01-08 16:10:01 +00:00
Bardia Mahjour ebfe4de2c0 [DDG] Fix duplicate edge removal during pi-block formation
When creating pi-blocks we try to avoid creating duplicate edges
between outside nodes and the pi-block when an edge is of the
same kind and direction as another one that has already been
created. We do this by keeping track of the edges in an
enumerated array called EdgeAlreadyCreated. The problem is that
this array is declared local to the loop that iterates over the
nodes in the pi-block, so the information gets lost every time a
new inside-node is iterated over. The fix is to move the
declaration to the outer loop.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D94094
2021-01-07 10:31:11 -05:00
Simon Pilgrim fa6d897799 [Analysis] MemoryDepChecker::couldPreventStoreLoadForward - remove dead store. NFCI.
As we're breaking from the loop when clamping MaxVF, the clang static analyzer was warning that the VF iterator was being updated and never used.
2021-01-07 14:21:54 +00:00
Kazu Hirata cfeecdf7b6 [llvm] Use llvm::all_of (NFC) 2021-01-06 18:27:36 -08:00
Juneyoung Lee 3a60a1f165 [InstSimplify] Fold insertelement vec, poison, idx into vec
This is a simple patch that adds folding from `insertelement vec, poison, idx` into `vec`.

Alive2 proof: https://alive2.llvm.org/ce/z/2y2vbC

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93994
2021-01-07 10:10:14 +09:00
Alina Sbirlea 63aeaf754a [DominatorTree] Add support for mixed pre/post CFG views.
Update usages of the MemorySSAUpdater to use the new DT API by
requesting the DT updates to be done by the MSSAUpdater.

Differential Revision: https://reviews.llvm.org/D93371
2021-01-06 14:53:09 -08:00
Nikita Popov f6f6f6375d [BasicAA] Fix BatchAA results for phi-phi assumptions
Change the way NoAlias assumptions in BasicAA are handled. Instead of
handling this inside the phi-phi code, always initially insert a
NoAlias result into the map and keep track whether it is used.
If it is used, then we require that we also get back NoAlias from
the recursive queries. Otherwise, the entry is changed to MayAlias.

Additionally, keep track of all location pairs we inserted that may
still be based on assumptions higher up. If it turns out one of those
assumptions is incorrect, we flush them from the cache.

The compile-time impact for the new implementation is significantly
higher than the previous iteration of this patch:
https://llvm-compile-time-tracker.com/compare.php?from=c0bb9859de6991cc233e2dedb978dd118da8c382&to=c07112373279143e37568b5bcd293daf81a35973&stat=instructions
However, it should avoid the exponential runtime cases we run into
if we don't cache assumption-based results entirely.

This also produces better results in some cases, because NoAlias
assumptions can now start at any root, rather than just phi-phi pairs.
This is not just relevant for analysis quality, but also for BatchAA
consistency: Otherwise, results would once again depend on query order,
though at least they wouldn't be wrong.

This ended up both more complicated and more expensive than I hoped,
but I wasn't able to come up with another solution that satisfies all
the constraints.

Differential Revision: https://reviews.llvm.org/D91936
2021-01-06 22:15:30 +01:00
Nikita Popov 221c3b174b [InstSimplify] Canonicalize non-demanded shuffle op to poison (NFCI)
I don't believe this has an observable effect, because the only
thing we care about here is replacing the operand with a constant
so following folds can apply. This change is just to make the
representation follow canonical unary shuffle form.
2021-01-06 21:22:27 +01:00
Nikita Popov d042f2db5b [InstSimplify] Fold call null/undef to poison
Calling null or undef results in immediate undefined behavior.
Return poison instead of undef in this case, similar to what
we do for immediate UB due to division by zero.
2021-01-06 21:09:30 +01:00
Arthur Eubanks 54c01057b6 Fix non-assert builds after D93828 2021-01-06 11:42:03 -08:00
Nikita Popov a6df39236f [InstSimplify] Fold out-of-bounds shift to poison
Make InstSimplify return poison rather than undef for out-of-bounds
shifts, as specified by the LangRef:

> If op2 is (statically or dynamically) equal to or larger than the
> number of bits in op1, this instruction returns a poison value.
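
For example (a minimal sketch, width chosen for illustration):

```
%r = shl i8 %x, 8   ; shift amount equals the bit width: folds to poison
```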

Differential Revision: https://reviews.llvm.org/D93998
2021-01-06 20:41:37 +01:00
Arthur Eubanks 7fea561eb1 [CGSCC][Coroutine][NewPM] Properly support function splitting/outlining
Previously when trying to support CoroSplit's function splitting, we
added in a hack that simply added the new function's node into the
original function's SCC (https://reviews.llvm.org/D87798). This is
incorrect since it might be in its own SCC.

Now, more similar to the previous design, we have callers explicitly
notify the LazyCallGraph that a function has been split out from another
one.

In order to properly support CoroSplit, there are two ways functions can
be split out.

One is the normal expected "outlining" of one function into a new one.
The new function may only contain references to other functions that the
original did. The original function must reference the new function. The
new function may reference the original function, which can result in
the new function being in the same SCC as the original function. The
weird case is when the original function indirectly references the new
function, but the new function directly calls the original function,
resulting in the new SCC being a parent of the original function's SCC.
This form of function splitting works with CoroSplit's Switch ABI.

The second way of splitting is more specific to CoroSplit. CoroSplit's
Retcon and Async ABIs split the original function into multiple
functions that all reference each other and are referenced by the
original function. In order to keep the LazyCallGraph in a valid state,
all new functions must be processed together, else some nodes won't be
populated. To keep things simple, this only supports the case where all
new edges are ref edges, and every new function references every other
new function. There can be a reference back from any new function to the
original function, putting all functions in the same RefSCC.

This also adds asserts that all nodes in a (Ref)SCC can reach all other
nodes to prevent future incorrect hacks.

The original hacks in https://reviews.llvm.org/D87798 are no longer
necessary since all new functions should have been registered before
calling updateCGAndAnalysisManagerForPass.

This fixes all coroutine tests when opt's -enable-new-pm is true by
default. This also fixes PR48190, which was likely due to the previous
hack breaking SCC invariants.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93828
2021-01-06 11:19:15 -08:00
Kazu Hirata cd088ba7e6 [llvm] Use llvm::lower_bound and llvm::upper_bound (NFC) 2021-01-05 21:15:59 -08:00
Juneyoung Lee 29f8628d1f [Constant] Add containsPoisonElement
This patch

- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly

With this patch, tests for isGuaranteedNotToBeUndefOrPoison on constant vectors are added, because its analysis is improved.

Thanks!

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94053
2021-01-06 12:10:33 +09:00
Kazu Hirata 65cd3cbb3f [Inliner] Compute the full cost for the cost benefit analysis
This patch teaches the inliner to compute the full cost for a call
site where the newly introduced cost benefit analysis is enabled.

Note that the cost benefit analysis requires the full cost to be
computed.  However, without this patch or the -inline-cost-full
option, the early termination logic would kick in when the cost
exceeds the threshold, so we don't get to perform the cost benefit
analysis.  For this reason, we would need to pass both of these
options to clang:

  -mllvm -inline-cost-full
  -mllvm -inline-enable-cost-benefit-analysis

This patch eliminates the need to specify -inline-cost-full.

Differential Revision: https://reviews.llvm.org/D93658
2021-01-05 12:48:49 -08:00
Whitney Tsang 314ccc0013 [LoopNest] Remove unused include.
Differential Revision: https://reviews.llvm.org/D93665
2021-01-05 20:05:31 +00:00
Whitney Tsang c005518936 [LoopNest] Allow empty basic blocks without loops
Allow loop nests that contain empty basic blocks without loops at
different levels to still be considered perfect.

Reviewers: Meinersbur

Differential Revision: https://reviews.llvm.org/D93665
2021-01-05 15:09:38 +00:00
Xun Li 3e2b42489f Remove RefSCC::handleTrivialEdgeInsertion
This function no longer does anything useful. It probably did something originally, but later changes removed that functionality and didn't clean up this function.
The checks are already done in the callers as well.

Differential Revision: https://reviews.llvm.org/D94055
2021-01-04 20:21:01 -08:00
Juneyoung Lee f665a8c5b8 [InstSimplify] gep with poison operand is poison
This is a tiny update to fold gep poison into poison. :)

Alive2 proofs:
https://alive2.llvm.org/ce/z/7Nwdri
https://alive2.llvm.org/ce/z/sDP4sC
2021-01-05 11:07:49 +09:00
Sanjay Patel 36263a7ccc [LoopUtils] remove redundant opcode parameter; NFC
While here, rename the inaccurate getRecurrenceBinOp()
because that was also used to get CmpInst opcodes.

The recurrence/reduction kind should always refer to the
expected opcode for a reduction. SLP appears to be the
only direct caller of createSimpleTargetReduction(), and
that calling code ideally should not be carrying around
both an opcode and a reduction kind.

This should allow us to generalize reduction matching to
use intrinsics instead of only binops.
2021-01-04 17:05:28 -05:00
Juneyoung Lee abbef2fd46 [ValueTracking] isGuaranteedNotToBePoison should return true on undef
This is a one-line fix to isGuaranteedNotToBePoison to return true if
undef is given.
2021-01-05 06:50:02 +09:00
Whitney Tsang de6d43f16c Revert "[LoopNest] Allow empty basic blocks without loops"
This reverts commit 9a17bff4f7.
2021-01-04 20:42:21 +00:00
Whitney Tsang 9a17bff4f7 [LoopNest] Allow empty basic blocks without loops
Allow loop nests that contain empty basic blocks without loops at
different levels to still be considered perfect.

Reviewers: Meinersbur

Differential Revision: https://reviews.llvm.org/D93665
2021-01-04 19:59:50 +00:00
Kazu Hirata 848e8f938f [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-04 11:42:44 -08:00
Caroline Concatto 060cfd9795 [AArch64][SVE]Add cost model for masked gather and scatter for scalable vector.
A new TTI interface 'Optional<unsigned> getMaxVScale()' has been added
that returns the maximum vscale for a given target.
When known, getMaxVScale is used to compute the cost of masked
gather/scatter for scalable vectors.

Depends on D92094

Differential Revision: https://reviews.llvm.org/D93030
2021-01-04 13:59:58 +00:00
Nikita Popov 3715c99be9 [InstSimplify] Fold nnan/ninf violation to poison
As the comment already indicates, performing an operation with
nnan/ninf flags on a nan/inf or undef results in poison. Now that
we have a proper poison value, we no longer need to relax it to
undef.
2021-01-03 22:05:40 +01:00
Nikita Popov 766cf7f32e [InstSimplify] Fold division by zero to poison
Div/rem by zero is immediate undefined behavior and anything goes.
Currently we fold it to undef, this patch changes it to fold to
poison instead, which is slightly stronger.

Differential Revision: https://reviews.llvm.org/D93995
2021-01-03 20:52:45 +01:00
Kazu Hirata ba82c0b315 [llvm] Call *(Set|Map)::erase directly (NFC)
We can erase an item in a set or map without checking its membership
first.
2021-01-03 09:57:47 -08:00
Nikita Popov f094d65bea [InstSimplify] Fix addo/subo with undef (PR43188)
We can't fold the first result to undef, because not all values
may be reachable under the constraint that no overflow occurred.
Use the same folds we do for saturated math instead.

Proofs:
uaddo: https://alive2.llvm.org/ce/z/zf55N_
saddo: https://alive2.llvm.org/ce/z/a_xPgS
usubo: https://alive2.llvm.org/ce/z/DmRqwt
ssubo: https://alive2.llvm.org/ce/z/8ag7U-
2021-01-03 18:51:49 +01:00
Nikita Popov c6ad00d709 [InstSimplify] Return poison for out of bounds extractelement
This is the same change as D93990, but for extractelement rather
than insertelement.

> If idx exceeds the length of val for a fixed-length vector, the
> result is a poison value. For a scalable vector, if the value of
> idx exceeds the runtime length of the vector, the result is a
> poison value.
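
A minimal sketch of the fold (index chosen for illustration):

```
%r = extractelement <4 x i32> %v, i32 7   ; out of bounds: folds to poison
```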
2021-01-03 18:15:58 +01:00
Juneyoung Lee 2139958b53 [InstSimplify] Return poison if insertelement touches out of bounds
This is a simple patch that updates InstSimplify to return poison if the index is/can be out-of-bounds

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93990
2021-01-04 00:43:02 +09:00
Gil Rapaport d9c0b128e3 [SCEV] Simplify trunc to zero based on known bits
Let getTruncateExpr() short-circuit to zero when the value being
truncated is known to have at least as many trailing zero bits as the
width of the target type.
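
A hypothetical example of a truncation this now simplifies:

```
%m = mul i64 %x, 256      ; at least 8 trailing zero bits
%t = trunc i64 %m to i8   ; SCEV now evaluates this to 0
```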

Differential Revision: https://reviews.llvm.org/D93973
2021-01-03 13:57:12 +02:00
Sanjay Patel c74e8539ff [Analysis] flatten enums for recurrence types
This is almost all mechanical search-and-replace and
no-functional-change-intended (NFC). Having a single
enum makes it easier to match/reason about the
reduction cases.

The goal is to remove `Opcode` from reduction matching
code in the vectorizers because that makes it harder to
adapt the code to handle intrinsics.

The code in RecurrenceDescriptor::AddReductionVar() is
the only place that required closer inspection. It uses
a RecurrenceDescriptor and a second InstDesc to sometimes
overwrite part of the struct. It seems like we should be
able to simplify that logic, but it's not clear exactly
which cmp+sel patterns that we are trying to handle/avoid.
2021-01-01 12:20:16 -05:00
Nikita Popov 14e540febc [LVI] Handle unions of conditions
LVI previously handled "if (L && R)" conditions, but not
"if (L || R)" conditions. The latter case can still produce
useful information if L and R both constrain the same variable.

This adds support for handling the "if (L || R)" case as well.
The only difference is that we take the union instead of the
intersection of the lattice values.
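
A hypothetical example where both sides constrain the same variable:

```
%c1 = icmp ult i32 %x, 10
%c2 = icmp ugt i32 %x, 99
%or = or i1 %c1, %c2
br i1 %or, label %taken, label %other
; in %taken, %x is known to be in the union of [0,10) and [100,max]
```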
2021-01-01 16:46:21 +01:00
Andrew Litteken 0d21e66014 [IRSim] Letting call instructions be legal for similarity identification.
Here we let non-intrinsic calls be considered legal and valid for
similarity only if the call is not indirect, and has a name.

For two calls to be considered similar, they must have the same name,
the same function types, and the same set of parameters, including tail
calls and calling conventions.

Tests are found in unittests/Analysis/IRSimilarityIdentifierTest.cpp.

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87312
2020-12-31 20:52:45 -06:00
Sanjay Patel eaab71106b [Analysis] reduce code for matching min/max; NFC
This might also make it easier to adapt if we want
to match min/max intrinsics rather than cmp+sel idioms.

The 'const' part is to potentially avoid confusion
in calling code. There's some surprising and possibly
wrong behavior related to matching min/max reductions
differently than other reductions.
2020-12-31 17:19:37 -05:00
Andrew Litteken d974ac0224 [IRSim] Letting gep instructions be legal for similarity identification.
GetElementPtr instructions require the extra check that all operands
after the first must only be constants and be exactly the same to be
considered similar.

Tests are found in unittests/Analysis/IRSimilarityIdentifierTest.cpp.
2020-12-31 14:41:14 -06:00
Juneyoung Lee 509fa8e02e [SCEV] recognize logical and/or pattern
This patch makes SCEV recognize 'select A, B, false' and 'select A, true, B'.
This is a performance improvement that will be helpful after the unsound select -> and/or transformation is removed, as discussed in D93065.
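
The recognized select forms look like this (a sketch; %a and %b are
placeholders):

```
%and = select i1 %a, i1 %b, i1 false   ; select form of 'and i1 %a, %b'
%or  = select i1 %a, i1 true, i1 %b    ; select form of 'or i1 %a, %b'
```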

SCEV's answers for the select form should be a bit more conservative than the equivalent `and A, B` / `or A, B`.
Take this example: https://alive2.llvm.org/ce/z/NsP9ue .
To check whether it is valid for SCEV's computeExitLimit to return min(n, m) as the ExactNotTaken value, I put an llvm.assume at tgt.
It fails because the exit limit becomes poison if n is zero and m is poison. This is problematic if e.g. the exit value of i is replaced with min(n, m).
If either n or m is constant, we can revive the analysis again. I added relevant tests and put alive2 links there.

If and is used instead, this is okay: https://alive2.llvm.org/ce/z/K9rbJk . Hence the existing analysis is sound.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93882
2021-01-01 04:37:57 +09:00
Kazu Hirata b557c32ae9 [MemorySSA, BPF] Use isa instead of dyn_cast (NFC) 2020-12-31 09:39:13 -08:00
Kazu Hirata a87c7003ac [Analysis] Remove unused code recursivelySimplifyInstruction (NFC)
The last use of the function, located in RemovePredecessorAndSimplify,
was removed on Dec 25, 2020 in commit
46bea9b297.

The last use of RemovePredecessorAndSimplify was removed on Sep 29,
2010 in commit 99c985c37d.
2020-12-30 17:45:40 -08:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Luo, Yuanke 981a0bd858 [X86] Add x86_amx type for intel AMX.
The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.

Differential Revision: https://reviews.llvm.org/D91927
2020-12-30 13:52:13 +08:00
Kazu Hirata f76e83bfbb [Analysis] Use llvm::append_range (NFC) 2020-12-29 19:23:21 -08:00
Florian Hahn b980bed34b
[MSSAUpdater] Skip renaming when inserting def in unreachable block.
This fixes an updater crash when moving memory defs between unreachable
blocks.

Fixes PR48616.
2020-12-29 18:22:12 +00:00
Kazu Hirata 2883cd98f3 [CFGPrinter] Use succ_empty (NFC) 2020-12-28 19:55:20 -08:00
Juneyoung Lee 0f2c180163 [ValueTracking] Implement impliesPoison
This PR adds impliesPoison(ValAssumedPoison, V) that returns true if V is
poison under the assumption that ValAssumedPoison is poison.

For example, impliesPoison('icmp X, 10', 'icmp X, Y') return true because
'icmp X, Y' is poison if 'icmp X, 10' is poison.

impliesPoison can be used for sound optimization of select, as discussed in
D77868.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D78152
2020-12-29 06:50:38 +09:00
Sanjay Patel 236c4524a7 [InstSimplify] remove ctpop of 1 (low) bit
https://llvm.org/PR48608

As noted in the test comment, we could handle a more general
case in instcombine and remove this, but I don't have evidence
that we need to do that.
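
A minimal sketch of the fold:

```
%b = and i32 %x, 1
%r = call i32 @llvm.ctpop.i32(i32 %b)   ; simplifies to %b
```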

https://alive2.llvm.org/ce/z/MRW9gD
2020-12-28 16:06:20 -05:00
Nikita Popov dcd21572f9 [ValueTracking] Fix isKnownNonEqual() with constexpr mul
Confusingly, BinaryOperator is not an Operator,
OverflowingBinaryOperator is... We were implicitly assuming that
the multiply is an Instruction here.

This fixes the assertion failure reported in
https://reviews.llvm.org/D92726#2472827.
2020-12-28 18:32:57 +01:00
Juneyoung Lee 860199dfbe [ValueTracking] Use m_LogicalAnd/Or to look into conditions
This patch updates isImpliedCondition/isKnownNonZero to look into select form of
and/or as well.

See llvm.org/pr48353 and D93065 for more context

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93845
2020-12-28 08:32:45 +09:00
Nikita Popov 0af42d3dc7 [PatternMatch][LVI] Handle select-form and/or in LVI
Following the discussion in D93065, this adds m_LogicalAnd() and
m_LogicalOr() matchers, that match A && B and A || B logical
operations, either as bitwise operations or select expressions.
As an example usage, LVI is adapted to use these matchers for its
condition reasoning.

The plan here is to switch other parts of LLVM that reason about
and/or of conditions to also support the select forms, and then
merge D93065 (or a variant thereof) to disable the poison-unsafe
select to and/or transform.

Differential Revision: https://reviews.llvm.org/D93827
2020-12-27 17:39:02 +01:00
Nikita Popov b218407512 [ValueTracking] Handle more non-trivial conditions in isKnownNonZero()
In 35676a4f9a I've added handling for
non-trivial dominating conditions that imply non-zero on the true
branch. This adds the same support for the false branch.

The changes in pr45360.ll change block ordering and naming, but
don't change the control flow. The urem is still guaraded by a
non-zero check correctly.
2020-12-26 15:48:04 +01:00
Nikita Popov c795dd1926 [BasicAA] Pass AC/DT to isKnownNonEqual()
This allows us to handle assumes etc in the recursive
isKnownNonZero() checks.
2020-12-25 18:29:20 +01:00
Nikita Popov 35676a4f9a [InstCombine] Generalize icmp handling in isKnownNonZero()
The dominating condition handling in isKnownNonZero() currently
only takes into account conditions of the form "x != 0" or "x == 0".
However, there are plenty of other conditions that imply non-zero,
a common one being "x s> 0".

Peculiarly, the handling for assumes was already dealing with more
general non-zero-ness conditions, so this just reuses the same
logic for the dominating condition case.
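
A hypothetical example of a condition this now understands:

```
%c = icmp sgt i32 %x, 0
br i1 %c, label %pos, label %other

pos:                    ; %x is known non-zero here
  %r = urem i32 %y, %x
```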
2020-12-25 16:49:23 +01:00
Nikita Popov a3614a31c4 [BasicAA] Pass context instruction to isKnownNonZero()
This allows us to handle additional cases like assumes.
2020-12-25 12:58:19 +01:00
Nikita Popov b96a6ea0a9 [BasicAA] Make sure context instruction is symmetric
D71264 started using a context instruction in a computeKnownBits()
call. However, if aliasing between two GEPs is checked, then the
choice of context instruction will be different for alias(GEP1, GEP2)
and alias(GEP2, GEP1), which is not supposed to happen.

Resolve this by remembering which GEP a certain VarIndex belongs to,
and use that as the context instruction. This makes the choice of
context instruction predictable and symmetric.

It should be noted that this choice of context instruction is
non-optimal (just like the previous choice): The AA query result is
only valid at points that are reachable from *both* instructions.
Using either one of them is conservatively correct, but a larger
context may also be valid to use.

Differential Revision: https://reviews.llvm.org/D93183
2020-12-25 11:35:46 +01:00
Kazu Hirata 200b15af45 [Analysis] Remove spliceFunction (NFC)
The function was introduced without a user on Jan 3, 2011 in commit
0f87ca7733.  We still don't have a user
yet.
2020-12-23 21:57:25 -08:00
Andrew Litteken 48ad8194a5 [IRSim] Adding support for isomorphic predicates
Some predicates can be considered the same as long as the operands are
flipped. For example, a > b gives the same result as b > a. This maps
instructions in a greater than form, to their appropriate less than
form, swapping the operands in the IRInstructionData only, allowing for
more flexible matching.

Tests:

llvm/test/Transforms/IROutliner/outlining-isomorphic-predicates.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp

Reviewers: jroelofs, paquette

Recommit of commit 0503926602

Differential Revision: https://reviews.llvm.org/D87310
2020-12-23 19:42:35 -06:00
Andrew Litteken 45a4f34bd1 Revert "[IRSim] Adding support for isomorphic predicates"
Reverting due to unit test errors between commits.

This reverts commit 0503926602.
2020-12-23 15:14:19 -06:00
Andrew Litteken 0503926602 [IRSim] Adding support for isomorphic predicates
Some predicates can be considered the same as long as the operands are
flipped. For example, a > b gives the same result as b > a. This maps
instructions in a greater than form, to their appropriate less than
form, swapping the operands in the IRInstructionData only, allowing for
more flexible matching.

Tests:

llvm/test/Transforms/IROutliner/outlining-isomorphic-predicates.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87310
2020-12-23 15:02:00 -06:00
Andrew Litteken cce473e0c5 [IRSim] Adding commutativity matching to structure checking
Certain instructions, such as adds and multiplies, can have the operands
flipped and still be considered the same. When we are analyzing
structure, this gives slightly more flexibility to create a mapping from
one region to another. We can add both operands in a corresponding
instruction to an operand rather than just the exact match. We then try
to eliminate items from the set, until there is only one valid mapping
between the regions of code.

We do this for adds, multiplies, and equality checking. However, this is
not done for floating point instructions, since the order can still
matter in some cases.

Tests:

llvm/test/Transforms/IROutliner/outlining-commutative-fp.ll
llvm/test/Transforms/IROutliner/outlining-commutative.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87311
2020-12-23 15:02:00 -06:00
Evgeniy Brevnov 9fb074e7bb [BPI] Improve static heuristics for "cold" paths.
The current approach doesn't work well in cases where multiple paths are predicted to be "cold". By "cold" paths I mean those containing an "unreachable" instruction, a call marked with the 'cold' attribute, or the 'unwind' handler of an 'invoke' instruction. The issue is that heuristics are applied one by one until the first match, which essentially ignores the relative hotness/coldness of other paths.

The new approach unifies processing of "cold" paths by assigning a predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down the IR similarly to the existing approach, and finally set up edge probabilities based on the estimated block weights.

One important difference is how we propagate weight up. The existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless, at least because it always gives a 50/50 distribution, which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting a more accurate probability. The new algorithm propagates the weight up only to blocks that both dominate and are post-dominated by a block with some "known" weight; in other words, blocks that are either always executed together with it or not executed at all.

In addition, the new approach processes loops in a uniform way as well. Essentially, loop exit edges are estimated as "cold" paths relative to back edges, and should be considered uniformly with other coldness/hotness markers.

Reviewed By: yrouban

Differential Revision: https://reviews.llvm.org/D79485
2020-12-23 22:47:36 +07:00
Kazu Hirata e6fde1ae7d [MemorySSA] Use is_contained (NFC) 2020-12-22 19:58:54 -08:00
Siddhesh Poyarekar 6fcb039956 Fold comparison of __builtin_object_size expression with -1 for non-const size
When __builtin_dynamic_object_size returns a non-constant expression, it cannot
be -1 since that is an invalid return value for object size. However, since
passes running after the substitution don't know this, they are unable to
optimize away the comparison, and hence the comparison and branch stay in there.
This change generates an appropriate call to llvm.assume to help the optimizer
folding the test.
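
A sketch of the emitted hint, assuming %sz holds the lowered
__builtin_dynamic_object_size result (names invented):

```
%ne = icmp ne i64 %sz, -1
call void @llvm.assume(i1 %ne)
```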

glibc is considering adopting __builtin_dynamic_object_size for additional
protection[1] and this change will help reduce branching overhead in fortified
implementations of all of the functions that don't have the __builtin___*_chk
type builtins, e.g. __ppoll_chk.

Also remove the test limit-max-iterations.ll because it was deemed unnecessary
during review.

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/120191.html

Differential Revision: https://reviews.llvm.org/D93015
2020-12-22 10:56:31 +01:00
Nikita Popov 82bd64fff6 [AA] byval argument is identified function local
byval arguments should mostly get the same treatment as noalias
arguments in alias analysis. This was not the case for the
isIdentifiedFunctionLocal() function.

Marking byval arguments as identified function local means that
they cannot alias with other arguments, which I believe is correct.

Differential Revision: https://reviews.llvm.org/D93602
2020-12-21 20:18:23 +01:00
Sanjay Patel 38ca7face6 [InstSimplify] reduce logic with inverted add/sub ops
https://llvm.org/PR48559
This could be part of a larger ValueTracking API,
but I don't see that currently.

https://rise4fun.com/Alive/gR0

  Name: and
  Pre: C1 == ~C2
  %sub = add i8 %x, C1
  %sub1 = sub i8 C2, %x
  %r = and i8 %sub, %sub1
  =>
  %r = 0

  Name: or
  Pre: C1 == ~C2
  %sub = add i8 %x, C1
  %sub1 = sub i8 C2, %x
  %r = or i8 %sub, %sub1
  =>
  %r = -1

  Name: xor
  Pre: C1 == ~C2
  %sub = add i8 %x, C1
  %sub1 = sub i8 C2, %x
  %r = xor i8 %sub, %sub1
  =>
  %r = -1
2020-12-21 08:51:43 -05:00
Kazu Hirata 3285ee143b [Analysis, IR, CodeGen] Use llvm::erase_if (NFC) 2020-12-20 09:19:35 -08:00
Nikita Popov 6fa1230594 [MemLoc] Fix debug print for LocationSize 2020-12-20 17:52:48 +01:00
Kazu Hirata a6516a820d [Analysis] Remove dead function getInstTypePair (NFC)
The last use of getInstTypePair with two parameters was removed on on
Jan 9, 2015 in commit 33d7f9de33.  It
seems to be unused since then.
2020-12-19 10:57:35 -08:00
Kazu Hirata 805d59593f [Analysis, CodeGen, IR] Use contains (NFC) 2020-12-18 19:08:17 -08:00
Roman Lebedev e9289dc25f
[InstSimplify] Don't miscompile `X == 0 ? abs(X) : -abs(X) --> -abs(X)` xform
The transform wasn't checking that the LHS of the comparison
*is* the `X` in question...
This is the miscompile that was holding up D87188.

Thanks to Dave Green for producing an actionable reproducer!
2020-12-18 21:18:13 +03:00
Florian Hahn a74941da71
Revert "[BasicAA] Handle two unknown sizes for GEPs"
Temporarily revert commit 8b1c4e310c.

After 8b1c4e310c the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame`
dramatically increases with -O3 & LTO, causing issues for builders with
that configuration.

I filed PR48553 with a smallish reproducer that shows a 10-100x compile
time increase.
2020-12-18 17:59:12 +00:00
Cullen Rhodes 7c8796f9db [TTI] Add supportsScalableVectors target hook
This is split off from D91718 and adds a new target hook
supportsScalableVectors that can be queried to check if scalable vectors
are supported by the backend. For AArch64 this returns true if SVE is
enabled.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93060
2020-12-18 10:37:01 +00:00
Kazu Hirata 9895c7012d [InlineCost] Implement cost-benefit-based inliner
This patch adds an alternative cost metric for the inliner to take
into account both the cost (i.e. size) and cycle count savings into
account.

Without this patch, we decide to inline a given call site if the size
of inlining the call site is below the threshold that is computed
according to the hotness of the call site.

This patch adds a new cost metric, turned off by default, to take over
the handling of hot call sites.  Specifically, with the new cost
metric, we decide to inline a given call site if the ratio of cycle
savings to size exceeds a threshold.  The cycle savings are computed
from call site costs, parameter propagation, folded conditional
branches, etc, all weighted by their respective profile counts.  The
size is primarily the callee size, but we subtract call site costs and
the size of basic blocks that are never executed.

The new cost metric implicitly takes advantage of the machine function
splitter recently introduced by Snehasish Kumar, which dramatically
reduces the cost of duplicating (e.g. inlining) cold basic blocks by
placing cold basic blocks of hot functions in the .text.split
section.

We evaluated the new cost metric on clang bootstrap and SPECInt 2017.

For clang bootstrap, we observe 0.69% runtime improvement.

For SPECInt we report the change in IntRate the C/C++ benchmarks.  All
benchmarks apart from perlbench and omnetpp improve, on average by
0.21% with the max for mcf at 1.96%.

Benchmark               % Change
500.perlbench_r         -0.45
502.gcc_r                0.13
505.mcf_r                1.96
520.omnetpp_r           -0.28
523.xalancbmk_r          0.49
525.x264_r               0.00
531.deepsjeng_r          0.00
541.leela_r              0.35
557.xz_r                 0.21

Differential Revision: https://reviews.llvm.org/D92780
2020-12-18 00:37:24 -08:00
Kazu Hirata ed6a135246 [IVDescriptors] Remove getConsecutiveDirection (NFC)
The last use of the function was removed on Sep 18, 2016 in commit
5f8cc0c346.

The function was later moved to llvm/lib/Analysis/IVDescriptors.cpp on
Sep 12, 2018 in commit 7e98d69847.
2020-12-17 20:19:15 -08:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Bardia Mahjour 6eff12788e [DDG] Data Dependence Graph - DOT printer - recommit
This is being recommitted to try and address the MSVC complaint.

This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D90159
2020-12-16 12:37:36 -05:00
Whitney Tsang fa3693ad0b [LoopNest] Handle loop-nest passes in LoopPassManager
Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this
patch (and other following patches) is to create facilities that allow
implementing loop nest passes that run on top-level loop nests for the
New Pass Manager.

This patch extends the functionality of LoopPassManager to handle
loop-nest passes by specializing the definition of LoopPassManager that
accepts both kinds of passes in addPass.

Only loop passes are executed if L is not a top-level one, and both
kinds of passes are executed if L is top-level. Currently, loop nest
passes should have the following run method:

PreservedAnalyses run(LoopNest &, LoopAnalysisManager &,
                      LoopStandardAnalysisResults &, LPMUpdater &);

Reviewed By: Whitney, ychen
Differential Revision: https://reviews.llvm.org/D87045
2020-12-16 17:07:14 +00:00
Max Kazantsev 8b330f1f69 [SCEV] Add missing type check into getRangeForAffineNoSelfWrappingAR
We make type widening without checking if it's needed. Bail if the max
iteration count is wider than AR's type.
2020-12-15 14:50:32 +07:00
Kazu Hirata ddc5a5920e [Analysis] Use llvm::erase_value (NFC) 2020-12-14 22:40:13 -08:00
Bardia Mahjour a29ecca781 Revert "[DDG] Data Dependence Graph - DOT printer"
This reverts commit fd4a10732c, to
investigate the failure on windows: http://lab.llvm.org:8011/#/builders/127/builds/3274
2020-12-14 16:54:20 -05:00
Bardia Mahjour fd4a10732c [DDG] Data Dependence Graph - DOT printer
This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.

Differential Revision: https://reviews.llvm.org/D90159
2020-12-14 16:41:14 -05:00
Philip Reames f5fe8493e5 [LAA] Relax restrictions on early exits in loop structure
This is a preparation patch for supporting multiple exits in the loop vectorizer; by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist). Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression.

Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so.

Differential Revision: https://reviews.llvm.org/D92066
2020-12-14 12:44:01 -08:00
Stanislav Mekhanoshin 87d7757bbe [SLP] Control maximum vectorization factor from TTI
D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moving it to tryToVectorizeList() allows restarting vectorization
if the initial chunk fails.

However, this function is more general and handles not only PHIs
but everything which SLP handles. If the vectorization factor were
limited to the maximum vector register size, it would restrict much
more vectorization than before, leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added with the
default 0 to preserve current behavior and limit nothing. Then
targets can decide what is better for them.

The callback gets ElementSize just like a similar getMinimumVF()
function and the main opcode of the chain. The latter is to avoid
regressions at least on the AMDGPU. We can have loads and stores
up to 128 bit wide, and <2 x 16> bit vector math on some
subtargets, where the rest shall not be vectorized. I.e. we need
to differentiate based on the element size and operation itself.

Differential Revision: https://reviews.llvm.org/D92059
2020-12-14 08:49:40 -08:00
Nikita Popov 22dba707b0 [AC] Handle (X+C1)<C2 assumes (PR48408)
InstCombine canonicalizes X>C && X<C' style comparisons into
(X+C1)<C2. This type of expression is recognized by some analyses
like LVI, but currently not when used inside assumptions, because
AssumptionCache does not track affected values for it.
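
A hypothetical instance of the canonical form, here for x > 5 && x < 10:

```
%off = add i32 %x, -6
%cmp = icmp ult i32 %off, 4
call void @llvm.assume(i1 %cmp)
```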
2020-12-13 21:00:32 +01:00
Nikita Popov bb939ebfd7 [BasicAA] Handle known non-zero variable index
BasicAA currently handles cases like Scale*V0 + (-Scale)*V1 where
V0 != V1, but does not handle the simpler case of Scale*V with
V != 0. Add it based on an isKnownNonZero() call.
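
A hypothetical case this now handles (the 'or' makes %i known non-zero):

```
%i = or i64 %j, 1
%q = getelementptr i8, i8* %p, i64 %i   ; cannot alias %p itself
```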

I'm not passing a context instruction for now, because the existing
approach of always using GEP1 for context could result in symmetry
issues.

Differential Revision: https://reviews.llvm.org/D93162
2020-12-13 13:20:05 +01:00
Kazu Hirata 9293b251b5 [Analysis/Interval] Remove isLoop (NFC)
The last use of isLoop was removed on Apr 29, 2002 in commit
09bbb5c015 as part of an effort to
remove "old induction varaible cannonicalization pass built on top of
interval analysis".
2020-12-12 10:09:35 -08:00
Nikita Popov d716eab197 [BasicAA] Make non-equal index handling simpler to extend (NFC) 2020-12-12 15:00:47 +01:00
Kazu Hirata eb44682d67 [Analysis] Use is_contained (NFC) 2020-12-11 21:19:31 -08:00
Mircea Trofin f76b7f22f0 [MLGO] Fix build break as result of new InstructionCost (D91174) 2020-12-11 20:28:39 -08:00
Nikita Popov 8b1c4e310c [BasicAA] Handle two unknown sizes for GEPs
If we have two unknown sizes and one GEP operand and one non-GEP
operand, then we currently simply return MayAlias. The comment says
we can't do anything useful ... but we can! We can still check that
the underlying objects are different (and do so for the GEP-GEP case).

To reduce the compile-time impact, this a) performs the check early,
before doing the relatively expensive GEP decomposition whose result
will not be used, and b) skips the check if the other operand is a phi or
select. In that case, the phi/select will already recurse, so this
would just do two slightly different recursive walks that arrive at
the same roots.

Compile-time is still a bit of a mixed bag: https://llvm-compile-time-tracker.com/compare.php?from=624af932a808b363a888139beca49f57313d9a3b&to=845356e14adbe651a553ed11318ddb5e79a24bcd&stat=instructions
On average this is a small improvement, but sqlite with ThinLTO has
a 0.5% regression (lencod has a 1% improvement).

The BasicAA test case checks this by using two memsets with unknown
size. However, the more interesting case where this is useful is
the LoopVectorize test case, as analysis of accesses in loops tends
to always use unknown sizes.
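
For illustration, a hypothetical sketch of the GEP/non-GEP pair with two unknown sizes (all names are made up):
```
define void @f(i64 %i, i64 %n, i64 %m) {
  %a = alloca [8 x i8]
  %b = alloca i8
  %pa = getelementptr inbounds [8 x i8], [8 x i8]* %a, i64 0, i64 %i
  ; Both sizes are unknown and one pointer is a GEP, but the
  ; underlying objects (%a and %b) are distinct allocas: NoAlias.
  call void @llvm.memset.p0i8.i64(i8* %pa, i8 0, i64 %n, i1 false)
  call void @llvm.memset.p0i8.i64(i8* %b, i8 0, i64 %m, i1 false)
  ret void
}

declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
```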

Differential Revision: https://reviews.llvm.org/D92401
2020-12-11 18:45:53 +01:00
David Sherwood 9b76160e53 [Support] Introduce a new InstructionCost class
This is the first in a series of patches that attempts to migrate
existing cost instructions to return a new InstructionCost class
in place of a simple integer. This new class is intended to be
as light-weight and simple as possible, with a full range of
arithmetic and comparison operators that largely mirror the same
sets of operations on basic types, such as integers. The main
advantage to using an InstructionCost is that it can encode a
particular cost state in addition to a value. The initial
implementation only has two states - Normal and Invalid - but these
could be expanded over time if necessary. An invalid state can
be used to represent an unknown cost or an instruction that is
prohibitively expensive.

This patch adds the new class and changes the getInstructionCost
interface to return the new class. Other cost functions, such as
getUserCost, etc., will be migrated in future patches as I believe
this to be less disruptive. One benefit of this new class is that
it provides a way to unify many of the magic costs in the codebase
where the cost is set to a deliberately high number to prevent
optimisations taking place, e.g. vectorization. It also provides
a route to represent the extremely high, and unknown, cost of
scalarization of scalable vectors, which is not currently supported.

Differential Revision: https://reviews.llvm.org/D91174
2020-12-11 08:12:54 +00:00
Arthur Eubanks c80e193587 [NFC] Inline maxDevirtIterationsReached()
This was separated in the past because the cl::opt was in the .cpp file
but DevirtSCCRepeatedPass::run() was in the .h file. Now that
DevirtSCCRepeatedPass::run() is in the .cpp file, get rid of the tiny
maxDevirtIterationsReached(), it's bad for readability.
2020-12-10 22:12:29 -08:00
Xun Li 31e60b9133 [coroutine] should disable inline before calling coro split
This is a rework of D85812, which didn't land.
When a callee coroutine function is inlined into a caller coroutine function before the coro-split pass, LLVM emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that the coro-early pass cannot handle this quite well.
So we believe that unsplit coroutine functions should not be inlined.
This patch fixes the issue by not inlining a function if it has the attribute "coroutine.presplit" (meaning the function has not been split yet).
test plan: check-llvm, check-clang

In D85812, there was suggestions on moving the macros to Attributes.td to avoid circular header dependency issue.
I believe it's not worth doing just to be able to use one constant string in one place.
Today, there are already 3 possible attribute values for "coroutine.presplit": c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)
If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr just to support this, which I think is overkill.

Instead, I think the best way to do this is to add an API in Function class that checks whether this function is a coroutine, by checking the attribute by name directly.
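
For illustration, a hypothetical sketch of what such a pre-split function looks like in IR (the function names and the attribute value "1" are illustrative; the real body would contain the coroutine intrinsics):
```
; The callee still carries the pre-split marker attribute, so the
; inliner now leaves the call site alone until after coro-split runs.
define void @coro() "coroutine.presplit"="1" {
  ret void
}

define void @caller() {
  call void @coro()
  ret void
}
```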

Differential Revision: https://reviews.llvm.org/D92706
2020-12-08 08:53:08 -08:00
Philip Reames 2656885390 Teach isKnownNonEqual how to recurse through invertible multiplies
Build on the work started in 8f07629, and add the multiply case. In the process, more clearly describe the requirement for the operation we're looking through.
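
For illustration, a hypothetical sketch of the multiply case (the exact side conditions, such as the wrap flags, are what the patch checks; the assume stands in for known inequality of the operands):
```
define i1 @f(i32 %x, i32 %y) {
  %ne = icmp ne i32 %x, %y
  call void @llvm.assume(i1 %ne)
  %a = mul nsw i32 %x, 5
  %b = mul nsw i32 %y, 5
  ; Multiplication by the same non-zero value without wrapping is
  ; invertible, so %x != %y implies %a != %b.
  %cmp = icmp ne i32 %a, %b
  ret i1 %cmp
}

declare void @llvm.assume(i1)
```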

Differential Revision: https://reviews.llvm.org/D92726
2020-12-07 14:52:08 -08:00
Florian Hahn f19876c536 [ConstraintElimination] Bail out if system gets too big.
For some inputs, the constraint system can grow quite large during
solving, because it replaces complex constraints with one or more
simpler constraints. This adds a cut-off to avoid compile-time explosion
on problematic inputs.
2020-12-06 20:19:15 +00:00
Nikita Popov 5e69e2ebad [BasicAA] Migrate "same base pointer" logic to decomposed GEPs
BasicAA has some special bit of logic for "same base pointer" GEPs
that performs a structural comparison: It only looks at two GEPs
with the same base (as opposed to two GEP chains with a MustAlias
base) and compares their indexes in a limited way. I generalized
part of this code in D91027, and this patch merges the remainder
into the normal decomposed GEP logic.

What this code ultimately wants to do is to determine that
gep %base, %idx1 and gep %base, %idx2 don't alias if %idx1 != %idx2,
and the access size fits within the stride.

We can express this in terms of a decomposed GEP expression with
two indexes scale*%idx1 + -scale*%idx2 where %idx1 != %idx2, and
some appropriate checks for sizes and offsets.
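
A hypothetical IR sketch of that situation (the assume stands in for known inequality of the indexes):
```
define void @f([64 x i32]* %base, i64 %idx1, i64 %idx2) {
  %ne = icmp ne i64 %idx1, %idx2
  call void @llvm.assume(i1 %ne)
  %p1 = getelementptr inbounds [64 x i32], [64 x i32]* %base, i64 0, i64 %idx1
  %p2 = getelementptr inbounds [64 x i32], [64 x i32]* %base, i64 0, i64 %idx2
  ; Decomposed difference: 4*%idx1 + -4*%idx2 with %idx1 != %idx2,
  ; and the 4-byte accesses fit within the 4-byte stride: NoAlias.
  store i32 0, i32* %p1
  store i32 1, i32* %p2
  ret void
}

declare void @llvm.assume(i1)
```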

This makes the reasoning slightly more powerful, and more
importantly brings all the GEP logic under a common umbrella.

Differential Revision: https://reviews.llvm.org/D92723
2020-12-06 10:27:35 +01:00
Philip Reames 8f076291be Add recursive decomposition reasoning to isKnownNonEqual
The basic idea is that, by looking through operand instructions which don't change the equality result, we can push the existing known-bits comparison down past instructions which would otherwise obscure it.

We have analogous handling in InstSimplify for most - though weirdly not all - of these cases starting from an icmp root. It's a bit unfortunate to duplicate logic, but since my actual goal is to extend BasicAA, the icmp logic doesn't help. (And just makes it hard to test here.)  The BasicAA change will be posted separately for review.

Differential Revision: https://reviews.llvm.org/D92698
2020-12-05 15:58:19 -08:00
Philip Reames bfda69416c [BasicAA] Fix a bug with relational reasoning across iterations
Due to the recursion through phis that basicaa does, the code needs to be extremely careful not to reason about equality between values which might represent distinct iterations. I'm generally skeptical of the correctness of the whole scheme, but this particular patch fixes one instance which is demonstrably incorrect.

Interestingly, this appears to be the second attempted fix for the same issue. The former fix is incomplete and doesn't address the actual issue.

Differential Revision: https://reviews.llvm.org/D92694
2020-12-05 14:10:21 -08:00
Florian Hahn 4e5c0c2a63 [ConstraintElimination] Wrap dump() call in LLVM_DEBUG (NFC).
ConstraintSystem::dump only generates output with -debug, but there's no
need to call it without -debug.
2020-12-05 13:14:53 +00:00
Florian Hahn 4ceecc820b [ConstraintElimination] Handle constraints with all zero var coeffs.
Constraints where all variable coefficients are 0 do not add any useful
information. When checking, we can simply determine whether they are always true/false.
2020-12-05 12:06:53 +00:00
Nikita Popov f8afba5f7a [AA] Add statistics for alias results (NFC)
Count how many NoAlias/MustAlias/MayAlias we get from top-level
queries.
2020-12-05 11:09:15 +01:00
Arthur Eubanks 7f6f9f4cf9 [NewPM] Make pass adaptors less templatey
Currently PassBuilder.cpp is by far the file that takes longest to
compile. This is due to tons of templates being instantiated per pass.

Follow PassManager by using wrappers around passes to avoid making
the adaptors templated on the pass type. This allows us to move various
adaptors' run methods into .cpp files.

This reduces the compile time of PassBuilder.cpp on my machine from 66
to 39 seconds. It also reduces the size of opt from 685M to 676M.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D92616
2020-12-04 08:30:50 -08:00
Arthur Eubanks 2f0de58294 [NewPM] Support --print-before/after in NPM
This changes --print-before/after to be a list of strings rather than
legacy passes. (this also has the effect of not showing the entire list
of passes in --help-hidden after --print-before/after, which IMO is
great for making it less verbose).

Currently PrintIRInstrumentation passes the class name rather than pass
name to llvm::shouldPrintBeforePass(), meaning
llvm::shouldPrintBeforePass() never functions as intended in the NPM.
There is no easy way of converting class names to pass names outside of
within an instance of PassBuilder.

This adds a map of pass class names to their short names in
PassRegistry.def within PassInstrumentationCallbacks. It is populated
inside the constructor of PassBuilder, which takes a
PassInstrumentationCallbacks.

Add a pointer to PassInstrumentationCallbacks inside
PrintIRInstrumentation and use the newly created map.

This is a bit hacky, but I can't think of a better way since the short
id to class name only exists within PassRegistry.def. This also doesn't
handle passes not in PassRegistry.def but rather added via
PassBuilder::registerPipelineParsingCallback().

llvm/test/CodeGen/Generic/print-after.ll doesn't seem very useful now
with this change.

Reviewed By: ychen, jamieschmeiser

Differential Revision: https://reviews.llvm.org/D87216
2020-12-03 16:52:14 -08:00
Philip Reames 0129cd5035 Use deref facts derived from minimum object size of allocations
This change should be fairly straightforward. If we've reached a call, check to see if we can tell the result is dereferenceable from information about the minimum object size returned by the call.

To control compile time impact, I'm only adding the call for base facts in the routine. getObjectSize can also do recursive reasoning, and we don't want that general capability here.
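
For illustration, a hypothetical sketch using the allocsize attribute (@my_alloc is a made-up allocator whose first argument is the allocation size):
```
; allocsize(0) marks the first argument as the minimum size, in
; bytes, of the object the call returns.
declare i8* @my_alloc(i64) allocsize(0)

define i8 @f() {
  %p = call i8* @my_alloc(i64 16)
  ; The minimum object size (16 bytes here) can now feed the
  ; dereferenceability check for this load.
  %v = load i8, i8* %p
  ret i8 %v
}
```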

As a follow up patch (without separate review), I will plumb through the missing TLI parameter. That will have the effect of extending this to known libcalls - malloc, new, and the like - whereas currently this only covers calls with the explicit allocsize attribute.

Differential Revision: https://reviews.llvm.org/D90341
2020-12-03 15:01:14 -08:00
modimo 1860331932 [MemCpyOpt] Correctly merge alias scopes during call slot optimization
When MemCpyOpt performs call slot optimization it will concatenate the `alias.scope` metadata between the function call and the memcpy. However, scoped AA relies on the domains in metadata to be maintained in a caller-callee relationship. Naive concatenation breaks this assumption leading to bad AA results.

The fix is to take the intersection of domains then union the scopes within those domains.

The original bug came from a case of rust bad codegen which uses this bad aliasing to perform additional memcpy optimizations. As shown in the added test case `%src` got forwarded past its lifetime, leading to a dereference of garbage data.

Testing
ninja check-llvm

Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D91576
2020-12-03 09:23:37 -08:00
dfukalov 2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Nick Desaulniers bc044a88ee [Inline] prevent inlining on stack protector mismatch
It's common for code that manipulates the stack via inline assembly, or
that has to set up its own stack canary (such as the Linux kernel), to
want to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an attribute((no_stack_protector)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not as ergonomic as a function
attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

SSP attributes can be ordered by strength. Weakest to strongest, they
are: ssp, sspstrong, sspreq.  Callees with differing SSP attributes may be
inlined into each other, and the strongest attribute will be applied to the
caller. (No change)

After this change:
* A callee with no SSP attributes will no longer be inlined into a
  caller with SSP attributes.
* The reverse is also true: a callee with an SSP attribute will not be
  inlined into a caller with no SSP attributes.
* The alwaysinline attribute overrides these rules.
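
For illustration, a hypothetical IR sketch of the first rule (function names are made up):
```
define void @callee() sspstrong {
  ret void
}

; @caller has no ssp attribute, so @callee is no longer inlined here;
; swapping the attributes between the two would block inlining as well.
define void @caller() {
  call void @callee()
  ret void
}
```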

Functions that get synthesized by the compiler may not get inlined as a
result if they are not created with the same stack protector function
attribute as their callers.

Alternative approach to https://reviews.llvm.org/D87956.

Fixes pr/47479.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: rnk, MaskRay

Differential Revision: https://reviews.llvm.org/D91816
2020-12-02 11:00:16 -08:00
jasonliu a65d8c5d72 [XCOFF][AIX] Generate LSDA data and compact unwind section on AIX
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences are:
1. AIX does not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality routine, because the
   interoperability is not there yet on AIX.
3. AIX does not use eh_frame sections. Instead, it uses an eh_info
section (compact unwind section) to store the information about the

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455
2020-12-02 18:42:44 +00:00
Sanjay Patel 9d6d24c250 [JumpThreading][VectorUtils] avoid infinite loop on unreachable IR
https://llvm.org/PR48362

It's possible that we could stub this out sooner somewhere
within JumpThreading, but I'm not sure how to do that, and
then we would still have potential danger in other callers.

I can't find a way to trigger this using 'instsimplify',
however, because that already has a bailout on unreachable
blocks.
2020-12-02 13:39:33 -05:00
Fangrui Song a5309438fe static const char *const foo => const char foo[]
By default, a non-template variable of non-volatile const-qualified type
having namespace-scope has internal linkage, so no need for `static`.
2020-12-01 10:33:18 -08:00
Caroline Concatto 4b0ef2b075 [NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type
This patch replaces the attribute `unsigned VF` in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches compute the cost
model for scalable vectors inside this class.

Differential Revision: https://reviews.llvm.org/D91532
2020-12-01 11:12:51 +00:00
Wei Wang 93dc1b5b8c [Remarks][2/2] Expand remarks hotness threshold option support in more tools
This is the #2 of 2 changes that make remarks hotness threshold option
available in more tools. The changes also allow the threshold to sync with
hotness threshold from profile summary with special value 'auto'.

This change expands remarks hotness threshold option
-fdiagnostics-hotness-threshold in clang and *-remarks-hotness-threshold in
other tools to utilize hotness threshold from profile summary.

Remarks hotness filtering relies on several driver options. Table below lists
how different options are correlated and affect final remarks outputs:

| profile | hotness | threshold | remarks printed |
|---------|---------|-----------|-----------------|
| No      | No      | No        | All             |
| No      | No      | Yes       | None            |
| No      | Yes     | No        | All             |
| No      | Yes     | Yes       | None            |
| Yes     | No      | No        | All             |
| Yes     | No      | Yes       | None            |
| Yes     | Yes     | No        | All             |
| Yes     | Yes     | Yes       | >=threshold     |

In the presence of profile summary, it is often more desirable to directly use
the hotness threshold from the profile summary. The new argument value 'auto'
indicates that the threshold will be synced with the hotness threshold from
the profile summary during compilation. The "auto" threshold relies on the
availability of a profile summary; if that information is missing, no remarks
will be generated.

Differential Revision: https://reviews.llvm.org/D85808
2020-11-30 21:55:50 -08:00
Nick Desaulniers 91aff1d8ba [InlineCost] prefer range-for. NFC
Prefer range-for over iterators when such methods exist. Precommitted
from https://reviews.llvm.org/D91816.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D92350
2020-11-30 16:07:40 -08:00
Mircea Trofin 5fe10263ab [llvm][inliner] Reuse the inliner pass to implement 'always inliner'
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before the full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: fewer call sites, more contextualization, and,
depending on the additional function optimization passes run between the
two inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567
2020-11-30 12:03:39 -08:00
Nikita Popov e987fbdd85 [BasicAA] Generalize recursive phi alias analysis
For recursive phis, we skip the recursive operands and check that
the remaining operands are NoAlias with an unknown size. Currently,
this is limited to inbounds GEPs with positive offsets, to
guarantee that the recursion only ever increases the pointer.

Make this more general by only requiring that the underlying object
of the phi operand is the phi itself, i.e. it is based on itself in
some way. To compensate, we need to use a beforeOrAfterPointer()
location size, as we no longer have the guarantee that the pointer
is strictly increasing.

This allows us to handle some additional cases like negative geps,
geps with dynamic offsets or geps that aren't inbounds.
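
A hypothetical sketch of such a recursive phi (note the GEP is neither inbounds nor known to have a positive offset):
```
define void @f(i8* %base, i64 %off, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %p = phi i8* [ %base, %entry ], [ %p.next, %loop ]
  store i8 0, i8* %p
  ; The underlying object of %p.next is the phi %p itself, so the
  ; recursive operand can be skipped even though %off may be
  ; negative or dynamic.
  %p.next = getelementptr i8, i8* %p, i64 %off
  %i.next = add i64 %i, 1
  %c = icmp ult i64 %i.next, %n
  br i1 %c, label %loop, label %exit

exit:
  ret void
}
```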

Differential Revision: https://reviews.llvm.org/D91914
2020-11-29 10:25:23 +01:00
Nikita Popov 1dea8ed8b7 [BasicAA] Remove unnecessary known size requirement
The size requirement on V2 was present because it was not clear
whether an unknown size would allow an access before the start of
V2, which could then overlap. This is clarified since D91649: In
this part of BasicAA, all accesses can occur only after the base
pointer, even if they have unknown size.

This makes the positive and negative offset cases symmetric.

Differential Revision: https://reviews.llvm.org/D91482
2020-11-28 10:17:12 +01:00
Nikita Popov 8351f9b5ce [ValueTracking] Fix assert on shufflevector of pointers
In this case getScalarSizeInBits() is not well-defined. Use the
existing TyBits variable that handles vectors of pointers correctly.
2020-11-27 21:19:31 +01:00
Martin Storsjö fa10383664 Revert "[BasicAA] Fix BatchAA results for phi-phi assumptions"
This reverts commit 8166ed1a7a,
as it caused some compilations to hang/loop indefinitely, see
https://reviews.llvm.org/D91936 for details.
2020-11-27 21:50:59 +02:00
Cullen Rhodes 7b8d50b141 [InstSimplify] Clarify use of FixedVectorType in SimplifySelectInst
Folding a select of vector constants that include undef elements only
applies to fixed vectors, but there's no earlier check that the type is
not scalable, so it crashes for scalable vectors. This adds a check so this
optimization is only attempted for fixed vectors.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92046
2020-11-27 09:55:29 +00:00
Kazu Hirata 60e749aa23 [InlineCost] Fix indentation (NFC) 2020-11-26 18:00:55 -08:00
Nikita Popov 8166ed1a7a [BasicAA] Fix BatchAA results for phi-phi assumptions
Add a flag that disables caching when computing aliasing results
potentially based on a phi-phi NoAlias assumption. We'll still
insert cache entries temporarily to catch infinite recursion,
but will drop them afterwards, so they won't persist in BatchAA.

Differential Revision: https://reviews.llvm.org/D91936
2020-11-26 21:43:50 +01:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Max Kazantsev 035955f925 Revert "Return "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond", 2nd try"
This reverts commit f690986f31.

Compile time then and again...
2020-11-26 18:12:51 +07:00
Max Kazantsev f690986f31 Return "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond", 2nd try
Reverted because the compile time impact is still too high.

isKnownViaNonRecursiveReasoning is used twice, we can do it just once.

Differential Revision: https://reviews.llvm.org/D92152
2020-11-26 17:45:13 +07:00
Max Kazantsev 91d6b6b5fb Revert "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond"
This reverts commit 3d4c0460ec.

Compile time impact is still high. Need to understand why.

Differential Revision: https://reviews.llvm.org/D92153
2020-11-26 17:28:30 +07:00
Max Kazantsev 3d4c0460ec [SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond
Previously we tried using isKnownPredicateAt, but it makes an
extra query to isKnownPredicate, which has negative impact on compile
time. Let's try to use more lightweight isBasicBlockEntryGuardedByCond.

Differential Revision: https://reviews.llvm.org/D92152
2020-11-26 17:08:38 +07:00
Max Kazantsev 3b6481eae2 Revert "[SCEV] Use isKnownPredicateAt in isLoopBackedgeGuardedByCond"
This reverts commit 14f2ad0e3c.

Reverting to investigate compile time drop.

Differential Revision: https://reviews.llvm.org/D92152
2020-11-26 16:42:43 +07:00
Max Kazantsev 14f2ad0e3c [SCEV] Use isKnownPredicateAt in isLoopBackedgeGuardedByCond
A piece of code in `isLoopBackedgeGuardedByCond` basically duplicates
the dominators traversal from `isBlockEntryGuardedByCond` called from
`isKnownPredicateAt`, but it's less powerful because it does not give context
to `isImpliedCond`. This patch reuses the `isKnownPredicateAt` function there,
reducing the amount of code duplication and making it more powerful.

Differential Revision: https://reviews.llvm.org/D92152
Reviewed By: skatkov
2020-11-26 13:20:02 +07:00
Max Kazantsev f10500e220 [IndVars] Use isLoopBackedgeGuardedByCond for last iteration check
Use more context to prove contextual facts about the last iteration. It is
only executed when the backedge is taken, so we can use `isLoopBackedgeGuardedByCond`
to make this check.

Differential Revision: https://reviews.llvm.org/D91535
Reviewed By: skatkov
2020-11-26 12:37:21 +07:00
Joe Ellis 06654a5348 [SVE] Fix TypeSize warning in RuntimePointerChecking::insert
The TypeSize warning would occur because RuntimePointerChecking::insert
was not scalable vector aware. The fix is to use
ScalarEvolution::getSizeOfExpr to grab the size of types.

Differential Revision: https://reviews.llvm.org/D90171
2020-11-25 16:59:03 +00:00
Cullen Rhodes 1ba4b82f67 [LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits
MaxSafeRegisterWidth is a misnomer since it actually returns the maximum
safe vector width. Register suggests it relates directly to a physical
register where it could be a vector spanning one or more physical
registers.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91727
2020-11-25 13:06:26 +00:00
Max Kazantsev 9130651126 Revert "[SCEV] Generalize no-self-wrap check in isLoopInvariantExitCondDuringFirstIterations"
This reverts commit 7dcc889917.

This patch introduced a logical error that breaks the whole logic of this analysis.
All checks we are making are supposed to be loop-independent, so that we could
safely remove the range check. The 'nw' fact is loop-dependent, so we would be
removing the check based on facts from this very check.

Motivating examples will follow up.
2020-11-25 13:26:17 +07:00
Philip Reames 10ddb927c1 [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]
Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute.  Using the isa<SCEVCouldNotCompute> form is both shorter and more readable.
2020-11-24 18:47:49 -08:00
Philip Reames b3a8a15343 [LAA] Minor code style tweaks [NFC] 2020-11-24 15:49:27 -08:00
Janek van Oirschot 42eaf4fe0a [HardwareLoops] Change order of SCEV expression construction for InitLoopCount.
Putting the +1 before the zero-extend will allow scalar evolution to fold the expression in some cases such as the one shown in PowerPC's `shrink-wrap.ll` test.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D91724
2020-11-24 18:01:42 +00:00
Max Kazantsev 02fdbc3567 Revert "[NFC][SCEV] Generalize monotonicity check for full and limited iteration space"
This reverts commit 2734a9ebf4.

This patch appeared to not be NFC. It introduced an execution path where
the monotonicity check on a limited iteration space started relying on
existing nsw/nuw flags, which is illegal. A motivating test will follow up.
2020-11-24 17:56:59 +07:00
Arthur Eubanks aff058b1a9 Reland [CGSCC] Detect devirtualization in more cases
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.

This fixes some tests testing this exact behavior in the legacy PM.

Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.

check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline runs the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).

http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.

The initial land assumed CallBase WeakTrackingVHs would always be
CallBases, but they can be RAUW'd with undef.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89587
2020-11-23 21:28:59 -08:00
Yichao Yu 4bc88a0e9a Enable support for floating-point division reductions
Similar to fsub, fdiv can also be vectorized using fmul.

Also http://llvm.org/viewvc/llvm-project?view=revision&revision=215200

Differential Revision: https://reviews.llvm.org/D34078

Co-authored-by: Jameson Nash <jameson@juliacomputing.com>
2020-11-23 20:00:58 -05:00
Arthur Eubanks 6a2799cf8e Revert "[CGSCC] Detect devirtualization in more cases"
This reverts commit 14a68b4aa9.

Causes building self hosted clang to crash when using NPM.
2020-11-23 13:21:05 -08:00
Arthur Eubanks 3c811ce4f3 [NPM] Share pass building options with legacy PM
We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741
2020-11-23 13:04:05 -08:00
Arthur Eubanks 7167e5203a Port -print-memderefs to NPM
There is lots of code duplication, but hopefully it won't matter soon.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91683
2020-11-23 11:56:22 -08:00
Arthur Eubanks 14a68b4aa9 [CGSCC] Detect devirtualization in more cases
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.

This fixes some tests testing this exact behavior in the legacy PM.

Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.

check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline runs the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).

http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89587
2020-11-23 11:55:20 -08:00
Jay Foad 000400ca0a Fix speling in comments. NFC. 2020-11-23 14:43:24 +00:00
Mikael Holmen faf848ac32 [Inline] Fix in handling of ptrtoint in InlineCost
ConstantOffsetPtrs contains mappings from a Value to a base pointer and
an offset. The offset is typed and has a size, and at least when dealing
with ptrtoint, it could happen that we had a mapping from a ptrtoint
with type i32 to an offset with type i16. This could later cause
problems, showing up in PR 47969 and PR 38500.

In PR 47969 we ended up in an assert complaining that trunc i16 to i16
is invalid and in PR 38500 that a cmp on an i32 and i16 value isn't
valid.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D90610
2020-11-23 14:33:06 +01:00
Max Kazantsev 48d7cc6ae2 [SCEV] Fix incorrect treatment of max taken count. PR48225
SCEV makes a logical mistake when handling EitherMayExit in the
case when both conditions must be met to exit the loop. The
mistake goes as follows: "if condition `A` fails within at most `X` first
iterations, and `B` fails within at most `Y` first iterations, then `A & B`
fails at most within `min (X, Y)` first iterations". This is wrong, because
both of them must fail at the same time.

A simple example illustrating this is the following: we have an IV with step 1,
condition `A` = "IV is even", condition `B` = "IV is odd". Both `A` and `B`
will fail within the first two iterations. But that doesn't mean both of them
will fail within the first two iterations at the same time, which would mean
that the IV is neither even nor odd at some point within the first two iterations.
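
For illustration, a hypothetical loop with this shape (the continue condition is the `or` of the two conditions, so exiting requires both to fail in the same iteration):
```
define void @f() {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  %bit = and i32 %iv, 1
  %even = icmp eq i32 %bit, 0
  %odd = icmp ne i32 %bit, 0
  ; The loop only exits when %even and %odd fail in the same
  ; iteration, which never happens; taking the minimum of the
  ; individual max exit counts would wrongly bound the trip count
  ; by about two iterations.
  %cont = or i1 %even, %odd
  %iv.next = add i32 %iv, 1
  br i1 %cont, label %loop, label %exit

exit:
  ret void
}
```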

We can only do so for known exact BE counts, but not for max.

Differential Revision: https://reviews.llvm.org/D91942
Reviewed By: nikic
2020-11-23 16:52:39 +07:00
Max Kazantsev 47e31d1b5e [NFC] Reduce code duplication in binop processing in computeExitLimitFromCondCached
Handling of `and` and `or` is largely copy-paste. Factored it out into
a helper function as a preparation step for a further fix (see PR48225).

Differential Revision: https://reviews.llvm.org/D91864
Reviewed By: nikic
2020-11-23 13:18:12 +07:00
Nikita Popov 6f5ef648a5 [BasicAA] Avoid unnecessary cache update (NFC)
If the final recursive query returns MayAlias as well, there is
no need to update the cache (which already stores MayAlias).
2020-11-22 20:10:45 +01:00
Sanjay Patel c5a4d80fd4 [ValueTracking][MemCpyOpt] avoid crash on inttoptr with vector pointer type (PR48075) 2020-11-22 12:54:18 -05:00
Simon Pilgrim 24d6e60488 [Analysis] Remove unused system header includes
Cleanup unused system headers and fix an implicit dependency
2020-11-22 10:32:37 +00:00
Nikita Popov ded5928866 [BasicAA] Remove unnecessary sextOrSelf (NFC)
We are doing a sextOrTrunc directly afterwards, so this seems
useless. There is a multiplication in between, but truncating
before or after the multiplication should not make a difference.
2020-11-21 21:32:56 +01:00
Nikita Popov 0d114f56d7 [BasicAA] Return DecomposedGEP (NFC)
Instead of requiring the caller to initialize the DecomposedGEP
structure and then passing it in by reference, make
DecomposeGEPExpression() responsible for initializing and returning
the structure.
2020-11-21 21:05:26 +01:00
Nikita Popov f4412c5ae4 [BasicAA] Remove some intermediate variables (NFC)
Use DecompGEP1.Offset instead of GEP1BaseOffset, etc. I found the
asymmetry of modifying DecompGEP1.VarIndices, but not modifying
DecompGEP1.Offset odd here.
2020-11-21 20:36:25 +01:00
Nikita Popov 913a99c474 [BasicAA] Remove stale FIXME (NFC)
If aliasGEP returns MayAlias, the code does fall through to
aliasPHI etc, so this FIXME is no longer applicable.
2020-11-21 20:07:26 +01:00
Kazu Hirata 226beb494c [Analysis] Use llvm::is_contained (NFC) 2020-11-20 18:08:05 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are persistent, i.e. optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we've seen promising sample correlation quality with it. However, the atomics approach has a couple of issues:

1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are in the IR till the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation but to block desired optimizations as little as possible. More specifically, the semantics associated with the new intrinsic enforce that a pseudo probe is virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect, so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction, and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged, in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Nikita Popov e8dc6e9a32 [MemLoc] Use hasValue() method more (NFC)
Followup to 7de7c40898. I previously
removed a number of == comparisons to LocationSize::unknown(), but
missed these != comparisons.
2020-11-19 22:29:44 +01:00
Nikita Popov 7de7c40898 [MemLoc] Use hasValue() method (NFC)
Instead of comparing to LocationSize::unknown(), prefer calling
the hasValue() method instead, which is less reliant on
implementation details.
2020-11-19 21:53:50 +01:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Artur Pilipenko 887c7660bd [BasicAA] Deoptimize intrinsics don't modify memory
Similarly to assumes and guards, deoptimize intrinsics are
marked as writing to ensure proper control dependencies,
but they never modify any particular memory location.

Differential Revision: https://reviews.llvm.org/D91658
2020-11-19 12:08:33 -08:00
Nikita Popov 22ec72f803 [Lint] Use MemoryLocation
Instead of separately passing pointer and size, make use of
MemoryLocation. This allows us to also reuse all the existing
logic for determining the MemoryLocation corresponding to an
instruction or call argument.

Not quite NFC because used locations may be more precise in some
cases.
2020-11-19 20:55:25 +01:00
Leonard Chan a97f62837f [llvm][IR] Add dso_local_equivalent Constant
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.

When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.

In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
  advantage of `dso_local_equivalent` for relative references

This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.

See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.

Differential Revision: https://reviews.llvm.org/D77248
2020-11-19 10:26:17 -08:00
Simon Pilgrim fceaff41d6 [ValueTracking] computeKnownBitsFromShiftOperator - move shift amount analysis to top of the function. NFCI.
These are all lightweight to compute and help avoid issues with Known being used to hold both the shift amount and then the shifted result.

Minor cleanup for D90479.
2020-11-19 13:50:49 +00:00
Mircea Trofin 8ab2353a4c [NFC][TFUtils] also include output specs lookup logic in loadOutputSpecs
The lookup logic is also reusable.

Also refactored the API to return the loaded vector - this makes it
clearer what state it is in, in the case of error (as it won't be
returned).

Differential Revision: https://reviews.llvm.org/D91759
2020-11-18 21:20:21 -08:00
Mircea Trofin b51e844f7a [NFC][TFUtils] Extract out the output spec loader
It's generic for the 'development mode', not specific to the inliner
case.

Differential Revision: https://reviews.llvm.org/D91751
2020-11-18 20:03:20 -08:00
Nikita Popov cd3c22c47e [BasicAA] Generalize base offset modulus handling
The GEP aliasing implementation currently has two pieces of code
that solve two different subsets of the same basic problem: If you
have GEPs with offsets 4*x + 0 and 4*y + 1 (assuming access size 1),
then they do not alias regardless of whether x and y are the same.
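
For illustration, a hypothetical IR encoding of that example (all names are made up):
```
define void @f(i8* %base, i64 %x, i64 %y) {
  %o1 = mul i64 %x, 4
  %p1 = getelementptr i8, i8* %base, i64 %o1   ; offset 4*x + 0
  %o2 = mul i64 %y, 4
  %t = getelementptr i8, i8* %base, i64 %o2
  %p2 = getelementptr i8, i8* %t, i64 1        ; offset 4*y + 1
  ; The GCD of the variable scales is 4, and the constant offsets
  ; 0 and 1 cannot coincide modulo 4 for 1-byte accesses: NoAlias.
  store i8 0, i8* %p1
  store i8 1, i8* %p2
  ret void
}
```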

One implementation is in aliasSameBasePointerGEPs(), which looks at
this in a limited structural way. It requires both GEP base pointers
to be exactly the same, then (optionally) a number of equal indexes,
then an unknown index, then a non-equal index into a struct. This
set of limitations works, but it's overly restrictive and hides the
core property we're trying to exploit.

The second implementation is part of aliasGEP() itself and tries to
find a common modulus in the scales, so it can then check that the
constant offset doesn't overlap under modular arithmetic. The second
implementation has the right idea of what the general problem is,
but effectively only considers power of two factors in the scales
(while aliasSameBasePointerGEPs also works with non-pow2 struct sizes.)

What this patch does is to adjust the aliasGEP() implementation to
instead find the largest common factor in all the scales (i.e. the GCD)
and use that as the modulus.

Differential Revision: https://reviews.llvm.org/D91027
2020-11-18 21:48:49 +01:00
Nikita Popov 85ccdcaa50 [BasicAA] Remove assert in AA evaluator
As reported in https://reviews.llvm.org/D91383#2401825, this
assert breaks external -aa-eval tests. We'll have to fix this
case before re-enabling it.
2020-11-18 20:04:38 +01:00
Simon Pilgrim eef203dbdf [Analysis] CGSCCPassManager.cpp - fix Wshadow warnings. NFCI. 2020-11-18 09:59:31 +00:00
Wei Wang 3279347da0 [BPI] Look through bitcasts in calcZeroHeuristic
Constant hoisting may hide the constant value behind a bitcast for And's
operand. Track down the constant to make the BFI result consistent
regardless of hoisting.

Differential Revision: https://reviews.llvm.org/D91450
2020-11-17 09:33:05 -08:00
Nikita Popov cb4fc25c91 [BasicAA] Make alias GEP positive offset handling symmetric
aliasGEP() currently implements some special handling for the case
where all variable offsets are positive, in which case the constant
offset can be taken as the minimal offset. However, it does not
perform the same handling for the all-negative case. This means that
the alias-analysis result between two GEPs is asymmetric:
If GEP1 - GEP2 is all-positive, then GEP2 - GEP1 is all-negative,
and the first will result in NoAlias, while the second will result
in MayAlias.

Apart from producing sub-optimal results for one order, this also
violates our caching assumption. In particular, if BatchAA is used,
the cached result depends on the order of the GEPs in the first query.
This results in an inconsistency in BatchAA and AA results, which
is how I noticed this issue in the first place.

Differential Revision: https://reviews.llvm.org/D91383
2020-11-17 18:05:34 +01:00
Sander de Smalen f571fe6df5 Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This relands https://reviews.llvm.org/D91059 and reverts commit
30fded75b4.

GetRegUsage now returns 0 when Ty is not a valid vector element type.
2020-11-17 13:45:10 +00:00
Philip Reames 0f41a2fe83 test commit for new client 2020-11-16 17:26:52 -08:00
Michael Liao f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, infer-address-space
  pass could infer and propagate that to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Philip Reames 257d33c815 [SCEV] Factor out part of wrap flag detection logic [NFC](try 2)
This is a cut down version of 1ec6e1 which was reverted due to a compile time issue.  The key changes made from that patch: 1) only infer the flags needed along each path, 2) be careful to preserve order of checks, and 3) avoid computing NW flags at all since we need to prove the stronger property (does not cross 0) in the caller anyways.

Assuming this doesn't trip regressions, I'm going to try weakening (1).  My end objective is to move flag inference into addrec construction.  If I can't weaken (1) without compile time impact, I'll have a problem.
2020-11-16 12:07:21 -08:00
Kazu Hirata 147ccc848a [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.

This patch is one of the steps before we can safely re-apply D91017.

Differential Revision: https://reviews.llvm.org/D91511
2020-11-15 22:29:30 -08:00
Kazu Hirata c5cc2d8b94 [BranchProbabilityInfo] Use predecessors(BB) and successors(BB) (NFC) 2020-11-15 19:26:38 -08:00
Nikita Popov 3b7f84d97f [AA] Add missing AAQI parameter
This alias() call did not pass on the AAQueryInfo.
2020-11-15 20:29:53 +01:00
Nikita Popov 9ace4b337f Revert "[SCEV] Factor out part of wrap flag detection logic [NFC-ish]"
This reverts commit 1ec6e1eb8a.

This change causes a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=dd0b8b94d0796bd895cc998dd163b4fbebceb0b8&to=1ec6e1eb8a084bffae8a40236eb9925d8026dd07&stat=instructions

I assume that this is due to the non-NFC part of the change, which
now performs expensive nowrap inference even for nowrap flags that
are not used by the particular code.
2020-11-15 10:19:44 +01:00
Philip Reames 1ec6e1eb8a [SCEV] Factor out part of wrap flag detection logic [NFC-ish]
In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic.  In the process, reduce the number of locations we mutate wrap flags by a couple.

Note that this isn't NFC.  The old code tried for NSW xor (NUW || NW).  That is, two different paths computed different sets of wrap flags.  The new code will try for all three.  The result is that some expressions end up with a few extra flags set.
2020-11-14 19:21:05 -08:00
Nikita Popov 0b72444211 [BasicAA] Remove unnecessary size limitation
We're dropping a common offset from both GEPs here. It's not
necessary for the access sizes to be the same as well.
2020-11-14 16:51:31 +01:00
Nikita Popov 9a85643cd3 [KnownBits] Combine abs() implementations
ValueTracking was using a more powerful abs() implementation. Roll
it into KnownBits::abs(). Also add an exhaustive test for abs(),
in both the poisoning and non-poisoning variants.
2020-11-13 22:23:50 +01:00
Nikita Popov f3124a46c1 [SCEV] Fix nsw flags for GEP expressions
The SCEV code for constructing GEP expressions currently assumes
that the addition of the base and all the offsets is nsw if the GEP
is inbounds. While the addition of the offsets is indeed nsw, the
addition to the base address is not, as the base address is
interpreted as an unsigned value.

Fix the GEP expression code to not assume nsw for the base+offset
calculation. However, do assume nuw if we know that the offset is
non-negative. With this, we use the same behavior as the
construction of GEP addrecs does. (Modulo the fact that we
disregard SCEV unification, as the pre-existing FIXME points out).

Differential Revision: https://reviews.llvm.org/D90648
2020-11-13 18:19:32 +01:00
Nikita Popov 92b708902e [ValueTracking] Don't set nsw flag for inbounds addition
When computing the known bits for a GEP, don't set the nsw flag
when adding an offset to an address. The nsw flag only applies to
pure offset additions (see also D90708).

The nsw flag is only used in a very minor way by the code, to the
point that I was not able to come up with a test case where it
makes a difference.

Differential Revision: https://reviews.llvm.org/D90637
2020-11-13 17:58:21 +01:00
Piotr Sobczak 47dec5aa60 [DivergenceAnalysis] Use addRequiredTransitive
For querying divergence the chained analysis passes are required
to be alive, for instance LoopInfoWrapperPass.

Ensure that by using addRequiredTransitive.

Differential Revision: https://reviews.llvm.org/D91335
2020-11-13 14:40:00 +01:00
Simon Pilgrim 49623fa77a [ValueTracking] computeKnownBitsFromShiftOperator use KnownBits direct for constant shift amounts.
Let KnownBits shift handlers deal with out-of-range shift amounts.
2020-11-13 10:54:35 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Max Kazantsev 0a1d394bf3 [NFC] Refactor loop-invariant getters to return Optional 2020-11-13 15:03:10 +07:00
Nikita Popov c00545dc32 [BasicAA] Remove checks for GEP decomposition limit reached
The GEP aliasing code currently checks for the GEP decomposition
limit being reached (i.e., we did not reach the "final" underlying
object). As far as I can see, these checks are not necessary. It is
perfectly fine to work with a GEP whose base can still be further
decomposed.

Looking back through the commit history, these checks were originally
introduced in 1a444489e9. However, I
believe that the problem this was intended to address was later
properly fixed with 1726fc698c, and
the checks are no longer necessary since then (and were not the
right fix in the first place).

Differential Revision: https://reviews.llvm.org/D91010
2020-11-12 20:43:38 +01:00
Jamie Schmeiser 5f672fefeb Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -cfg-dot-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 17:39:14 +00:00
Simon Pilgrim f72d350bfb [ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to take KnownBits shift amount. NFCI.
We were creating this internally, but will need to support general KnownBits amounts as part of D90479.
2020-11-12 16:56:55 +00:00
Simon Pilgrim 8996742741 [KnownBits] Add KnownBits::makeConstant helper. NFCI.
Helper for cases where we need to create a KnownBits from a (fully known) constant value.
2020-11-12 16:16:04 +00:00
Anh Tuyen Tran a20b3620bb Revert "Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source"
This reverts commit 45d459e752 due to
build issue in Poly.
2020-11-12 15:48:14 +00:00
Jamie Schmeiser 45d459e752 Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -cfg-dot-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 15:41:16 +00:00
Simon Pilgrim 11c106544b [ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to use KnownBits shift handling. NFCI. 2020-11-12 15:31:26 +00:00
Max Kazantsev 2734a9ebf4 [NFC][SCEV] Generalize monotonicity check for full and limited iteration space
A piece of logic of `isLoopInvariantExitCondDuringFirstIterations` is actually
a generalized predicate monotonicity check. This patch moves it into the
corresponding method and generalizes it a bit.

Differential Revision: https://reviews.llvm.org/D90395
Reviewed By: apilipenko
2020-11-12 12:37:07 +07:00
Arthur Eubanks d9cbceb041 [CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.

This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass().  The inliner called
updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
newly introduced edges (so updateCGAndAnalysisManagerForPass() would
only have to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass() since
updateCGAndAnalysisManagerForPass() is now handling the new call edges
and function passes cannot add new edges.

We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.

This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91046
2020-11-11 13:43:49 -08:00
Sander de Smalen 30fded75b4 Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost."
This reverts commits:
* [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
  b873aba394.
* [LoopVectorizer] Silence warning in GetRegUsage.
  9ff701100a.
2020-11-11 14:41:55 +00:00
Simon Pilgrim f6a326adef [ValueTracking] computeKnownBitsFromShiftOperator - merge zero/one callbacks to single KnownBits callback. NFCI.
Another cleanup for D90479 - handle the Known Ones/Zeros in a single callback, which will make it much easier to jump over to the KnownBits shift handling.
2020-11-11 14:22:42 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them.
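
A sketch of why a plain `&` is the gotcha (self-contained toy, not the actual LLVM code): a bit is known in the result only if both inputs agree on it, so the Zero and One masks must be intersected separately.

```
#include <cstdint>

struct ToyKnownBits {
  uint64_t Zero = 0; // bits known to be 0
  uint64_t One = 0;  // bits known to be 1
};

// Bits known in common: keep only the facts both sides agree on.
ToyKnownBits commonBits(const ToyKnownBits &A, const ToyKnownBits &B) {
  return {A.Zero & B.Zero, A.One & B.One};
}
```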
2020-11-11 12:15:54 +00:00
Sander de Smalen b873aba394 [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This is more accurate than dividing the bitwidth based on the element count by the
maximum register size, as it can just reuse whatever has been calculated for
legalization of these types.

This change is also necessary when calculating register usage for scalable vectors, where
the legalization of these types cannot be done based on the widest register size, because
that does not take the 'vscale' component into account.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91059
2020-11-11 10:18:50 +00:00
Max Kazantsev 7dcc889917 [SCEV] Generalize no-self-wrap check in isLoopInvariantExitCondDuringFirstIterations
Lift limitation on step being `+/- 1`. In fact, the only thing it is needed for
is proving no-self-wrap. We can instead check this flag directly.

Theoretically it can increase the scope of the transform, but I could not
easily construct such a test.

Differential Revision: https://reviews.llvm.org/D91126
Reviewed By: apilipenko
2020-11-11 11:17:13 +07:00
Kazu Hirata 21fbe2ee68 Revert "[BranchProbabilityInfo] Use SmallVector (NFC)"
This reverts commit 2f1038c7b6.
2020-11-10 19:17:13 -08:00
Simon Pilgrim 46a734621d [ValueTracking] computeKnownBitsFromShiftOperator - always return with Known2 containing the shifted value source. NFCI.
As detailed on D90479, in most circumstances we will always call computeKnownBits for Op0, so always perform this by pulling out the duplicate calls.
2020-11-10 17:03:17 +00:00
Simon Pilgrim 929a127932 [ValueTracking] computeKnownBitsFromShiftOperator - consistently use Known2 for the shifted value. NFCI.
Minor cleanup as part of getting D90479 moving again.
2020-11-10 17:03:17 +00:00
Kazu Hirata 85cd7ffade [BranchProbabilityInfo] Use a range-based for loop (NFC) 2020-11-10 09:00:18 -08:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new intrinsic conceptually produces the same output as input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on its own might cause more trouble than it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
Max Kazantsev 3ec69c16c3 [NFC] Different way of getting step 2020-11-10 13:48:02 +07:00
Max Kazantsev 6022a8b7e8 [SCEV] Drop cached ranges of AddRecs after flag update
Our range computation methods benefit from no-wrap flags. But if the ranges
were first computed before the flags were set, the cached range will be too
pessimistic.

We need to drop cached ranges whenever we sharpen AddRec's no wrap flags.

Differential Revision: https://reviews.llvm.org/D89847
Reviewed By: fhahn
2020-11-10 12:37:12 +07:00
Kazu Hirata 2f1038c7b6 [BranchProbabilityInfo] Use SmallVector (NFC)
This patch simplifies BranchProbabilityInfo by changing the type of
Probs.

Without this patch:

  DenseMap<Edge, BranchProbability> Probs

maps an ordered pair of a BasicBlock* and a successor index to an edge
probability.

With this patch:

  DenseMap<const BasicBlock *, SmallVector<BranchProbability, 2>> Probs

maps a BasicBlock* to a vector of edge probabilities.

BranchProbabilityInfo has a property that for a given basic block, we
either have edge probabilities for all successors or do not have any
edge probability at all.  This property combined with the current map
type leads to a somewhat complicated algorithm in eraseBlock to erase
map entries one by one while increasing the successor index.

The new map type allows us to remove all the edge probabilities for a
given basic block in a more intuitive manner, namely:

  Probs.erase(BB);

Differential Revision: https://reviews.llvm.org/D91017
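
A self-contained sketch of the shape change, using standard-library analogues (the real code uses DenseMap and SmallVector; the empty structs below are placeholders for the LLVM types):

```
#include <unordered_map>
#include <vector>

struct BasicBlock;           // placeholder for llvm::BasicBlock
struct BranchProbability {}; // placeholder for llvm::BranchProbability

// Before: a map keyed per (block, successor index) edge, so erasing a
// block meant probing successor indices one by one.
// After: one entry per block holding all of its successor probabilities.
std::unordered_map<const BasicBlock *,
                   std::vector<BranchProbability>> Probs;

void eraseBlock(const BasicBlock *BB) {
  // The all-or-nothing property makes erasure a single map operation.
  Probs.erase(BB);
}
```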
2020-11-09 17:29:40 -08:00
Michael Liao fa5d31f825 [GlobalsAA] Teach to handle `addrspacecast`. 2020-11-09 00:04:52 -05:00
Sanjay Patel 00808e321c [InstSimplify] allow vector folds for (Pow2C << X) == NonPow2C
Existing pre-conditions seem to be correct:
https://rise4fun.com/Alive/lCLB

  Name: non-zero C1
  Pre: !isPowerOf2(C1) && isPowerOf2(C2) && C1 != 0
  %sub = shl i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false

  Name: one == C2
  Pre: !isPowerOf2(C1) && isPowerOf2(C2) && C2 == 1
  %sub = shl i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false

  Name: nuw
  Pre: !isPowerOf2(C1) && isPowerOf2(C2)
  %sub = shl nuw i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false

  Name: nsw
  Pre: !isPowerOf2(C1) && isPowerOf2(C2)
  %sub = shl nsw i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false
2020-11-08 09:52:05 -05:00
Nikita Popov 4b860240a6 [BasicAA] Unify struct/other offset (NFC)
The distinction between StructOffset and OtherOffset was
originally introduced by 82069c44ca,
which applied different reasoning to both offset kinds. However,
this distinction was not actually correct, and has been fixed by
c84e77aeae. Since then, we only ever
consider the sum StructOffset + OtherOffset, so we may as well
store it in that form directly.
2020-11-07 18:56:05 +01:00
Nikita Popov 784937b9bb [BasicAA] Use smul_ov helper (NFCI)
Instead of performing the multiplication in double the bit width
and using active bits to determine overflow, use the existing
smul_ov() APInt method to detect overflow.

The smul_ov() implementation is not particularly efficient, but
it's still better than doing this in a wide, usually 128-bit, type.
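
A sketch of the call shape (assuming the APInt API of that era; this needs the LLVM headers to build):

```
#include "llvm/ADT/APInt.h"

// Signed multiply with overflow detection, instead of multiplying in
// double the bit width and inspecting the active bits of the product.
bool signedMulOverflows(const llvm::APInt &A, const llvm::APInt &B) {
  bool Overflow = false;
  (void)A.smul_ov(B, Overflow); // sets Overflow if A * B wraps
  return Overflow;
}
```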
2020-11-07 18:14:48 +01:00
Nikita Popov 57b3bc8c60 [CaptureTracking] Add statistics (NFC)
Add basic statistics on the number of pointers that have been
determined to maybe capture / not capture.
2020-11-07 12:57:00 +01:00
Nikita Popov f63ab188c6 [CaptureTracking] Early abort on too many uses (NFCI)
If there are too many uses, we should directly return -- there's
no point in inspecting the remaining uses in the worklist, as we
have to conservatively assume a capture anyway. This also means
that tooManyUses() gets called exactly once, rather than
potentially many times.

This restores the behavior prior to e9832dfdf3,
where this was accidentally changed while moving the AddUses logic
into a closure, thus making the return a return from the closure
rather than the whole function.
2020-11-07 11:52:08 +01:00
Nikita Popov d35366bcca [CaptureTracking] Correctly handle multiple uses in one instruction
If the same value is used multiple times in the same instruction,
CaptureTracking may end up reporting the wrong use as being captured,
and/or report the same use as being captured multiple times.

Make sure that all checks take the use operand number into account,
rather than performing unreliable comparisons against the used value.

I'm not sure whether this can cause any problems in practice, but
at least some capture trackers (ArgUsesTracker, AACaptureUseTracker)
do care about which call argument is captured.
2020-11-07 11:31:20 +01:00
Nikita Popov bac97993ca [CaptureTracking] Avoid duplicate shouldExplode() check (NFCI)
We check shouldExplore() before adding uses to the worklist, so
uses that should not be explored will not reach captured() in the
first place.
2020-11-07 10:16:58 +01:00
Kazu Hirata 118c3f3cf2 [BranchProbabilityInfo] Simplify getEdgeProbability (NFC)
The patch simplifies BranchProbabilityInfo::getEdgeProbability by
handling two cases separately, depending on whether we have edge
probabilities.

- If we have edge probabilities, then add up probabilities for
  successors being equal to Dst.

- Otherwise, return the number of occurrences divided by the total
  number of successors.

Differential Revision: https://reviews.llvm.org/D90980
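
A self-contained toy of the two-case shape (ints and doubles stand in for BasicBlock* and BranchProbability; the per-successor vector mirrors the Probs map sketched earlier):

```
#include <cstddef>
#include <map>
#include <vector>

using Block = int; // placeholder for const BasicBlock *

// Per-block vector of successor probabilities, parallel to the
// successor list.
std::map<Block, std::vector<double>> Probs;

double getEdgeProbability(Block Src, const std::vector<Block> &Succs,
                          Block Dst) {
  auto It = Probs.find(Src);
  if (It != Probs.end()) {
    // Case 1: recorded probabilities -- sum those whose successor is
    // Dst (Dst may appear as several successors of Src).
    double Sum = 0.0;
    for (std::size_t I = 0; I < Succs.size(); ++I)
      if (Succs[I] == Dst)
        Sum += It->second[I];
    return Sum;
  }
  // Case 2: nothing recorded -- assume a uniform distribution.
  double Count = 0.0;
  for (Block S : Succs)
    if (S == Dst)
      ++Count;
  return Count / static_cast<double>(Succs.size());
}
```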
2020-11-06 22:47:22 -08:00
Kazu Hirata 30929d1f7b [BranchProbabilityInfo] Use succ_size (NFC) 2020-11-06 11:05:35 -08:00
Simon Pilgrim 20f87d82ed [InstCombine] computeKnownBitsMul - use KnownBits::isNonZero() helper.
Avoid an expensive isKnownNonZero() call - this is a small cleanup before moving the extra NSW functionality from computeKnownBitsMul into KnownBits::computeForMul.
2020-11-06 17:27:13 +00:00
Yevgeny Rouban 681d6c711f [BranchProbabilityInfo] Introduce method copyEdgeProbabilities(). NFC
A new method is introduced to allow bulk copy of outgoing edge
probabilities from one block to another. This can be useful when
a block is cloned from another one and we do not know if there
are edge probabilities set for the original block or not.
Copying outside of the BranchProbabilityInfo class makes the user
unconditionally set the cloned block's edge probabilities even if
they are unset for the original block.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D90839
2020-11-06 14:52:35 +07:00
Yevgeny Rouban e38c8e7590 [BranchProbabilityInfo] Remove block handles in eraseBlock()
BranchProbabilityInfo::eraseBlock() is a public method and
can be called without deleting the block itself.
This method is made to remove the corresponding tracking handle
from BranchProbabilityInfo::Handles along with
the probabilities of the block. The Handles.erase() call is moved
to eraseBlock().
In setEdgeProbability() we need to add the block handle only once.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D90838
2020-11-06 13:13:58 +07:00
Yevgeny Rouban 4931158d27 [BranchProbabilityInfo] Get rid of MaxSuccIdx. NFC
This refactoring allows to eliminate the MaxSuccIdx map
proposed in the commit a7b662d0.
The idea is to remove the probabilities of a block BB for
its successors one by one (first, second, and so on up to
the N-th) for as long as they are defined in Probs. This works
because probabilities for the block are set at once for
all its successors from number 0 to N-1 and the rest
are removed if there were stale probs.
The protected method setEdgeProbability(), which sets
the probability for an individual successor, is removed.
This makes it clear that the probabilities are set in
bulk by the public method with the same name.

Reviewed By: kazu, MaskRay

Differential Revision: https://reviews.llvm.org/D90837
2020-11-06 12:21:24 +07:00
Anna Thomas afe92642cc Revert "[CaptureTracking] Avoid overly restrictive dominates check"
This reverts commit 15694fd6ad.
Need to investigate and fix a failing clang test: synchronized.m.
Might need a test update.
2020-11-05 12:27:15 -05:00
Anna Thomas 15694fd6ad [CaptureTracking] Avoid overly restrictive dominates check
CapturesBefore tracker has an overly restrictive dominates check when
the `BeforeHere` and the capture point are in different basic blocks.
All we need to check is that there is no path from the capture point
to `BeforeHere` (which is less strict than the dominates check).
See added testcase in one of the users of CapturesBefore.

Reviewed-By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90688
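
A hedged sketch of the relaxed check; the real tracker's parameters differ, but isPotentiallyReachable is the existing helper from llvm/Analysis/CFG.h:

```
#include "llvm/Analysis/CFG.h"
#include "llvm/IR/Instruction.h"

// A capture only matters if it can happen before BeforeHere, i.e. if
// some path leads from the capture point to BeforeHere. Requiring
// BeforeHere to dominate the capture point is stronger than necessary.
bool captureMayHappenBefore(const llvm::Instruction *CapturePoint,
                            const llvm::Instruction *BeforeHere) {
  return llvm::isPotentiallyReachable(CapturePoint, BeforeHere);
}
```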
2020-11-05 11:38:50 -05:00
Simon Pilgrim 6729b6de1f [KnownBits] Move ValueTracking SREM KnownBits handling to KnownBits::srem. NFCI.
Move the ValueTracking implementation to KnownBits; the SelectionDAG version is more limited, so I'm intending to replace that in a separate commit.
2020-11-05 14:58:33 +00:00
Simon Pilgrim e237d56b43 [KnownBits] Move ValueTracking/SelectionDAG UREM KnownBits handling to KnownBits::urem. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
2020-11-05 14:30:59 +00:00
Simon Pilgrim 32bee18b84 [KnownBits] Move ValueTracking/SelectionDAG UDIV KnownBits handling to KnownBits::udiv. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
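
One fact such a helper can derive, shown as a self-contained toy (fixed 64-bit width assumed; whether the real KnownBits::udiv computes exactly and only this is an assumption): for udiv the quotient never exceeds the dividend, so it inherits the dividend's minimum count of leading zeros.

```
#include <bit>
#include <cstdint>

struct ToyKnownBits {
  uint64_t Zero = 0, One = 0;
};

ToyKnownBits udiv(const ToyKnownBits &LHS, const ToyKnownBits & /*RHS*/) {
  ToyKnownBits K;
  // Leading ones in the Zero mask == bits known to be leading zeros.
  unsigned LeadZ = std::countl_one(LHS.Zero);
  if (LeadZ > 0)
    K.Zero = ~uint64_t(0) << (64 - LeadZ); // quotient <= dividend
  return K;
}
```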
2020-11-05 13:42:42 +00:00
Max Kazantsev ab7ef35d34 Revert "[SCEV] Handle non-positive case in isImpliedViaOperations"
This reverts commit 8dc98897c4.

Commited by mistake.
2020-11-05 11:27:55 +07:00
Max Kazantsev 8dc98897c4 [SCEV] Handle non-positive case in isImpliedViaOperations
We already handle non-negative case there. Add support for non-positive.
2020-11-05 11:07:37 +07:00
Atmn Patel cea0599aa7 [LangRef] Adds llvm.loop.mustprogress loop metadata
This patch adds the llvm.loop.mustprogress loop metadata. This is to be
added to loops where the frontend language requires that the loop makes
observable interactions with the environment. This is the loop-level
equivalent to the function attribute `mustprogress` defined in D86233.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88464
2020-11-04 22:32:50 -05:00
Nikita Popov 52b86d35a4 [MemorySSA] Use provided memory location even if instruction is call
If getClobberingMemoryAccess() is called with an explicit
MemoryLocation, but the starting access happens to be a call, the
provided location is currently ignored, and alias analysis queries
will be performed against the call instruction instead. Something
similar happens if the starting access is a load with a MemoryDef.

Change the implementation to not set Q.Inst in the first place if
we want to perform a MemoryLocation-based query, to make sure it
can't be turned into an Instruction-based query along the way...

Additionally, remove the special handling that lifetime.start
intrinsics currently get. They simply report NoAlias for clobbers
between lifetime.start and other calls, but that's obviously not
right if the other call is something like a memset or memcpy. The
default behavior we get from getModRefInfo() will already do the
right thing here.

Differential Revision: https://reviews.llvm.org/D88782
2020-11-04 20:30:22 +01:00
Sanjay Patel c74db55ff5 [InstSimplify] allow vector folds for icmp Pred (1 << X), 0x80 2020-11-04 08:12:48 -05:00
Arthur Eubanks 06926e0f01 Port print-must-be-executed-contexts and print-mustexecute to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90207
2020-11-03 21:06:46 -08:00
Fangrui Song 491dd2711f [LazyCallGraph] Build SCCs of the reference graph in order
```
// The legacy PM CGPassManager discovers SCCs this way:
for function in the source order
  tarjanSCC(function)

// While the new PM CGSCCPassManager does:
for function in the reversed source order [1]
  discover a reference graph SCC
  build call graph SCCs inside the reference graph SCC
```

In the common cases, reference graph ~= call graph, the new PM order is
undesired because for `a | b | c` (3 independent functions), the new PM will
process them in the reversed order: c, b, a. If `a <-> b <-> c`, we can see
that `-print-after-all` will report the sole SCC as `scc: (c, b, a)`.

This patch corrects the iteration order. The discovered SCC order will match
the legacy PM in the common cases.

For some tests (`Transforms/Inline/cgscc-*.ll` and
`unittests/Analysis/CGSCCPassManagerTest.cpp`), the behaviors are dependent on
the SCC discovery order and there are too many check lines for the particular
order.  This patch simply reverses the function order to avoid changing too many
check lines.

Differential Revision: https://reviews.llvm.org/D90566
2020-11-02 13:22:42 -08:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Nikita Popov cc91554ebb [SCEV] Delay strengthening of nowrap flags
Strengthening nowrap flags is relatively expensive. Make sure we
only do it if we're actually going to use the flags -- we don't
use them for many recursive invocations. Additionally, if we're
reusing an existing SCEV node, there's no point in trying to
strengthen the flags if we don't have any new baseline facts.

This change falls slightly short of being NFC, because the way
flags during add+addrec / mul+addrec folding are handled may be
more precise (as fewer operands are included in the calculation).
2020-11-01 22:18:07 +01:00
Nikita Popov 6ec56467cb [SCEV] Construct GEP expression more efficiently (NFCI)
Instead of performing a sequence of pairwise additions, directly
construct a multi-operand add expression.

This should be NFC modulo any SCEV canonicalization deficiencies.
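
A sketch of the shape change (an assumed illustration, relying on the existing n-ary getAddExpr overload):

```
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/ScalarEvolution.h"

// Instead of SE.getAddExpr(SE.getAddExpr(Base, Off0), Off1), collect
// all offsets first and build a single multi-operand add expression.
const llvm::SCEV *sumOffsets(llvm::ScalarEvolution &SE,
                             llvm::SmallVectorImpl<const llvm::SCEV *> &Ops) {
  return SE.getAddExpr(Ops);
}
```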
2020-11-01 19:00:57 +01:00
Florian Hahn 799033d8c5 Reland "[SLP] Consider alternatives for cost of select instructions."
This reverts the revert commit a1b53db324.

This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking through some connected casts.

For now, ensure integer intrinsics are only returned for selects of
ints or int vectors.
2020-10-31 16:52:36 +00:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn a1b53db324 Revert "[SLP] Consider alternatives for cost of select instructions."
This reverts commit 1922570489.

This appears to cause a crash in the following example

 a, b, c;
 l() {
   int e = a, f = l, g, h, i, j;
   float *d = c, *k = b;
   for (;;)
     for (; g < f; g++) {
       k[h] = d[i];
       k[h - 1] = d[j];
       h += e << 1;
       i += e;
     }
 }

 clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c

 llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
2020-10-30 21:26:14 +00:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Anna Thomas 7aac3a9048 [CFG] Replace hardcoded max BBs explored as CL option. NFC.
This option was hardcoded to 32. Changing this to a CL option since we
have seen some cases downstream where increasing this limit allows us to
disprove reachability.

Reviewed-By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90487
2020-10-30 15:11:48 -04:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Roman Lebedev ef22d500f7
[NFCI][SCEV] getPtrToIntExpr(): use SCEVRewriteVisitor<> for ptrtoint cast sinking
This is functionally identical to the previous implementation,
just using a generic interface to do that instead of a hand-rolled one,
with caching as a bonus. Though the sinking is still recursive.

Note that SCEVRewriteVisitor<>'s default implementations
don't preserve NoWrap flags on Add/Mul (but does on AddRec!),
but here we know we can preserve them,
so `visitAddExpr()`/`visitMulExpr()` are specialized.
2020-10-30 17:05:14 +03:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Roman Lebedev b4916918e5
[SCEV] SCEVPtrToIntExpr simplifications
If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown,
we want to sink the SCEVPtrToIntExpr into an operand,
so that the operation is performed on integers,
and eventually we end up with just an `SCEVPtrToIntExpr(SCEVUnknown)`.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89692
2020-10-30 11:13:35 +03:00
Roman Lebedev 81fc53a36a
[SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but without the chance of
all the problems that patch caused by having the cast be implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with the initial problem of PR46786 - this should eventually allow us
  to not lose the pointer-ness of an expression in more cases

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with an `is ptrtoint` flag, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But I think that it isn't the best solution, because it doesn't really matter
from the memory-consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
Alina Sbirlea 9d93b150c9 [AA] Pass query info.
Pass AAQI in places where it was missed.

Part of D89991.
Author: haoranxu510 (Haoran Xu)
2020-10-29 18:07:53 -07:00
Stefanos Baziotis a3345300b6 [LCSSA] Doc for special treatment of PHIs
Differential Revision: https://reviews.llvm.org/D89739
2020-10-29 22:50:07 +02:00
Florian Hahn 1922570489 [SLP] Consider alternatives for cost of select instructions.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.

One example is using min/max instructions for certain patterns.

This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.

This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.

This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).

Metric: SLP.NumVectorInstructions

Program                                        base    slp     diff
 test-suite...ications/JM/ldecod/ldecod.test   502.00  697.00  38.8%
 test-suite...ications/JM/lencod/lencod.test   1023.00 1414.00 38.2%
 test-suite...-typeset/consumer-typeset.test    56.00   65.00  16.1%
 test-suite...6/464.h264ref/464.h264ref.test   804.00  822.00   2.2%
 test-suite...006/453.povray/453.povray.test   3335.00 3357.00  0.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00 2121.00  0.5%
 test-suite...:: External/Povray/povray.test   2378.00 2382.00  0.2%

Reviewed By: RKSimon, samparker

Differential Revision: https://reviews.llvm.org/D89969
2020-10-29 20:39:50 +00:00
Max Kazantsev 3fc601b641 [NFC][SCEV] Use generic predicate checkers to simplify code 2020-10-29 18:12:28 +07:00
Florian Hahn 88d6421e4c [SCEV] Match 'zext (trunc A to iB) to iY' as URem.
URem operations with constant power-of-2 second operands are modeled as
such. This patch on its own has very little impact (e.g. no changes in
CodeGen for MultiSource/SPEC2000/SPEC2006 on X86 -O3 -flto), but I'll
soon post follow-up patches that make use of it to more accurately
determine the trip multiple.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89821
2020-10-29 10:46:52 +00:00
Max Kazantsev ef129f01e9 [SCEV][NFC] Use general predicate checkers in monotonicity check
This makes the code more compact and readable.
2020-10-29 16:45:52 +07:00
Max Kazantsev a5b2e795c3 [NFC][SCEV] Refactor monotonic predicate checks to return enums instead of bools
This patch gets rid of output parameter which is not needed for most users
and prepares this API for further refactoring.
2020-10-29 16:01:25 +07:00
Philip Reames 4e4abd16a7 [Deref] Use maximum trip count instead of exact trip count
When trying to prove that a memory access touches only dereferenceable memory across all iterations of a loop, use the maximum exit count rather than an exact one.  In many cases we can't prove exact exit counts whereas we can prove an upper bound.

The test included is for a single-exit loop with a min(C,V) exit count, but the true motivation is support for multiple-exit loops. It's just really hard to write a test case for multiple exits because the vectorizer (the primary user of this API) bails far before this. For multiple exits, this allows a mix of analyzable and unanalyzable exits when only the analyzable exits are needed to prove dereferenceability.
2020-10-28 14:33:30 -07:00
Dávid Bolvanský 49cddb90f6 [MemLoc] Adjust memccpy support in MemoryLocation::getForArgument
Use LocationSize::upperBound instead of precise since we only know an upper bound on the number of bytes read/written.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89885
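
A sketch of the distinction (LocationSize::precise and LocationSize::upperBound are the existing factories in llvm/Analysis/MemoryLocation.h):

```
#include <cstdint>
#include "llvm/Analysis/MemoryLocation.h"

// memccpy may stop copying as soon as the terminator byte is found, so
// N is only an upper bound on the bytes actually touched.
llvm::LocationSize sizeForMemccpyArg(uint64_t N) {
  return llvm::LocationSize::upperBound(N); // not LocationSize::precise(N)
}
```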
2020-10-28 21:26:10 +01:00
Max Kazantsev 160a453138 Return "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit e038b60d91.
This reverts commit a0d84d8031.

This revert was a mistake. The reason of the failures was
"Use uint64_t for branch weights instead of uint32_t"

Differential Revision: https://reviews.llvm.org/D87832
2020-10-28 18:51:40 +07:00
Max Kazantsev 5ef84688fb Re-enable "[SCEV] Prove implications of different type via truncation"
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to the wider type will not be an AddRec. In contrast, a `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.

This patch introduces an alternative way of handling implications of different
type widths. If we can prove that the wider-type values actually fit in the narrow type,
we truncate them and prove the implication in the narrow type.

The return was due to a revert of the underlying patch that this one depends on.

The unit test is temporarily disabled because the required logic in SCEV is switched
off for compile-time reasons.

Differential Revision: https://reviews.llvm.org/D89548
2020-10-28 16:02:14 +07:00
Luqman Aden 4c0a016927 Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC.
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.

Reviewed By: compnerd, rnk

Differential Revision: https://reviews.llvm.org/D90117
2020-10-27 23:22:13 -07:00
Max Kazantsev 624fc63a05 [SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 3
We can sharpen the range of an AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of the AddRec on the last iteration and prove that at least
one of its intermediate values lies between start and end, then the no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of the range can be improved to the union of the ranges of start and end.

Switched off by default; can be turned on by a flag.

Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: lebedev.ri, nikic
2020-10-28 12:39:41 +07:00
Fangrui Song d69ada30e2 [BranchProbabilityInfo] Make MaxSuccIdx[Src] efficient and add a comment about the subtle eraseBlock. NFC
Follow-up to D90272.
2020-10-27 16:29:23 -07:00
Kazu Hirata a7b662d0f4 [BranchProbabilityInfo] Fix eraseBlock
This patch ensures that BranchProbabilityInfo::eraseBlock(BB) deletes
all entries in Probs associated with with BB.

Without this patch, stale entries for BB may remain in Probs after
eraseBlock(BB), leading to a situation where a newly created basic
block has an edge probability associated with it even before the pass
responsible for creating the basic block adds any edge probability to
it.

Consider the current implementation of eraseBlock(BB):

  for (const_succ_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {
    auto MapI = Probs.find(std::make_pair(BB, I.getSuccessorIndex()));
    if (MapI != Probs.end())
      Probs.erase(MapI);
  }

Notice that it uses succ_begin(BB) and succ_end(BB), which are based
on BB->getTerminator().  This means that if the terminator changes
between calls to setEdgeProbability and eraseBlock, then we may not
examine all pairs associated with BB.

This is exactly what happens in MaybeMergeBasicBlockIntoOnlyPred,
which merges basic blocks A into B if A is the sole predecessor of B,
and B is the sole successor of A.  It replaces the terminator of A
with UnreachableInst before (indirectly) calling eraseBlock(A).

The patch fixes the problem by keeping track of all edge probabilities
entered with setEdgeProbability in a map from BasicBlock* to a
successor index.

Differential Revision: https://reviews.llvm.org/D90272
2020-10-27 16:14:25 -07:00
Raphael Isemann e038b60d91 Revert "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit c6ca26c0bf.
This breaks stage2 builds due to hitting this assert:
```
   Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
2020-10-27 15:31:37 +01:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Alex Richardson d323c8f791 [ValueTracking][NFC] Use Log2(Align) instead of countTrailingZeroes
The latter can probably be optimized to the same final code, but this might
help -O0 builds.
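
A sketch of the substitution (Log2 on Align is an existing helper in llvm/Support/Alignment.h):

```
#include "llvm/Support/Alignment.h"

unsigned trailingZeroBits(llvm::Align Alignment) {
  // Align stores its log2 internally, so this is a direct field read,
  // while countTrailingZeros(Alignment.value()) would recompute it.
  return llvm::Log2(Alignment);
}
```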
2020-10-27 12:16:45 +00:00
Shimin Cui 22e4346e05 [ValueTracking] Add tracking of the alignment assume bundle
This patch is to add the support of the value tracking of the alignment assume bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88669
2020-10-27 12:16:45 +00:00
Max Kazantsev c6ca26c0bf [IndVars] Remove monotonic checks with unknown exit count
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.

Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
2020-10-27 11:35:16 +07:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Chen Zheng 00e573cadb [LSR] fix typo in comments and rename for a new added hook. 2020-10-26 22:29:22 -04:00
Duncan P. N. Exon Smith d4c667c9af Avoid unnecessary uses of `MDNode::getTemporary`, NFC
This is a long-delayed follow-up to
5e5b85098d.

`TempMDNode` includes a bunch of machinery for RAUW, and should only be
used when necessary. RAUW wasn't being used in any of these cases... it
was just a placeholder for a self-reference.

Where the real node was using `MDNode::getDistinct`, just replace the
temporary argument with `nullptr`.

Where the real node was using `MDNode::get`, the `replaceOperandWith`
call was "promoting" the node to a distinct one implicitly due to
self-reference detection in `MDNode::handleChangedOperand`. The
`TempMDNode` was serving a purpose by delaying uniquing, but it's way
simpler to just call `MDNode::getDistinct` in the first place.

Note that using a self-reference at all in these places is a hold-over
from before `distinct` metadata existed. It was an old trick to create
distinct nodes. It would be intrusive to change, including bitcode
upgrades, etc., and it's harmless so I'm not sure there's much value in
removing it from existing schemas. After this commit it still has a tiny
memory cost (in the extra metadata operand) but no more overhead in
construction.

Differential Revision: https://reviews.llvm.org/D90079
2020-10-26 17:03:25 -04:00
Joe Ellis bf60bb26ec [SVE] Fix TypeSize warning in llvm::getGEPInductionOperand
We do not need to use the implicit cast here. We can instead can rely on
a comparison between two TypeSize objects instead. This algorithm will
work fine with scalable vectors.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D90146
2020-10-26 17:40:32 +00:00
Joe Ellis 467e5cf40f [SVE][AArch64] Fix TypeSize warning in loop vectorization legality
The warning would fire when calling isDereferenceableAndAlignedInLoop
with a scalable load. Calling isDereferenceableAndAlignedInLoop with a
scalable load would result in the use of the now deprecated implicit
cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes this issue by:

- no longer considering vector loads as candidates in
  canVectorizeWithIfConvert. This doesn't make sense in the context of
  identifying scalar loads to vectorize.

- making use of getFixedSize inside isDereferenceableAndAlignedInLoop --
  this removes the dependency on the deprecated interface, and will
  trigger an assertion error if the function is ever called with a
  scalable type.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89798
2020-10-26 17:40:04 +00:00
Nikita Popov ebeef022aa [SCEV] Strengthen nowrap flags after constant folding for mul exprs
Same change as 0dda633317, but for
mul expressions. We want to first fold any constant operands and
then strengthen the nowrap flags, as we can compute more precise
flags at that point.
2020-10-25 19:43:58 +01:00
Nikita Popov 1ff313f098 [SCEV] Always constant fold mul expression operands
Establish parity with the handling of add expressions, by always
constant folding mul expression operands before checking the depth
limit (this is a non-recursive simplification). The code was already
unconditionally constant folding the case where all operands were
constants, but was not folding multiple constant operands together
if there were also non-constant operands.

This requires picking out a different demonstration for depth-based
folding differences in the limit-depth.ll test.
2020-10-25 18:50:06 +01:00
Nikita Popov 22a5cde541 [SCEV] Separate out constant folding in mul expr creation
Separate out the code handling constant folding into a separate
block, that is independent of other folds that need a constant
first operand. Also make some minor adjustments to make the
constant folding look nearly identical to the same code in
getAddExpr().

The only reason this change is not strictly NFC is that the
C1*(C2+V) fold is moved below the constant folding, which means
that it now also applies to C1*C2*(C3+V), as it should.
2020-10-25 18:46:50 +01:00
Nikita Popov 0dda633317 [SCEV] Strengthen nowrap flags after constant folding
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
2020-10-25 18:00:22 +01:00
Sanjay Patel e77ba263fe [InstSimplify] peek through 'not' operand in logic-of-icmps fold
This extends D78430 to solve cases like:
https://llvm.org/PR47858

There are still missed opportunities shown in the tests,
and as noted in the earlier patches, we have related
functionality in InstCombine, so we may want to extend
other folds in a similar way.

A semi-random sampling of test diff proofs in this patch:
https://rise4fun.com/Alive/sS4C
2020-10-25 11:13:30 -04:00
Nikita Popov 1a7a9efec3 [BasicAA] Avoid duplicate cache lookup (NFCI)
Rather than performing the cache lookup with both possible orders
for the locations, use the same canonicalization as the other
AliasCache lookups in BasicAA.
2020-10-24 10:19:02 +02:00
Nikita Popov d09c592142 [BasicAA] Fix caching in the presence of phi cycles
Any time we insert a block into VisitedPhiBBs, previously cached
values may no longer be valid for the recursive alias queries. As
such, perform them using an empty AAQueryInfo.

Note that if we recurse to the same phi, the block will already
be inserted, so we reuse the old AAQueryInfo, and thus still
protect against infinite recursion.

This problem can appear with with an without BatchAA, but is more
likely to occur with BatchAA, as more values are cached.

Differential Revision: https://reviews.llvm.org/D90066
2020-10-24 09:58:02 +02:00
Nikita Popov dd887d97ce [PhiValues] Use SetVector to avoid non-determinism
I'm not sure whether this can cause actual non-determinism in the
compiler output, but at least it causes non-determinism in the
statistics collected by BasicAA.

Use SetVector to have a predictable iteration order.
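
A minimal illustration of why SetVector fixes this: it deduplicates like a set but iterates in insertion order.

```
#include "llvm/ADT/SetVector.h"

void deterministicOrderExample() {
  llvm::SetVector<int> Vals;
  Vals.insert(42);
  Vals.insert(7);
  Vals.insert(42); // duplicate: ignored, order unchanged
  // Iteration visits 42 then 7 on every run, unlike a hash-set walk
  // whose order can vary between runs or builds.
  for (int V : Vals)
    (void)V;
}
```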
2020-10-23 20:14:02 +02:00
Arthur Eubanks 5668eda864 Revert "[CGSCC] Detect devirtualization in more cases"
This reverts commit 3024fe5b55.

Causes major compile time regressions:
https://llvm-compile-time-tracker.com/compare.php?from=3b8d8954bf2c192502d757019b9fe434864068e9&to=3024fe5b55ed72633915f613bd5e2826583c396f&stat=instructions
2020-10-23 09:53:52 -07:00
Chen Zheng 1e0b6c1df0 [LSR] ignore profitable chain when reg num is not major cost.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D89665
2020-10-23 09:35:48 -04:00
Sanjay Patel c72198079d [ValueTracking] add range limits for cttz
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of cttz to process any "icmp pred cttz(X), C" pattern (the
min value is initialized to zero automatically).

https://alive2.llvm.org/ce/z/Z_SLWZ

Follow-up to D89976.
2020-10-23 08:43:45 -04:00
Sanjay Patel 3fb0d6b0d5 [ValueTracking] add range limits for ctlz
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctlz to process any "icmp pred ctlz(X), C" pattern (the
min value is initialized to zero automatically).

Follow-up to D89976.
2020-10-23 08:43:45 -04:00
Sanjay Patel 748ecc6b32 [ValueTracking] add range limits for ctpop
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctpop to process any "icmp pred ctpop(X), C" pattern (the
min value is initialized to zero automatically).

Differential Revision: https://reviews.llvm.org/D89976
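
A sketch of the clamp (assumed shape; the real code plugs the bound into the range it is building): for ctpop/ctlz/cttz on an iN value the result always lies in [0, N].

```
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"

// Inclusive [0, BitWidth] as the half-open range [0, BitWidth + 1).
// getNonEmpty turns Lower == Upper into the full set, covering i1.
llvm::ConstantRange bitCountRange(unsigned BitWidth) {
  return llvm::ConstantRange::getNonEmpty(
      llvm::APInt::getNullValue(BitWidth),
      llvm::APInt(BitWidth, BitWidth) + 1);
}
```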
2020-10-23 08:17:54 -04:00
Max Kazantsev 6e574abf61 [SCEV][NFC] Cache symbolic max exit count
We want to have a caching version of the symbolic BE exit count
rather than recomputing it every time we need it.

Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
2020-10-23 12:29:37 +07:00
Arthur Eubanks 3024fe5b55 [CGSCC] Detect devirtualization in more cases
The devirtualization wrapper misses cases where, if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.

This fixes some tests testing this exact behavior in the legacy PM.

This piggybacks off of updateCGAndAnalysisManagerForPass()'s detection
of promoted ref to call edges.

This supersedes one of the previous mechanisms to detect
devirtualization by keeping track of potentially promoted call
instructions via WeakTrackingVHs.

There is one more existing way of detecting devirtualization, by
checking if the number of indirect calls has decreased and the number of
direct calls has increased in a function. It handles cases where calls
to functions without definitions are promoted, and some tests rely on
that. LazyCallGraph doesn't track edges to functions without
definitions so this part can't be removed in this change.

check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default don't show any failures outside of tests specifically
testing it, so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89587
2020-10-22 19:44:22 -07:00
Nikita Popov 1882568fcb [BasicAA] Only add visited phi blocks temporarily
Visited phi blocks only need to be added for the duration of the
recursive alias queries, they should not leak into following code.

Once again, while this also improves analysis precision, this is
mainly intended to clarify the applicability scope of VisitedPhiBBs.
2020-10-22 22:26:29 +02:00
Nikita Popov 2b372570ee [BasicAA] Don't track visited blocks for phi-phi alias query
We only need the VisitedPhiBBs to disambiguate comparisons of
values from two different loop iterations. If we're comparing
two phis from the same basic block in lock-step, the compared
values will always be on the same iteration.

While this also increases precision, this is mainly intended
to clarify the scope of VisitedPhiBBs.
2020-10-22 22:12:21 +02:00
Venkataramanan Kumar 57cdc52c4d Initial support for vectorization using Libmvec (GLIBC vector math library)
Differential Revision: https://reviews.llvm.org/D88154
2020-10-22 16:01:39 -04:00
Max Kazantsev cc2eb3b5e2 [SCEV][NFC] Simplify internals of BackedgeTakenInfo 2020-10-22 17:39:56 +07:00
Max Kazantsev e2858bf633 [SCEV][NFC] Rename MaxAndComplete -> ConstantMaxAndComplete
This better reflects what this variable is about.
2020-10-22 16:37:06 +07:00
Max Kazantsev 6379090ea7 [SCEV][NFC] Rename getMax -> getConstantMax
This better reflects what this logic actually does.
2020-10-22 15:12:54 +07:00
Sjoerd Meijer 51d7df3fa1 [InstructionSimplify] icmp (X+Y), (X+Z) simplification
This improves simplifications for pattern `icmp (X+Y), (X+Z)` -> `icmp Y,Z`
if only one of the operands has NSW set, e.g.:

    icmp slt (x + 0), (x +nsw 1)

We can still safely rewrite this to:

    icmp slt 0, 1

because we know that the LHS can't overflow if the RHS has NSW set and
C1 < C2 && C1 >= 0, or C2 < C1 && C1 <= 0

This simplification is useful because ScalarEvolutionExpander which is used to
generate code for SCEVs in different loop optimisers is not always able to put
back NSW flags across control-flow, thus inhibiting CFG simplifications.

Differential Revision: https://reviews.llvm.org/D89317
2020-10-22 08:55:52 +01:00
Quentin Colombet ee6abef532 [ValueTracking] Interpret GEPs as a series of adds multiplied by the related scaling factor
Prior to this patch, computeKnownBits would only try to deduce trailing zeros
bits for getelementptrs. This patch adds the logic to treat geps as a series
of add * scaling factor.

Thanks to this patch, using a gep or performing an address computation
directly "by hand" (ptrtoint followed by adds and mul followed by inttoptr)
offers the same computeKnownBits information.

Previously, the "by hand" approach would have given more information.

This is related to https://llvm.org/PR47241.

Differential Revision: https://reviews.llvm.org/D86364
2020-10-21 15:07:04 -07:00
Arthur Eubanks 958abe0180 [NFC] Clean up always false variables
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89023
2020-10-21 10:54:55 -07:00
Max Kazantsev bed02fa8b0 Revert "[SCEV] Prove implications of different type via truncation"
This reverts commit 80852a4f2f.

The test is now broken because the underlying required patch was also suddenly reverted.
2020-10-21 13:03:46 +07:00
Max Kazantsev 80852a4f2f [SCEV] Prove implications of different type via truncation
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to the wider type will not be an AddRec. In contrast, a `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.

This patch introduces an alternative way of handling implications of different
type widths. If we can prove that the wider-type values actually fit in the narrow type,
we truncate them and prove the implication in the narrow type.

Differential Revision: https://reviews.llvm.org/D89548
Reviewed By: fhahn
2020-10-21 12:53:22 +07:00
Fangrui Song d9f91a3d14 Revert D89381 "[SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2"
This reverts commit a10a64e7e3.

It broke polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll
The difference suggests that this may be a serious issue.
2020-10-20 21:03:58 -07:00
Sanjay Patel 7c516504a1 [InstSimplify] allow vector splats for icmp-of-neg folds 2020-10-20 09:24:36 -04:00
Max Kazantsev a10a64e7e3 [SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2
Fixed wrapping range case & proof methods reduced to constant range
checks to save compile time.

Differential Revision: https://reviews.llvm.org/D89381
2020-10-20 11:32:36 +07:00
Amy Huang ea693a1627 [NPM] Port module-debuginfo pass to the new pass manager
Port pass to NPM and update tests in DebugInfo/Generic.

Differential Revision: https://reviews.llvm.org/D89730
2020-10-19 14:31:17 -07:00
Roman Lebedev e0567582b8
[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer
The main tricky thing here is forward-declaring the enum:
we have to specify its underlying data type.

In particular, this avoids the danger of nominally switching over the SCEVTypes
while actually switching over an integer, and not being notified
when some case is not handled.

I have updated most such switches to be exhaustive and not have
a default case where that is pretty obviously the intent,
though not all of them.
2020-10-20 00:10:22 +03:00
Roman Lebedev d4b0aa9773
[NFC][SCEV] BuildConstantFromSCEV(): reformat, NFC
Makes diff in next commit more readable
2020-10-20 00:10:22 +03:00
Mircea Trofin d454328ea8 [ML] Add final reward logging facility.
Allow logging final rewards. A final reward is logged only once, and is
serialized as all-zero values, except for the last one.

Differential Revision: https://reviews.llvm.org/D89626
2020-10-19 08:44:50 -07:00
Roman Lebedev d083d55c2c
[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr
All existing SCEV cast types operate on integers.
D89456 will add the SCEVPtrToIntExpr cast expression type.
I believe this is best for consistency.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89455
2020-10-19 10:59:53 +03:00
Max Kazantsev 199826baa8 [NFC][SCEV] Use getMinusOne where possible 2020-10-19 12:56:09 +07:00
Nikita Popov 6de8d7f1ad [BasicAA] Accept AATags by const reference (NFC)
Rather than swapping the value, the sizes, the AA tags and the
underlying objects multiple times, invoke the helper methods
with swapped arguments.
2020-10-18 18:19:01 +02:00
Nikita Popov f9172d3c7b [AA] Add helper to update result (NFC)
This pattern was repeated a few times, and for some reason always
using insert or try_emplace, even though we know in advance that
we're looking for an existing entry and not trying to create a
new one.
2020-10-18 16:43:26 +02:00
Nikita Popov 9d2b8300b7 [BasicAA] Avoid alias query if result cannot be used (NFCI)
Rather then querying first and then checking additional conditions,
check the conditions first. They are much cheaper than the alias
query.
2020-10-18 00:00:15 +02:00
Nikita Popov 3c6fe0fc77 [BasicAA] Fix stale comment (NFC)
DataLayout is always around...
2020-10-17 23:58:58 +02:00
Roman Lebedev ec54867df5
[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)`
It's not pretty, but probably better than modelling it
as an opaque SCEVUnknown, I guess.

It is relevant e.g. for the loop that was brought up in
https://bugs.llvm.org/show_bug.cgi?id=46786#c26
as an example of what we'd be able to better analyze
once SCEV handles `ptrtoint` (D89456).

But as is evident, even if we deal with `ptrtoint` there,
we also fail to model such an `ashr`.
Also, modeling of mul-of-exact-shr/div could use improvement.

As per alive2:
https://alive2.llvm.org/ce/z/tnfZKd
```
define i8 @src(i8 %0) {
  %2 = ashr exact i8 %0, 4
  ret i8 %2
}

declare i8 @llvm.abs(i8, i1)
declare i8 @llvm.smin(i8, i8)
declare i8 @llvm.smax(i8, i8)

define i8 @tgt(i8 %x) {
  %abs_x = call i8 @llvm.abs(i8 %x, i1 false)
  %div = udiv exact i8 %abs_x, 16
  %t0 = call i8 @llvm.smax(i8 %x, i8 -1)
  %t1 = call i8 @llvm.smin(i8 %t0, i8 1)
  %r = mul nsw i8 %div, %t1
  ret i8 %r
}
```
Transformation seems to be correct!
2020-10-17 21:22:24 +03:00
Roman Lebedev 130cc662b5
[NFC][SCEV] Refactor getAbsExpr() out of createSCEV() 2020-10-17 21:21:02 +03:00
Roman Lebedev be1678bdb9
[NFC][SCEV] Add 'getMinusOne()' method 2020-10-17 21:20:58 +03:00
Juneyoung Lee 62a0ec1612 Add support for !noundef metadata on loads
This patch adds the !noundef metadata and allows load instructions to optionally carry it.
A load with !noundef always returns a well-defined value (one with no undef bits that isn't poison).
If the loaded value isn't well defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has a well-defined value, and showing that a load from an arbitrary location is well defined is usually hard otherwise.

The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already strips unknown metadata when doing code motion, so using metadata is UB-safe as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89050
2020-10-17 13:50:10 +09:00
Alina Sbirlea dc97138123 [MemorySSA] Verify clobbering within reachable blocks.
Resolves PR45976.
2020-10-16 17:46:28 -07:00
Nikita Popov 74c8c2d903 Revert "Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs""
This reverts commit 32b72c3165.

While better than before, this change still introduces a large
compile-time regression (>3% on mafft):
https://llvm-compile-time-tracker.com/compare.php?from=fbd62fe60fb2281ca33da35dc25ca3c87ec0bb51&to=32b72c3165bf65cca2e8e6197b59eb4c4b60392a&stat=instructions

Additionally, the logic here doesn't look quite right to me;
I will comment in more detail on the differential revision.
2020-10-16 21:36:33 +02:00
Arthur Eubanks faf5210420 [CGSCC] Add -abort-on-max-devirt-iterations-reached option
Aborts if we hit the max devirtualization iteration.
Will be useful for testing that changes to devirtualization don't cause
devirtualization to repeat passes more times than necessary.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D89519
2020-10-16 12:34:52 -07:00
Jay Foad 1417abe54c [AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic
Differential Revision: https://reviews.llvm.org/D89558
2020-10-16 17:10:21 +01:00
Max Kazantsev 32b72c3165 Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs"
It was reverted because of negative compile time impact. In this version,
less powerful proof methods are used (non-recursive reasoning only), and
the scope is limited to constant End values to avoid an explosion of complex proofs.

Differential Revision: https://reviews.llvm.org/D89381
2020-10-16 17:35:13 +07:00
Cullen Rhodes fbd62fe60f [ValueTracking] Clarify TypeSize comparisons
TypeSize comparisons using overloaded operators should be replaced by
the new isKnownXY comparators when the operands can be fixed-length or
scalable vectors.

In ValueTracking there are several uses of the overloaded operators in
`isKnownNonZero` and `ComputeMultiple`. In the former we already bail
out on scalable vectors since we currently have no way to represent
DemandedElts, and the latter is operating on scalar integers, so we can
assume fixed-size in both instances.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D89387
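
A sketch of the intent (the exact spelling of the comparator is an assumption based on the isKnownXY naming in the message):

```
#include "llvm/Support/TypeSize.h"

// With scalable vectors, '<' on sizes is only meaningful when the
// answer is the same for every possible vscale; the isKnownXY
// predicates make this explicit, returning false both when the
// relation is false and when it cannot be proven.
bool definitelySmaller(llvm::TypeSize A, llvm::TypeSize B) {
  return llvm::TypeSize::isKnownLT(A, B);
}
```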
2020-10-16 10:31:12 +00:00
Dávid Bolvanský 28691cdd71 [MemLoc] Support memchr/memccpy in MemoryLocation::getForArgument
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89321
2020-10-16 11:37:29 +02:00
Nikita Popov 7d3b475810 Revert "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs"
This reverts commit 905101c360.

This causes a large compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=cc175c2cc8e638462bab74e0781e06f9b6eb5017&to=905101c36025fe1c8ecdf9a20cd59db036676073&stat=instructions
2020-10-16 09:47:38 +02:00
Max Kazantsev 1eb2c6d23f [SCEV][NFC] Split out type balancing in implication engine
We plan to introduce more advanced ways of dealing with different types.
2020-10-16 13:40:24 +07:00
Max Kazantsev 905101c360 [SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs
We can sharpen the range of an AddRec if we know that it does not
self-wrap and know the symbolic iteration count of the loop. If we can
evaluate the value of the AddRec on the last iteration and prove that at least
one of its intermediate values lies between start and end, then the no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the range estimate can be improved to the union of the ranges of start and end.
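
A hedged sketch of the kind of loop this helps (names and the loop shape are illustrative only):

  loop:
    %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
    %iv.next = add nuw i32 %iv, 1
    %cmp = icmp ne i32 %iv.next, %n
    br i1 %cmp, label %loop, label %exit

Here the AddRec {0,+,1}<nuw> starts at 0 and ends at %n, so its range can be narrowed to the union of the ranges of 0 and %n rather than the full unsigned range.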

Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: efriedma
2020-10-16 12:00:39 +07:00
Anh Tuyen Tran 224fd6ff48 [NFC][CaptureTracking] Move static function isNonEscapingLocalObject to llvm namespace
Function isNonEscapingLocalObject is a static one within BasicAliasAnalysis.cpp.
It wraps around PointerMayBeCaptured of CaptureTracking, checking whether a pointer
is to a function-local object, which never escapes from the function.

Although at the moment, isNonEscapingLocalObject is used only by BasicAliasAnalysis,
its functionality can be used by other pass(es), one of which I will put up for review
very soon. Instead of copying the contents of this static function, I move it to llvm
scope, and place it amongst other functions with similar functionality in CaptureTracking.

The rationale for the location is:
- Pointer escape and pointer being captured are actually two sides of the same coin
- isNonEscapingLocalObject is wrapping around another function in CaptureTracking

Reviewed By: jdoerfert (Johannes Doerfert)

Differential Revision: https://reviews.llvm.org/D89465
2020-10-15 18:37:29 +00:00
Roman Lebedev 7ee6c40247
Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.

Let's back this out, and re-attempt with another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.

This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.

I've kept&improved the tests though.
2020-10-14 16:09:18 +03:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.
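
For example (a minimal sketch):

  call void @llvm.assume(i1 true) [ "noundef"(i32 %x) ]

After such a call, isGuaranteedNotToBeUndefOrPoison can conclude that %x is well defined.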

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
2020-10-14 20:16:33 +09:00
Tim Northover 630d264798 Analysis: only query size of sized objects.
Recently we started looking into sret parameters, though the issue could crop
up elsewhere. If the pointee type is opaque, we should not try to compute its
size because that leads to an assertion failure.
2020-10-14 12:16:05 +01:00
Roman Lebedev e92a8e0c74
[SCEV] BuildConstantFromSCEV(): actually properly handle SExt-of-pointer case
As pointed out by @efriedma in
https://reviews.llvm.org/rGaaafe350bb65#inline-4883,
of course we can't just call ptrtoint in the sign-extending case
and be done with it, because it will zero-extend.

I'm not sure what I was thinking there.

This is very much not an NFC; however, looking at the user of
BuildConstantFromSCEV(), I'm not sure how to actually show that
it results in a different constant expression.
2020-10-13 22:22:30 +03:00
Simon Pilgrim 2e604d23b4 [Analysis] findAffectedValues - remove unused ConstantInt argument. NFCI.
We can use m_ConstantInt without a result value as we don't ever use it.
2020-10-13 14:35:18 +01:00
Roman Lebedev aaafe350bb
[SCEV] BuildConstantFromSCEV(): properly handle SCEVSignExtend from ptr
Much like the ZExt/Trunc handling.
Thanks go to Alexander Richardson for the nudge towards noticing this one proactively.

The appropriate (currently crashing) test coverage added.
2020-10-13 12:19:59 +03:00
Roman Lebedev 7324616660
[SCEV] BuildConstantFromSCEV(): properly handle SCEVZeroExtend from ptr
As reported in https://reviews.llvm.org/D88806#2326944,
this is pretty much the sibling problem of https://reviews.llvm.org/D88806#2325340,
with the root cause being that SCEV now models `ptrtoint` as trunc/zext/self of unknown.

The appropriate (currently crashing) test coverage added.
2020-10-13 11:47:44 +03:00
Roman Lebedev 1fb6104293
Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end up there with a constant zext/sext of a pointer,
so I didn't handle those cases until there is a test case.

Original commit message:

While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 23:02:55 +03:00
Hans Wennborg 17cec6a11a Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
> While we indeed can't treat them as no-ops, I believe we can/should
> do better than just modelling them as `unknown`. The `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straightforward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64ca.
2020-10-12 18:39:35 +02:00
Max Kazantsev 28237c33d9 [NFC] Remove redundant isFullSet checks
The full-set case is handled inside the intersection, so there is no need to
litter the code with duplicate checks outside.
2020-10-12 20:41:16 +07:00
Roman Lebedev 1c021c64ca
[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown
While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.
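
For instance (a hedged sketch, assuming a datalayout with 32-bit pointers):

  %i = ptrtoint i8* %p to i64

is now modeled as a zext to i64 of the 32-bit unknown ptrtoint value of %p, rather than as one opaque SCEVUnknown.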

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 11:04:03 +03:00
Craig Topper 9e72d3eaf3 [ValueTracking] Use KnownBits::countMaxLeadingZeros/countMaxTrailingZeros to make code more readable. NFC 2020-10-11 14:26:18 -07:00
David Green be6e8e50f4 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail-folded loops. Reductions are
generated in the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.

Most of the code is fairly straightforward, except for the creation of
the block masks, which need to be created in dominance order. The
order in which they are added is altered to be after any phis, keeping the
requirements of the underlying IR intact.

Differential Revision: https://reviews.llvm.org/D84451
2020-10-11 16:58:34 +01:00
Florian Hahn 2e9fd754b4 [SCEV] Handle ULE in applyLoopGuards.
Handle ULE predicate in similar fashion to ULT predicate in
applyLoopGuards.
2020-10-10 16:26:28 +01:00
Florian Hahn 8f56e382f7 [SCEV] Do not apply info from loop guards in AddRecs.
We cannot guarantee that the replacement expression is loop-invariant with
respect to all AddRecs in the source expression. Use a rewriter that skips
AddRecExpr for now.

Fixes PR47776.
2020-10-09 14:47:26 +01:00
Quentin Colombet 9431f8ad2e [KnownBits] Add a computeForMul method
This patch refactors the logic in ValueTracking.cpp so that
computeKnownBitsForMul now uses a helper function from KnownBits.

NFC

Differential Revision: https://reviews.llvm.org/D88935
2020-10-08 11:33:06 -07:00
Simon Pilgrim 119a143699 [Analysis] ScalarEvolution::getUMinFromMismatchedTypes - assert we've found the max type. NFCI.
Found by clang static analyzer.
2020-10-08 19:04:29 +01:00
Max Kazantsev a5ef2e0a1e Return "[SCEV] Prove implications via AddRec start"
The initial version of the patch was reverted because it missed the check that
the predicate being proved is actually guarded by this check on the 1st iteration.
If it was not executed on the 1st iteration (but possibly executes after that), then
it is incorrect to use reasoning about the IV start to prove it.

Added the test where the miscompile was seen. Unfortunately, my attempts
to reduce it with bugpoint did not succeed; it can be reduced further once
we understand how to do so without losing the essence of the initial bug.

Returning, assuming the miscompiles are now gone.

Differential Revision: https://reviews.llvm.org/D88208
2020-10-08 11:15:35 +07:00
Mircea Trofin ac2018da61 [NFC][MLInliner] Getters should return by reference 2020-10-07 13:55:38 -07:00
Florian Hahn a73166a452 [LAA] Use DL to get element size for bound computation.
Currently LAA uses getScalarSizeInBits to compute the size of an element
when computing the end bound of an access.

This does not work as expected for pointers to pointers, because
getScalarSizeInBits will return 0 for pointer types.

By using DataLayout to get the size of the element we can also correctly
handle pointer element types.
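
A minimal sketch of the problematic case (names are illustrative):

  %gep = getelementptr i8*, i8** %base, i64 %i
  %v = load i8*, i8** %gep

The element type here is i8*; getScalarSizeInBits returns 0 for it, while DataLayout reports the actual store size (e.g. 8 bytes on a 64-bit target).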

Note the changes to the existing test, which seems to also use the wrong
offset for the end.

Fixes PR47751.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D88953
2020-10-07 18:57:07 +01:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.
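
For example, a sketch of the renaming for one reduction intrinsic:

  ; before: call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %v)
  ; after:  call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)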

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Dávid Bolvanský 86429c4eaf [SimplifyLibCalls] Optimize mempcpy_chk to mempcpy 2020-10-06 17:08:46 +02:00
Max Kazantsev bbb0ee6e34 Revert "[SCEV] Prove implications via AddRec start"
This reverts commit 69acdfe075.

Need to investigate reported miscompiles.
2020-10-06 11:40:14 +07:00
Mircea Trofin 36bb1fb1fe [MLInliner] Factor out logging
Factored out the logging facility, to allow its reuse outside the
inliner.

Differential Revision: https://reviews.llvm.org/D88770
2020-10-05 18:09:17 -07:00
Dávid Bolvanský a4bae56ab8 Revert "[SLC] Optimize mempcpy_chk to mempcpy"
This reverts commit 3f1fd59de3.
2020-10-05 22:27:14 +02:00
Dávid Bolvanský 3f1fd59de3 [SLC] Optimize mempcpy_chk to mempcpy
As reported in PR46735:

void* f(void *d, const void *s, size_t l)
{
    return __builtin___mempcpy_chk(d, s, l, __builtin_object_size(d, 0));
}

This can be optimized to `return mempcpy(d, s, l);`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86019
2020-10-05 22:18:36 +02:00
Simon Pilgrim 2cd7b0e130 [ValueTracking] canCreateUndefOrPoison - use APInt to check bounds instead of getZExtValue().
Fixes OSS Fuzz #26135
2020-10-05 13:45:27 +01:00
Fangrui Song c36d441b6b [SDA] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off builds 2020-10-04 12:17:16 -07:00
Simon Pilgrim dca4b7130d [Analysis] resolveAllCalls - fix use after std::move warning. NFCI.
We can't use Use.Calls after it's std::move()'d to TmpCalls, as it will be in an undefined state. Instead, swap with the known-empty map in TmpCalls so we can then safely emplace_back into the now-empty Use.Calls.

Fixes a clang static analyzer warning.
2020-10-03 17:52:20 +01:00
Simon Pilgrim fa59135bf1 [Analysis] Drop local maxAPInt/minAPInt helpers. NFCI.
Use standard APIntOps::smax/smin helpers instead.
2020-10-02 14:56:12 +01:00
Simon Pilgrim 71b89b1493 LoopAccessAnalysis.cpp - use const reference in for-range loops. NFCI. 2020-10-02 13:56:30 +01:00
Max Kazantsev b8ac19cf1c [SCEV] Limited support for unsigned preds in isImpliedViaOperations
The logic there only considers `SLT/SGT` predicates. We can use the same logic
for proving `ULT/UGT` predicates if all involved values are non-negative.

Adding full-scale support for unsigned predicates might be challenging because of the
amount of code involved, so we can consider this in the future.

Differential Revision: https://reviews.llvm.org/D88087
Reviewed By: reames
2020-10-02 10:20:57 +07:00
Simon Pilgrim 95a440b936 [IR] PatternMatch - add m_FShl/m_FShr funnel shift intrinsic matchers. NFCI. 2020-10-01 14:42:34 +01:00
Max Kazantsev 69acdfe075 [SCEV] Prove implications via AddRec start
If we know that some predicate is true for an AddRec and an invariant
(w.r.t. this AddRec's loop), this fact is, in particular, true on the first
iteration. We can try to prove the facts we need using the start value.

The motivating example is proving things like
```
  isImpliedCondOperands(>=, X, 0, {X,+,-1}, 0)
```

Differential Revision: https://reviews.llvm.org/D88208
Reviewed By: reames
2020-10-01 17:09:38 +07:00
Max Kazantsev c93a39dd1f [SCEV][NFC] Introduce isKnownPredicateAt method
We can query known predicates at different points, respecting
their dominating conditions.
2020-10-01 12:11:24 +07:00
Arthur Eubanks 4fbd83c716 [ObjCARCAA][NewPM] Add already ported objc-arc-aa to PassRegistry.def
Also add missing AnalysisKey definition.
2020-09-30 08:50:44 -07:00
Simon Moll 05ae04c396 [DA][SDA] SyncDependenceAnalysis re-write
This patch achieves two things:
1. It breaks up the `join_blocks` interface between the SDA and the DA to
   return two separate sets for divergent loop exits and divergent,
   disjoint path joins.
2. It updates the SDA algorithm to run in O(n) time and improves the
   precision on divergent loop exits.

This fixes `https://bugs.llvm.org/show_bug.cgi?id=46372` (by virtue of
the improved `join_blocks` interface) and revealed an imprecise expected
result in the `Analysis/DivergenceAnalysis/AMDGPU/hidden_loopdiverge.ll`
test.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D84413
2020-09-30 17:36:26 +02:00
Florian Hahn 0eab9d5823 [SCEV] Verify that all mapped SCEV AddRecs refer to valid loops.
This check helps to guard against cases where expressions referring to
invalidated/deleted loops are not properly invalidated.

The additional check is motivated by the reproducer shared for 8fdac7cb7a
and I think it makes sense in general as a sanity check.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88166
2020-09-30 12:46:55 +01:00
Nikita Popov ac8a51c701 [ValueTracking] Early exit known non zero for phis
After D88276 we no longer expect computeKnownBits() to prove
non-zeroness for cases where isKnownNonZero() can't, so don't
fall through to it.
2020-09-29 21:07:36 +02:00
Max Kazantsev 9100bd772d [SCEV][NFC] Introduce isBasicBlockEntryGuardedByCond
Currently, we have the `isLoopEntryGuardedByCond` method in SCEV, which
checks that some fact is true if we enter the loop. In fact, this is just a
particular case of the more general concept `isBasicBlockEntryGuardedByCond`
applied to the given loop's header: the logic of this code is largely
independent of the given loop and only cares about the code above it.

This patch makes this generalization. Now we can query it for any block,
and `isBasicBlockEntryGuardedByCond` is just a particular case.

Differential Revision: https://reviews.llvm.org/D87828
Reviewed By: fhahn
2020-09-29 15:53:45 +07:00
Serguei Katkov 297ec61130 [IsKnownNonZero] Handle the case with non-constant phi nodes
Handle the case when all inputs of a phi are proven to be non-zero.
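
For example (a minimal sketch):

  %p = phi i32 [ %a, %bb1 ], [ %b, %bb2 ]

If both %a and %b can be proven non-zero (within the recursion depth limit), then %p is non-zero as well.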

Constants are checked at the beginning of this method, before the recursion-depth
check, so this is a special case of the non-constant phi handling.

Recursion depth is already handled by the function.

Reviewers: aqjune, nikic, efriedma
Reviewed By: nikic
Subscribers: dantrushin, hiraditya, jdoerfert, llvm-commits
Differential Revision: https://reviews.llvm.org/D88276
2020-09-29 15:22:10 +07:00
Florian Hahn b76df593eb Revert "Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one.""
Looks like there is still another remaining issue:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/22273/steps/build%20libcxx%2Fmsan/logs/stdio

This reverts commit 86a20d9e34.
2020-09-29 09:18:19 +01:00
Florian Hahn 86a20d9e34 Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one."
This version includes a small fix allowing function pointers to be
unconditionally replaced for now.

This reverts commit 4c5e4aa89b.
2020-09-29 09:10:27 +01:00
Sanjay Patel 33125cffda [CostModel] fill in arguments as part of intrinsic attribute constructor
This appears to be an error of code duplication: instead of
one constructor variant calling another, we have N similar
but not identical versions.

I think this is 'NFC' based on the current callers, but it's
hard to tell or guess the intent in all cases.
2020-09-28 15:27:45 -04:00
Juneyoung Lee ba8911d560 [ValueTracking] Fix analyses to update CxtI to be phi's incoming edges' terminators
It was mentioned in D88276 that when a phi node is visited, the terminators of its incoming edges should be used as the CxtI.
This patch makes two functions (ComputeNumSignBitsImpl, isGuaranteedNotToBeUndefOrPoison) do so.
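
A hedged sketch of what this means:

  bb1:
    %a = ...
    br label %merge

  merge:
    %p = phi i32 [ %a, %bb1 ], [ %b, %bb2 ]

When analyzing the %a input of %p, the branch terminating bb1 (rather than %p itself) is now used as the context instruction.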

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D88360
2020-09-28 23:24:20 +09:00
Florian Hahn 0ad793f321 [SCEV] Also use info from assumes in applyLoopGuards.
Similar to collecting information from branches guarding a loop, we can
also collect information from assumes dominating the loop header.
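
For example (a minimal sketch):

  %cmp = icmp ult i32 %n, 100
  call void @llvm.assume(i1 %cmp)
  br label %header

Such an assume dominating the loop header now feeds applyLoopGuards just like an equivalent guarding branch would.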

Fixes PR47247.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87854
2020-09-28 13:14:24 +01:00
Nikita Popov fe79061be2 [LVI][CVP] Use block value when simplifying icmps
Add a flag to getPredicateAt() that allows making use of the block
value. This allows us to take into account range information from
the current block, rather than only information that is threaded
over edges, making the icmp simplification in CVP a lot more
powerful.
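
For instance (a hedged sketch):

  ; suppose the block value of %x in this block is the range [0, 10)
  %c = icmp ult i32 %x, 10   ; CVP can now fold this to true

Previously only range information threaded over the incoming edges would have been used for this query.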

I'm not changing getPredicateAt() to use the block value
unconditionally to avoid any impact on the JumpThreading pass,
which is somewhat picky about LVI query order.

Most test changes here are just icmps that now get dropped (while
previously only a result used in a return was replaced). The three
tests in icmp.ll show some representative improvements. Some of
the folds this enables have been covered by IPSCCP in the meantime,
but LVI can reason about some cases which are hard to support in
IPSCCP, such as in test_br_cmp_with_offset.

The compile-time time cost of doing this is fairly minimal, with
a ~0.05% CTMark regression for ReleaseThinLTO:
https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions

This is because the block values will typically already be queried
and cached by other CVP optimizations anyway.

Differential Revision: https://reviews.llvm.org/D69686
2020-09-27 20:25:16 +02:00
Nikita Popov 709d03f8af [LVI] Clarify getValueAt/getValueInBlock doc comments (NFC)
The lattice value returned by getValueInBlock() holds at the start
of the block, not at the end. Also make it clearer what the
difference between getValueInBlock() and getValueAt() is.
2020-09-27 18:21:19 +02:00
Nikita Popov 9b959b59df [LVI] Require context instruction in external API (NFCI)
Require CxtI in getConstant() and getConstantRange() APIs.
Accordingly drop the BB parameter, as it is implied by
CxtI->getParent().

This makes sure we don't forget to pass the context instruction,
and makes the API contract clearer (also clean up the comments to
that effect -- the value holds at the context instruction, not
the end of the block).
2020-09-27 18:07:24 +02:00
Sanjay Patel 816b0a9c9f [CostModel] add cl option to check size and latency costs; NFC
This is a setting used by SimplifyCFG, LoopUnroll, and InlineCost,
but there is apparently no direct test coverage for any of those
cost model values.
2020-09-27 09:52:56 -04:00
Sanjay Patel 645c53a9d9 [ValueTracking] enhance isKnownNeverInfinity to understand sitofp
As discussed in D87877, instcombine already has this fold,
but it was missing from the more general ValueTracking logic.

https://alive2.llvm.org/ce/z/PumYZP
2020-09-27 08:40:31 -04:00
Florian Hahn 7d274aa9be [SCEV] Add support for `x != 0` to CollectCondition.
Add support for NE predicates with 0 constants. Those can be translated
to UMaxExpr(x, 1).
2020-09-25 18:58:55 +01:00
Florian Hahn b5a3b901c7 [SCEV] Add support for `x == constant` to CollectCondition.
Add support for EQ predicates with constant operand. In that case, using
the constant instead of an unknown expression should always be
beneficial.
2020-09-25 16:56:49 +01:00
Florian Hahn 8858340bd3 [SCEV] Swap operands if LHS is not unknown.
Currently we only use information from guards for unknown expressions.
Swap LHS/RHS and predicate, if LHS is not unknown.
2020-09-25 15:50:01 +01:00
Florian Hahn df77ce7cad [SCEV] Extract code to collect conditions to lambda (NFC).
This makes re-using the common functionality easier in follow-up
patches.
2020-09-25 15:12:42 +01:00
Juneyoung Lee 92106641ae [ValueTracking] Make isGuaranteedNotToBeUndefOrPoison exit early when MetadataAsValue is given
It is set to conservatively return false; otherwise, noundef attributes would be added to function calls with metadata arguments.
2020-09-25 09:50:09 +09:00
Juneyoung Lee 1c45220028 [ValueTracking] Check uses of Argument if it is given to isGuaranteedNotToBeUndefOrPoison
This is a patch that allows isGuaranteedNotToBeUndefOrPoison to return a more precise result
when an argument is given, by looking through its uses in the entry block (and following blocks as well, if it is checking poison only).

This is useful when there is a function call with noundef arguments at the entry block.
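
For example (a minimal sketch; @g is a placeholder callee):

  declare void @g(i32 noundef)

  define i32 @f(i32 %arg) {
  entry:
    call void @g(i32 noundef %arg)
    ret i32 %arg
  }

The call executes unconditionally, and passing undef or poison to a noundef parameter is immediate undefined behavior, so %arg must be well defined at the entry.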

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D88207
2020-09-25 08:57:57 +09:00
Andrew Litteken f02c4c87b4 [IRSim] Adding wrapper pass for IRSimilarityIdentfier
This introduces an analysis pass that wraps IRSimilarityIdentifier,
and adds a printer pass to examine in which functions similarities are
being found.

Tests for what the printer pass can find are in
test/Analysis/IRSimilarityIdentifier.

Reviewed by: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D86973
2020-09-24 14:59:41 -05:00
Andrew Litteken 08d145e6d7 [IRSim][NFC] Removing dead variables from IRSimilarityIdentifier.cpp
As pointed out by danielkiss.

Follow up to Differential Revision: https://reviews.llvm.org/D86972
2020-09-24 11:43:33 -05:00
Matt Arsenault d65a7003c4 OpaquePtr: Add helpers for sret to mirror byval
Sret should really have a type parameter like byval does.
2020-09-24 09:57:28 -04:00
Florian Hahn d4ddf63fc4 [SCEV] Use loop guard info when computing the max BE taken count in howFarToZero.
For some expressions, we can use information from loop guards when
we are looking for a maximum. This patch applies information from
loop guards to the expression used to compute the maximum backedge
taken count in howFarToZero. It currently replaces an unknown
expression X with UMin(X, Y), if the loop is guarded by
X ult Y.
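
A hedged sketch of the pattern this helps (names are illustrative):

  entry:
    %guard = icmp ult i32 %x, %n
    br i1 %guard, label %loop, label %exit

  loop:
    %iv = phi i32 [ %x, %entry ], [ %iv.next, %loop ]
    %iv.next = add i32 %iv, -1
    %c = icmp ne i32 %iv.next, 0
    br i1 %c, label %loop, label %exit

When computing the maximum backedge-taken count of the countdown from %x, the guard allows reasoning about umin(%x, %n) instead of the plain unknown %x.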

This patch is minimal in the conditions under which it applies, and there
are a few TODOs to generalize it.

This partly addresses PR40961. We will also need an update to
LV to address it completely.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D67178
2020-09-24 11:06:55 +01:00
Andrew Litteken b63bfc2030 [IRSim] Adding a basic similarity identifier.
This takes the mapped instructions from the IRInstructionMapper and
passes them to the Suffix Tree to find the repeated substrings.  Within
each set of repeated substrings, the IRSimilarityCandidates are compared
against one another for structure, ensuring that the operands in the
instructions are used in the same way.  Each of these structurally
similar IRSimilarityCandidates is contained in a SimilarityGroup.

Tests checking for identifying identical structure, different
isomorphic structure, and different non-isomorphic structure are found in
unittests/Analysis/IRSimilarityIdentifierTest.cpp.

Differential Revision: https://reviews.llvm.org/D86972
2020-09-24 02:05:25 -05:00
Andrew Litteken beeceb92c0 [IRSim][NFC] Removing warning from IRSimilarityIdentifier 2020-09-24 00:26:32 -05:00
Andrew Litteken d1aa143aa8 [IRSim] Adding structural comparison to IRSimilarityCandidate.
Just because sequences of instructions are similar to one another
doesn't mean they are doing the same thing.

This introduces a structural check for the IRSimilarityCandidate that
compares two IRSimilarityCandidates against one another, and in each
instruction creates a mapping between the operands and results, or
checks that the existing mapping is valid.  If this check passes, it
means we have structurally similar IRSimilarityCandidates.

Tests for whether the candidates are found in
unittests/Analysis/IRSimilarityIdentifierTest.cpp.

Recommit of: b27db2bb68 for Differential
URL.

Differential Revision: https://reviews.llvm.org/D86971
2020-09-23 22:42:30 -05:00
Andrew Litteken 0a8e097e72 Revert "[IRSim] Adding structural comparison to IRSimilarityCandidate."
This reverts commit b27db2bb68.
2020-09-23 22:40:37 -05:00
Andrew Litteken b27db2bb68 [IRSim] Adding structural comparison to IRSimilarityCandidate.
Just because sequences of instructions are similar to one another
doesn't mean they are doing the same thing.

This introduces a structural check for the IRSimilarityCandidate that
compares two IRSimilarityCandidates against one another, and in each
instruction creates a mapping between the operands and results, or
checks that the existing mapping is valid.  If this check passes, it
means we have structurally similar IRSimilarityCandidates.

Tests for whether the candidates are found in
unittests/Analysis/IRSimilarityIdentifierTest.cpp.
2020-09-23 22:31:12 -05:00
Sam McCall fa69b60806 [JSON] Add error reporting to fromJSON and ObjectMapper
Translating between JSON objects and C++ structures is common.
From experience in clangd, fromJSON/ObjectMapper work well and save a lot of
code, but they aren't adopted elsewhere, at least partly due to a total lack of error
reporting beyond "ok"/"bad".

The recently-added error model should be rich enough for most applications.
It requires tracking the path within the root object and reporting local
errors at appropriate places.
To do this, we exploit the fact that the call graph of the recursive
parse functions mirrors the structure of the JSON itself.
The current path is represented as a linked list of segments, each of which is
on the stack as a parameter. Concretely, fromJSON now looks like:
  bool fromJSON(const Value&, T&, Path);

Beyond the signature change, this is reasonably unobtrusive: building
the path segments is mostly handled by ObjectMapper and the vector<T> fromJSON.
However the root caller of fromJSON must now create a Root object to
store the errors, which is a little clunky.

I've added high-level parse<T>(StringRef) -> Expected<T>, but it's not
general enough to be the primary interface I think (at least, not usable in
clangd).

All existing users (mostly just clangd) are updated in this patch;
making this change backwards-compatible would have been a bit hairy.

Differential Revision: https://reviews.llvm.org/D88103
2020-09-24 01:20:09 +02:00
Arthur Eubanks 6b1ce83a12 [NewPM][CGSCC] Handle newly added functions in updateCGAndAnalysisManagerForPass
This seems to fit the CGSCC updates model better than calling
addNewFunctionInto{Ref,}SCC() on newly created/outlined functions.
Now addNewFunctionInto{Ref,}SCC() are no longer necessary.

However, this doesn't work on newly outlined functions that aren't
referenced by the original function. e.g. if a() was outlined into b()
and c(), but c() is only referenced by b() and not by a(), this will
trigger an assert.

This also fixes an issue I was seeing with newly created functions not
having passes run on them.

Ran check-llvm with expensive checks.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87798
2020-09-23 15:22:18 -07:00
Andrew Litteken 6ada9e516f [IRSim] Adding IRSimilarityCandidate that contains a region of IRInstructionData.
The IRSimilarityCandidate is a container that holds a region of
IRInstructions and offers interfaces for the starting instruction, ending
instruction, parent function, and length.  It also assigns a global value
number to each unique instance of a value in the region.

It also contains an interface to compare two IRSimilarityCandidates as to whether
they have the same sequence of similar instructions.

Tests for whether the instructions are similar are found in
unittests/Analysis/IRSimilarityIdentifierTest.cpp.

Recommit of: 4944bb190f

Differential Revision: https://reviews.llvm.org/D86970
2020-09-23 13:43:34 -05:00
Sanjay Patel 6189a8d9f5 [TTI] add wrapper for matching vector reduction to reduce code duplication; NFC
I'm not sure what this means, but the order in which we try
the matches makes a difference on at least 1 regression test...
2020-09-23 13:48:57 -04:00
Andrew Litteken 88bc59c300 Revert "[IRSim] Adding IRSimilarityCandidate that contains a region of IRInstructionData."
This reverts commit 4944bb190f.
2020-09-22 21:02:34 -05:00
Andrew Litteken 4944bb190f [IRSim] Adding IRSimilarityCandidate that contains a region of IRInstructionData.
The IRSimilarityCandidate is a container that holds a region of
IRInstructions and offers interfaces for the starting instruction, ending
instruction, parent function, and length.  It also assigns a global value
number to each unique instance of a value in the region.

It also contains an interface to compare two IRSimilarityCandidates as to whether
they have the same sequence of similar instructions.

Tests for whether the instructions are similar are found in
unittests/Analysis/IRSimilarityIdentifierTest.cpp.

Differential Revision: https://reviews.llvm.org/D86970
2020-09-22 18:42:31 -05:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Max Kazantsev e2703c021d [SCEV] Handle `less` predicates for FoundPred = NE
Currently these predicates are ignored, yet their handling is
pretty simple. I could not find a single test where it would
actually change something, but that's only because isImpliedCondOperands
is not smart enough to prove it further. Yet the situation where
we arrive there with a `less` predicate is pretty common.

Differential Revision: https://reviews.llvm.org/D87890
Reviewed By: fhahn
2020-09-22 18:56:35 +07:00
Meera Nakrani a3d0dce260 [ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to check whether the immediate is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant as free to prevent it from being hoisted, so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
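
A sketch of the kind of pattern involved (a hedged example; the 8-bit saturation constants are illustrative, not taken from the patch):

  %c1 = icmp sgt i32 %x, -128
  %lo = select i1 %c1, i32 %x, i32 -128
  %c2 = icmp slt i32 %lo, 127
  %sat = select i1 %c2, i32 %lo, i32 127

Hoisting -128 and 127 out of a loop would break the match, so getIntImmCostInst now reports them as free.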

Differential Revision: https://reviews.llvm.org/D87457
2020-09-22 11:54:10 +00:00
Max Kazantsev 16fde88dbd [SCEV] Support unsigned predicates in isKnownPredicateViaNoOverflow
SCEV should be able to prove facts like `x <u x+1<nuw>`.

Differential Revision: https://reviews.llvm.org/D88015
Reviewed By: lebedev.ri
2020-09-22 17:14:05 +07:00
Arthur Eubanks 9db0c572c1 [Delinearization][NewPM] Port delinearization to NPM
Also make tests in Analysis/Delinearization work under NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87741
2020-09-21 17:59:08 -07:00
Fangrui Song 8fdac7cb7a Revert D71539 "Recommit "[SCEV] Look through single value PHIs.""
This reverts commit 11dccf8d3a.

A bootstrapped clang crashes (due to ArrayRef::front called on an empty
ArrayRef) when compiling some files.  Very strangely, this only reproduces with
modules.

```
13 0x0000564d3349e968 llvm::ArrayRef<llvm::BasicBlock*>::front() const /proc/self/cwd/llvm/include/llvm/ADT/ArrayRef.h:160:7
14 0x0000564d3349e896 llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getHeader() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfo.h:104:50
15 0x0000564d3349fd9d llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getLoopLatch() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfoImpl.h:210:11
16 0x0000564d33593c8a llvm::ScalarEvolution::computeBackedgeTakenCount(llvm::Loop const*, bool) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6933:15
17 0x0000564d33592ebc llvm::ScalarEvolution::getBackedgeTakenInfo(llvm::Loop const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:0:30
18 0x0000564d33593a54 llvm::ScalarEvolution::getBackedgeTakenCount(llvm::Loop const*, llvm::ScalarEvolution::ExitCountKind) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6487:36
19 0x0000564d32be2402 llvm::ScalarEvolution::getConstantMaxBackedgeTakenCount(llvm::Loop const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:768:5
20 0x0000564d33590807 llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:5495:19
21 0x0000564d320abab7 llvm::ScalarEvolution::getSignedRange(llvm::SCEV const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:840:12
22 0x0000564d335a03aa llvm::ScalarEvolution::isKnownPredicateViaConstantRanges(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:9239:60
23 0x0000564d33586a80 llvm::ScalarEvolution::isKnownViaNonRecursiveReasoning(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:10284:60
```
2020-09-21 17:21:43 -07:00
Roman Lebedev 0ab99bb314
[NFC][SCEV] Cleanup lowering of @llvm.uadd.sat, (-1 - V) is just ~V 2020-09-21 22:10:59 +03:00
Roman Lebedev 64e2cb7e96
[SCEV] Recognize @llvm.uadd.sat as `%y + umin(%x, (-1 - %y))`
----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = sub nsw nuw i32 4294967295, %y
  %t1 = umin i32 %x, %t0
  %r = add nuw i32 %t1, %y
  ret i32 %r
}
Transformation seems to be correct!

The alternative, naive lowering could be the following,
although I don't think it's better,
though it will likely be needed for sadd/ssub/*shl:

----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = zext i32 %x to i33
  %t1 = zext i32 %y to i33
  %t2 = add nuw i33 %t0, %t1
  %t3 = zext i32 4294967295 to i33
  %t4 = umin i33 %t2, %t3
  %r = trunc i33 %t4 to i32
  ret i32 %r
}
Transformation seems to be correct!
2020-09-21 20:25:54 +03:00
Roman Lebedev fedc9549d5
[SCEV] Recognize @llvm.usub.sat as `%x - (umin %x, %y)`
----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = usub_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = umin i32 %x, %y
  %r = sub nuw i32 %x, %t0
  ret i32 %r
}
Transformation seems to be correct!
2020-09-21 20:25:54 +03:00
Roman Lebedev 1bb7ab8c4a
[SCEV] Recognize @llvm.abs as smax(x, -x)
As per alive2 (ignoring undef):

----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 0
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!

----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 1
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul nsw i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!
2020-09-21 20:25:53 +03:00
Florian Hahn 11dccf8d3a Recommit "[SCEV] Look through single value PHIs."
This commit was originally reverted because it was suspected to cause a crash,
but a reproducer did not surface.

A crash that was exposed by this change was fixed in 1d8f2e5292.

This reverts the revert commit 0581c0b0ee.
2020-09-21 11:59:50 +01:00
Nikita Popov 445db89b53 [LVI] Get value range from mask comparison
InstCombine likes to canonicalize comparisons of the form
X == C || X == C+1 into (X & -2) == C'. Make sure LVI can still
recover the value range from this. This can of course also be useful
for proper mask comparisons.

For the sake of clarity, the implementation goes through KnownBits
to compute the range.
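
A small worked example (a hedged sketch):

  ; InstCombine canonicalizes  x == 8 || x == 9  to:
  %m = and i32 %x, -2
  %c = icmp eq i32 %m, 8

From the known bits implied by this comparison, LVI recovers the range [8, 10) for %x.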
2020-09-20 21:13:57 +02:00
Nikita Popov f94bbe19b6 [LVI] Refactor getValueFromICmpCondition (NFC)
Rewrite this in a way where the core logic is in a separate
function, that is invoked with swapped operands. This makes it
easier to add handling for additional icmp patterns.
2020-09-20 21:13:57 +02:00
Dávid Bolvanský 2990518b03 [MemLoc] Support lllvm.memcpy.inline in MemoryLocation::getForArgument
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87971
2020-09-20 14:01:48 +02:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Dávid Bolvanský d716f1608c [MemLoc] Support bcmp in MemoryLocation::getForArgument
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87964
2020-09-19 17:12:43 +02:00
Sanjay Patel f74a334fe3 [ConstantFolding] add undef handling for fmin/fmax intrinsics
The output here may not be optimal (yet), but it should be
consistent for commuted operands (it was not before) and
correct. We can do better by checking FMF and NaN if needed.

Code in InstSimplify generally assumes that we have already
folded code like this, so it was not handling 2 constant
inputs by commuting consistently.
2020-09-19 10:31:01 -04:00
Andrew Litteken 132aaec4f2 [IRSim] Adding ilist for IRInstructionData.
The IRInstructionData structs are a different representation of the
program.  This list treats the program as if it were "flattened", with
this list as the only parent.  This lets us easily create ranges of
instructions.

Differential Revision: https://reviews.llvm.org/D86969
2020-09-19 00:18:39 -05:00
Vitaly Buka 97bfac076a [NFC][StackSafety] Replace auto with type
Fixes a static analyzer warning.
2020-09-18 17:10:28 -07:00