Commit Graph

17117 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin ca4bf58e4e [AMDGPU] Support unaligned flat scratch in TLI
Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for
unaligned flat scratch support. Mostly needed for global isel.

Differential Revision: https://reviews.llvm.org/D93669
2020-12-22 16:12:31 -08:00
Sanjay Patel f6929c0195 [SLP] add reduction tests for maxnum/minnum intrinsics; NFC 2020-12-22 16:05:39 -05:00
Arnold Schwaighofer 333108e8be Add a llvm.coro.end.async intrinsic
The llvm.coro.end.async intrinsic allows to specify a function that is
to be called as the last action before returning. This function will be
inlined after coroutine splitting.

This function can contain a 'musttail' call to allow for guaranteed tail
calling as the last action.

Differential Revision: https://reviews.llvm.org/D93568
2020-12-22 10:52:28 -08:00
Florian Hahn ac90bbc9cb
[LoopDeletion] Add test case where outer loop needs to be deleted.
In the test case @test1, the inner loop cannot be removed, because it
has a live-out value. But the outer loop is a no-op and can be removed.
2020-12-22 17:49:20 +00:00
Philip Reames f106b281be [tests] precommit a test mentioned in review for D93317 2020-12-22 09:47:19 -08:00
Siddhesh Poyarekar 6fcb039956 Fold comparison of __builtin_object_size expression with -1 for non-const size
When __builtin_dynamic_object_size returns a non-constant expression, it cannot
be -1 since that is an invalid return value for object size. However since
passes running after the substitution don't know this, they are unable to
optimize away the comparison and hence the comparison and branch stays in there.
This change generates an appropriate call to llvm.assume to help the optimizer
folding the test.

glibc is considering adopting __builtin_dynamic_object_size for additional
protection[1] and this change will help reduce branching overhead in fortified
implementations of all of the functions that don't have the __builtin___*_chk
type builtins, e.g. __ppoll_chk.

Also remove the test limit-max-iterations.ll because it was deemed unnecessary
during review.

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/120191.html

Differential Revision: https://reviews.llvm.org/D93015
2020-12-22 10:56:31 +01:00
Gil Rapaport a56280094e [LV] Avoid needless fold tail
When the trip-count is provably divisible by the maximal/chosen VF, folding the
loop's tail during vectorization is redundant. This commit extends the existing
test for constant trip-counts to any trip-count known to be divisible by
maximal/selected VF by SCEV.

Differential Revision: https://reviews.llvm.org/D93615
2020-12-22 10:25:20 +02:00
Simon Pilgrim 88c5b50060 [AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts (REAPPLIED)
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.

This should allow us to begin resolving PR46896 et al.

Ensure we block poison in a funnel shift value - similar to rG0fe91ad463fea9d08cbcd640a62aa9ca2d8d05e0

Reapplied with fix for PR48068 - we weren't checking that the shift values could be hoisted from their basicblocks.

Differential Revision: https://reviews.llvm.org/D90625
2020-12-21 15:22:27 +00:00
Sanjay Patel 38ca7face6 [InstSimplify] reduce logic with inverted add/sub ops
https://llvm.org/PR48559
This could be part of a larger ValueTracking API,
but I don't see that currently.

https://rise4fun.com/Alive/gR0

  Name: and
  Pre: C1 == ~C2
  %sub = add i8 %x, C1
  %sub1 = sub i8 C2, %x
  %r = and i8 %sub, %sub1
  =>
  %r = 0

  Name: or
  Pre: C1 == ~C2
  %sub = add i8 %x, C1
  %sub1 = sub i8 C2, %x
  %r = or i8 %sub, %sub1
  =>
  %r = -1

  Name: xor
  Pre: C1 == ~C2
  %sub = add i8 %x, C1
  %sub1 = sub i8 C2, %x
  %r = xor i8 %sub, %sub1
  =>
  %r = -1
2020-12-21 08:51:43 -05:00
Sanjay Patel d6118759f3 [InstSimplify] add tests for inverted logic operands; NFC 2020-12-21 08:51:42 -05:00
Arthur Eubanks 1a883484af [test] Fix reg-usage.ll under NPM
The -O2 isn't used in the test.
2020-12-20 15:41:29 -08:00
Andrew Litteken 7c6f28a438 [IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.

We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D86978
2020-12-19 17:34:34 -06:00
Andrew Litteken b8a2b6af37 Revert "[IROutliner] Deduplicating functions that only require inputs."
Missing reviewers and differential revision in commit message.

This reverts commit 5cdc4f57e5.
2020-12-19 17:33:49 -06:00
Andrew Litteken 5cdc4f57e5 [IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.

We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
2020-12-19 17:26:29 -06:00
Roman Lebedev c043f5055e
[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1
... for conditional branch case
2020-12-20 00:18:36 +03:00
Roman Lebedev 262ff9c23e
[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree 2020-12-20 00:18:36 +03:00
Roman Lebedev 6a1617d67c
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2
... for the custom case returning void.
2020-12-20 00:18:36 +03:00
Roman Lebedev b94520c9ee
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1
... for the general case of returning a value.
2020-12-20 00:18:35 +03:00
Roman Lebedev 83659c7076
[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-20 00:18:34 +03:00
Roman Lebedev b7d00e29b7
[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev c209b88dd4
[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev 76e74d9395
[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree 2020-12-20 00:18:33 +03:00
Roman Lebedev 4be8707e64
[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree
Still boring, simply drop all edges to successors of DomBlock,
and add an edge to to BB instead.
2020-12-20 00:18:33 +03:00
Roman Lebedev b43b77ff9b
[NFCI][SimlifyCFG] simplifyOnce(): also perform DomTree validation
And that exposes that a number of tests don't *actually* manage to
maintain DomTree validity, which is inline with my observations.

Once again, SimlifyCFG pass currently does not require/preserve DomTree
by default, so this is effectively NFC.
2020-12-20 00:18:32 +03:00
Andrew Litteken c52bcf3a9b [IRSim][IROutliner] Limit to extracting regions that only require
inputs.

Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch limits the
extracted regions to those that only require inputs, and do not have any
 other special cases.

We test that we do not outline the wrong constants in:
test/Transforms/IROutliner/outliner-different-constants.ll
test/Transforms/IROutliner/outliner-different-globals.ll
test/Transforms/IROutliner/outliner-constant-vs-registers.ll

We test that correctly outline in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, plofti

Differential Revision: https://reviews.llvm.org/D86977
2020-12-19 13:33:54 -06:00
Juneyoung Lee 1e785e9262 apply update_test_checks.py to a few files in llvm/test/Transforms/InstCombine 2020-12-20 01:03:19 +09:00
Aditya Kumar 1ab4db0f84 [HotColdSplit] Reflect full cost of parameters in split penalty
Make the penalty for splitting a region more accurately reflect the cost
of materializing all of the inputs/outputs to/from the region.

This almost entirely eliminates code growth within functions which
undergo splitting in key internal frameworks, and reduces the size of
those frameworks between 2.6% to 3%.

rdar://49167240

Patch by: Vedant Kumar(@vsk)
Reviewers: hiraditya,rjf,t.p.northover
Reviewed By: hiraditya,rjf

Differential Revision: https://reviews.llvm.org/D59715
2020-12-18 17:06:17 -08:00
Akira Hatanaka ffd982f7db [ObjC][ARC] Fix a bug where the inline-asm retain/claim RV marker wasn't
inserted when the original call had a 'returned' argument

The code is testing whether the instruction BBI points to is the call
that is paired up with the retainRV/claimRV call, but it doesn't work
when the call has a 'returned' argument since GetArgRCIdentityRoot looks
through 'returned' arguments.

rdar://72485383
2020-12-18 16:59:06 -08:00
Nikita Popov 2af2f58ec0 [InstCombine] Regenerate test checks (NFC) 2020-12-18 20:55:26 +01:00
Nikita Popov 1f1145006b [DSE] Use correct memory location for read clobber check
MSSA DSE starts at a killing store, finds an earlier store and
then checks that the earlier store is not read along any paths
(without being killed first). However, it uses the memory location
of the killing store for that, not the earlier store that we're
attempting to eliminate.

This has a number of problems:

* Mismatches between what BasicAA considers aliasing and what DSE
  considers an overwrite (even though both are correct in isolation)
  can result in miscompiles. This is PR48279, which D92045 tries to
  fix in a different way. The problem is that we're using a location
  from a store that is potentially not executed and thus may be UB,
  in which case analysis results can be arbitrary.
* Metadata on the killing store may be used to determine aliasing,
  but there is no guarantee that the metadata is valid, as the specific
  killing store may not be executed. Using the metadata on the earlier
  store is valid (it is the store we're removing, so on any execution
  where its removal may be observed, it must be executed).
* The location is imprecise. For full overwrites the killing store
  will always have a location that is larger or equal than the earlier
  access location, so it's beneficial to use the earlier access
  location. This is not the case for partial overwrites, in which
  case either location might be smaller. There is some room for
  improvement here.

Using the earlier access location means that we can no longer cache
which accesses are read for a given killing store, as we may be
querying different locations. However, it turns out that simply
dropping the cache has no notable impact on compile-time.

Differential Revision: https://reviews.llvm.org/D93523
2020-12-18 20:26:53 +01:00
Roman Lebedev 0e94ba9d40
[NFC][InstCombine] Fixup check lines for prof md in select_meta.ll test 2020-12-18 21:33:30 +03:00
Roman Lebedev 897c985e1e
[InstCombine] Canonicalize SPF to abs intrinsic
This patch enables canonicalization of SPF_ABS and SPF_ABS
to the abs intrinsic.

This is a recommit, the original try was
05d4c4ebc2,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.

Differential Revision: https://reviews.llvm.org/D87188
2020-12-18 21:18:14 +03:00
Roman Lebedev e9289dc25f
[InstSimplify] Don't miscompile `X == 0 ? abs(X) : -abs(X) --> -abs(X)` xform
The transform wasn't checking that the LHS of the comparison
*is* the `X` in question...
This is the miscompile that was holding up D87188.

Thanks to Dave Green for producing an actionable reproducer!
2020-12-18 21:18:13 +03:00
Roman Lebedev 9b183a1452
[NFC][InstSimplify] Add miscompiled testcase from D87188/D87197
Thanks to Dave Green for producing an actionable reproducer!
It is (obviously) a miscompile:
```
----------------------------------------
define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) {
%0:
  %abs = abs i32 %x, 0
  %neg = sub i32 0, %abs
  %cmp = icmp eq i32 %y, 0
  %sel = select i1 %cmp, i32 %neg, i32 %abs
  ret i32 %sel
}
=>
define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) {
%0:
  %abs = abs i32 %x, 0
  ret i32 %abs
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i32 %x = #xe0000000 (3758096384, -536870912)
i32 %y = #x00000000 (0)

Source:
i32 %abs = #x20000000 (536870912)
i32 %neg = #xe0000000 (3758096384, -536870912)
i1 %cmp = #x1 (1)
i32 %sel = #xe0000000 (3758096384, -536870912)

Target:
i32 %abs = #x20000000 (536870912)
Source value: #xe0000000 (3758096384, -536870912)
Target value: #x20000000 (536870912)

Alive2: Transform doesn't verify!

```
2020-12-18 21:18:13 +03:00
Florian Hahn a74941da71
Revert "[BasicAA] Handle two unknown sizes for GEPs"
Temporarily revert commit 8b1c4e310c.

After 8b1c4e310c the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame`
dramatically increases with -O3 & LTO, causing issues for builders with
that configuration.

I filed PR48553 with a smallish reproducer that shows a 10-100x compile
time increase.
2020-12-18 17:59:12 +00:00
Whitney Tsang 2a814cd9e1 Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-18 17:37:17 +00:00
Xun Li 4652718ee3 Cleanup coro-inline.ll
Following up with the comments in D92706.
- Use -passes instead of -enable-new-pm
- CoroEarly should happen before AlwaysInliner, adjust it.
- Remove some unnecessary barriers (still kept one)
- Cleanup unnecessary debug info

Differential Revision: https://reviews.llvm.org/D93342
2020-12-18 08:05:04 -08:00
Sanjay Patel 47aaa99c0e [VectorCombine] allow peeking through GEPs when creating a vector load
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).

We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.

Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh

The vector translation is independent of endian (verify by
changing to leading 'E' in the datalayout string).

Differential Revision: https://reviews.llvm.org/D93229
2020-12-18 09:25:03 -05:00
Yevgeny Rouban 324d96b637 [IndVars] A test for adding trunc instructions to unwind blocks
Differential Revision: https://reviews.llvm.org/D93521
Reviewed By: skatkov
2020-12-18 17:08:26 +07:00
Andrew Litteken cea807602a [IRSim][IROutliner] Adding InstVisitor to disallow certain operations.
This adds a custom InstVisitor to return false on instructions that
should not be allowed to be outlined.  These match the illegal
instructions in the IRInstructionMapper with exception of the addition
of the llvm.assume intrinsic.

Tests all the tests marked: illegal-*-.ll with a test for each kind of
instruction that has been marked as illegal.

Reviewers: jroelofs, paquette

Differential Revisions: https://reviews.llvm.org/D86976
2020-12-17 19:33:57 -06:00
Nikita Popov 3d56644f18 [DSE] Add test for potential caching bug (NFC)
This one would miscompile if read-clobber checks switched to using
the EarlierAccess location, but the read cache was retained.
2020-12-17 23:35:01 +01:00
Sanjay Patel 71a1b9fe76 [VectorCombine] add tests for gep load with cast; NFC 2020-12-17 16:40:55 -05:00
Roman Lebedev 2d07414ee5
[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree
Pretty boring, removeUnwindEdge() already known how to update DomTree,
so if we are to call it, we must first flush our own pending updates;
otherwise, we just stop predecessors from branching to us,
and for certain predecessors, stop their predecessors from
branching to them also.
2020-12-18 00:37:22 +03:00
Roman Lebedev 2ee724863e
[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:22 +03:00
Roman Lebedev 164e0847a5
[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:21 +03:00
Bangtian Liu 511cfe9441 Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit d20e0c3444.
2020-12-17 21:00:37 +00:00
Nikita Popov 1b84934f90 [DSE] Add more tests for read clobber location (NFC) 2020-12-17 21:03:00 +01:00
Andrew Litteken dae34463e3 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Recommit of bf899e8913 fixing memory
leaks.

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-12-17 11:27:26 -06:00
Nabeel Omer df2b9a3e02 [DebugInfo] Avoid re-ordering assignments in LCSSA
The LCSSA pass makes use of a function insertDebugValuesForPHIs() to
propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty
behaviour occurs when the parent PHI of a newly inserted PHI is not the
most recent assignment to a source variable. insertDebugValuesForPHIs ends
up propagating a value that isn't the most recent assignemnt.

This change removes the call to insertDebugValuesForPHIs() from LCSSA,
preventing incorrect dbg.value intrinsics from being propagated.
Propagating variable locations between blocks will occur later, during
LiveDebugValues.

Differential Revision: https://reviews.llvm.org/D92576
2020-12-17 16:17:32 +00:00
Bangtian Liu d20e0c3444 Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-17 16:00:15 +00:00
Florian Hahn 01089c876b
[InstCombine] Preserve !annotation on newly created instructions.
If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.

The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

This patch is based on an earlier patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, lebedev.ri

Differential Revision: https://reviews.llvm.org/D91444
2020-12-17 15:20:23 +00:00
Florian Hahn 75c04bfc61
[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest.
When folding a branch to a common destination, preserve !annotation on
the created instruction, if the terminator of the BB that is going to be
removed has !annotation. This should ensure that !annotation is attached
to the instructions that 'replace' the original terminator.

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D93410
2020-12-17 14:06:58 +00:00
Jun Ma 0138399903 [InstCombine] Remove scalable vector restriction in InstCombineCasts
Differential Revision: https://reviews.llvm.org/D93389
2020-12-17 22:02:33 +08:00
Cullen Rhodes 1fd3a04775 [LV] Disable epilogue vectorization for scalable VFs
Epilogue vectorization doesn't support scalable vectorization factors
yet, disable it for now.

Reviewed By: sdesmalen, bmahjour

Differential Revision: https://reviews.llvm.org/D93063
2020-12-17 12:14:03 +00:00
Florian Hahn eba09a2db9
[InstCombine] Preserve !annotation for newly created instructions.
When replacing an instruction with !annotation with a newly created
replacement, add the !annotation metadata to the replacement.

This mostly covers cases where the new instructions are created using
the ::Create helpers. Instructions created by IRBuilder will be handled
by D91444.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D93399
2020-12-17 09:06:51 +00:00
Hongtao Yu ac068e014b [CSSPGO] Consume pseudo-probe-based AutoFDO profile
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.

Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).

**Adjustment to profile format **

A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.

Differential Revision: https://reviews.llvm.org/D92347
2020-12-16 15:57:18 -08:00
alex-t 35ec3ff76d Disable Jump Threading for the targets with divergent control flow
Details: Jump Threading does not make sense for the targets with divergent CF
         since they do not use branch prediction for speculative execution.
         Also in the high level IR there is no enough information to conclude that the branch is divergent or uniform.
         This may cause errors in further CF lowering.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93302
2020-12-17 02:40:54 +03:00
Roman Lebedev d22a47e9ff
[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree
A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predecessors
of a block and carefully delay deletion of the block until afterwards
the DomTree updates are appled, or add a successor to the block.

There wasn't a great test coverage for this, so i added extra, to be sure.
2020-12-17 01:03:50 +03:00
Roman Lebedev 5cce4aff18
[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev 49dac4aca0
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev 4fc169f664
[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-17 01:03:49 +03:00
Roman Lebedev aa2009fe78
[NFCI][SimplifyCFG] Mark all the SimplifyCFG tests that already don't invalidate DomTree as such
First step after e113317958,
in these tests, DomTree is valid afterwards, so mark them as such,
so that they don't regress.

In further steps, SimplifyCFG transforms shall taught to preserve DomTree,
in as small steps as possible.
2020-12-17 01:03:49 +03:00
Rong Xu 0abd744597 [PGO] Use the sum of profile counts to fix the function entry count
Raw profile count values for each BB are not kept after profile
annotation. We record function entry count and branch weights
and use them to compute the count when needed.  This mechanism
works well in a perfect world, but often breaks in real programs,
because of number prevision, inconsistent profile, or bugs in
BFI). This patch uses sum of profile count values to fix
function entry count to make the BFI count close to real profile
counts.

Differential Revision: https://reviews.llvm.org/D61540
2020-12-16 13:37:43 -08:00
Sanjay Patel 46c331bf26 [VectorCombine] adjust test alignments for better coverage; NFC 2020-12-16 16:30:45 -05:00
Sanjay Patel 38ebc1a13d [VectorCombine] optimize alignment for load transform
Here's another minimal step suggested by D93229 / D93397 .
(I'm trying to be extra careful in these changes because
load transforms are easy to get wrong.)

We can optimistically choose the greater alignment of a
load and its pointer operand. As the test diffs show, this
can improve what would have been unaligned vector loads
into aligned loads.

When we enhance with gep offsets, we will need to adjust
the alignment calculation to include that offset.

Differential Revision: https://reviews.llvm.org/D93406
2020-12-16 15:25:45 -05:00
Florian Hahn 70bd75426e [SimplifyCFG] Precommit test for preserving !annotation. 2020-12-16 18:47:45 +00:00
Sanjay Patel aaaf0ec72b [VectorCombine] loosen alignment constraint for load transform
As discussed in D93229, we only need a minimal alignment constraint
when querying whether a hypothetical vector load is safe. We still
pass/use the potentially stronger alignment attribute when checking
costs and creating the new load.

There's already a test that changes with the minimum code change,
so splitting this off as a preliminary commit independent of any
gep/offset enhancements.

Differential Revision: https://reviews.llvm.org/D93397
2020-12-16 12:25:18 -05:00
Florian Hahn 4a6a4e573f
[InstCombine] Precommit tests for !annotation metadata handling. 2020-12-16 15:43:30 +00:00
Bangtian Liu c10757200d Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit cf638d793c.
2020-12-16 11:52:30 +00:00
Bangtian Liu cf638d793c Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-15 23:32:29 +00:00
Johannes Doerfert dcaec81211 [OpenMP] Use assumptions during ICV tracking
The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us
to ignore calls that would otherwise prevent ICV tracking.

Once we track more ICVs we might need to distinguish the ones that could
be impacted even with `no_openmp_routines`.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D92050
2020-12-15 16:51:34 -06:00
Roman Lebedev a7deedc414
[NFC][Tests][SimplifyCFG] Trim whitespaces at the end of lines 2020-12-16 00:38:00 +03:00
Philip Reames a048e2fa1d [tests] fix an accidental target dependence added in 99ac8868 2020-12-15 11:07:30 -08:00
Philip Reames 99ac8868cf [tests][LV] precommit tests for D93317 2020-12-15 10:53:34 -08:00
Jun Ma 52a3267ffa [InstCombine] Remove scalable vector restriction in foldVectorBinop
Differential Revision: https://reviews.llvm.org/D93289
2020-12-15 21:14:59 +08:00
Florian Hahn 0e0295fd61
[LV] Pass explicit vector width to not require a X86 target. 2020-12-15 12:52:22 +00:00
Jun Ma e12f584578 [InstCombine] Remove scalable vector restriction in InstCombineCompares
Differential Revision: https://reviews.llvm.org/D93269
2020-12-15 20:36:57 +08:00
Jun Ma 2ac58e21a1 [InstCombine] Remove scalable vector restriction when fold SelectInst
Differential Revision: https://reviews.llvm.org/D93083
2020-12-15 20:36:57 +08:00
Paul Walker 6d35bd1d48 [CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors.
optimizeGatherScatterInst does nothing specific to fixed length vectors
but uses FixedVectorType to extract the number of elements.  This patch
simply updates the code to use VectorType and getElementCount instead.

For testing I just copied Transforms/CodeGenPrepare/X86/gather-scatter-opt.ll
replacing `<4 x ` with `<vscale x 4`.

Differential Revision: https://reviews.llvm.org/D92572
2020-12-15 10:57:51 +00:00
Florian Hahn 8a7e770638
[LV] Add reduction test, which exposed a crash in a pending patch. 2020-12-15 09:42:00 +00:00
Max Kazantsev 8b330f1f69 [SCEV] Add missing type check into getRangeForAffineNoSelfWrappingAR
We make type widening without checking if it's needed. Bail if the max
iteration count is wider than AR's type.
2020-12-15 14:50:32 +07:00
Max Kazantsev 2fc2e6de82 [Test] Test on assertion failure with expensive SCEV range inference 2020-12-15 13:47:19 +07:00
Rong Xu 54e03d03a7 [PGO] Verify BFI counts after loading profile data
This patch adds the functionality to compare BFI counts with real
profile
counts right after reading the profile. It will print remarks under
-Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo.

Differential Revision: https://reviews.llvm.org/D91813
2020-12-14 15:56:10 -08:00
Sanjay Patel 8593e197bc [VectorCombine] add alignment test for gep load; NFC 2020-12-14 18:31:19 -05:00
Sanjay Patel d399f870b5 [VectorCombine] make load transform poison-safe
As noted in D93229, the transform from scalar load to vector load
potentially leaks poison from the extra vector elements that are
being loaded.

We could use freeze here (and x86 codegen at least appears to be
the same either way), but we already have a shuffle in this logic
to optionally change the vector size, so let's allow that
instruction to serve both purposes.

Differential Revision: https://reviews.llvm.org/D93238
2020-12-14 17:42:01 -05:00
Craig Topper 25067f179f [LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing.
This adds support for loops like

unsigned clz(unsigned x) {
    unsigned w = sizeof (x) * CHAR_BIT;
    while (x) {
        w--;
        x >>= 1;
    }

    return w;
}

and

unsigned clz(unsigned x) {
    unsigned w = sizeof (x) * CHAR_BIT - 1;
    while (x >>= 1) {
        w--;
    }

    return w;
}

To support these we look for add x, -1 as well as add x, 1 that
we already matched. If the value was -1 we need to subtract from
the initial counter value instead of adding to it.

Fixes PR48404.

Differential Revision: https://reviews.llvm.org/D92745
2020-12-14 14:25:05 -08:00
Sanjay Patel 9c1765acab [VectorCombine] add test for load with offset; NFC 2020-12-14 14:40:06 -05:00
Roman Lebedev 59560e8589
[SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450)
Even though d38205144f was mostly a correct
fix for the external non-PHI users, it's not a *generally* correct fix,
because the 'placeholder' values in those trivial PHI's we create
shouldn't be *always* 'undef', but the PHI itself for the backedges,
else we end up with wrong value, as the `@pr48450_2` test shows.

But we can't just do that, because we can't check that the PHI
can be it's own incoming value when coming from certain predecessor,
because we don't have a dominator tree.

So until we can address this correctness problem properly,
ensure that we don't perform the transformation
if there are such problematic external uses.

Making dominator tree available there is going to be involved,
since `-simplifycfg` pass currently does not preserve/update domtree...
2020-12-14 20:14:31 +03:00
Roman Lebedev effbbdec6e
[NFC][SimplifyCFG] Add another miscompiled test for PR48450 2020-12-14 20:14:31 +03:00
Stanislav Mekhanoshin 87d7757bbe [SLP] Control maximum vectorization factor from TTI
D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moved to tryToVectorizeList() it allows to restart vectorization
if initial chunk fails.

However, this function is more general and handles not only PHI
but everything which SLP handles. If vectorization factor would
be limited to maximum vector register size it would limit much
more vectorization than before leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added with the
default 0 to preserve current behavior and limit nothing. Then
targets can decide what is better for them.

The callback gets ElementSize just like a similar getMinimumVF()
function and the main opcode of the chain. The latter is to avoid
regressions at least on the AMDGPU. We can have loads and stores
up to 128 bit wide, and <2 x 16> bit vector math on some
subtargets, where the rest shall not be vectorized. I.e. we need
to differentiate based on the element size and operation itself.

Differential Revision: https://reviews.llvm.org/D92059
2020-12-14 08:49:40 -08:00
Markus Lavin 2a6782bb9f Reland [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo).  Before rewrite (but after introduction of new
induction variables) use SCEV to compute an equivalent set of values for
each @llvm.dbg.value in the loop body (among the loop header PHI-nodes).
After rewrite (and dead PHI elimination) update those @llvm.dbg.value
now referencing undef by picking a remaining value from its equivalence
set.  Allow match with offset by inserting compensation code in the
DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-12-14 16:15:18 +01:00
Anton Afanasyev fac7c7ec3c [SLP] Fix vector element size for the store chains
Vector element size could be different for different store chains.
This patch prevents wrong computation of maximum number of elements
for that case.

Differential Revision: https://reviews.llvm.org/D93192
2020-12-14 15:51:43 +03:00
Simon Pilgrim 5a02bf4f95 [IRCE] Add test case for PR48051 2020-12-14 12:01:19 +00:00
Craig Topper 2acd5a4738 [LoopIdiom] Pre-commit tests for D92745. NFC 2020-12-13 23:25:00 -08:00
Anton Afanasyev b8c847ee73 [SLP][Test] Precommit test for D93192
This test shows failure of combined stores chains vectorization
2020-12-14 09:23:47 +03:00
Nikita Popov 22dba707b0 [AC] Handle (X+C1)<C2 assumes (PR48408)
InstCombine canonicalizes X>C && X<C' style comparisons into
(X+C1)<C2. This type of expression is recognized by some analyses
like LVI, but currently not when used inside assumptions, because
AssumptionCache does not track affected values for it.
2020-12-13 21:00:32 +01:00
Roman Lebedev d38205144f
[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450)
In particular, if the successor block, which is about to get a new
predecessor block, currently only has a single predecessor,
then the bonus instructions will be directly used within said successor,
which is fine, since the block with bonus instructions dominates that
successor. But once there's a new predecessor, the IR is no longer valid,
and we don't fix it, because we only update PHI nodes.

Which means, the live-out bonus instructions must be exclusively used
by the PHI nodes in successor blocks. So we have to form trivial PHI nodes.
which will then be successfully updated to recieve cloned bonus instns.

This all works fine, except for the fact that we don't have access to
the dominator tree, and we don't ignore unreachable code,
so we sometimes do end up having to deal with some weird IR.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48450
2020-12-13 00:06:57 +03:00
Nikita Popov afbb6d97b5 [CVP] Simplify and generalize switch handling
CVP currently handles switches by checking an equality predicate
on all edges from predecessor blocks. Of course, this can only
work if the value being switched over is defined in a different block.

Replace this implementation with a call to getPredicateAt(), which
also does the predecessor edge predicate check (if not defined in
the same block), but can also do quite a bit more: It can reason
about phi-nodes by checking edge predicates for incoming values,
it can reason about assumes, and it can reason about block values.

As such, this makes the implementation both simpler and more
powerful. The compile-time impact on CTMark is in the noise.
2020-12-12 21:12:27 +01:00
Nikita Popov ff523aa441 [CVP] Add additional switch tests (NFC)
These cover cases handled by getPredicateAt(), but not by the
current implementation:

 * Assumes based on context instruction.
 * Value from phi node in same block (using per-pred reasoning).
 * Value from non-phi node in same block (using block-val reasoning).
2020-12-12 20:58:00 +01:00
David Green ab97c9bdb7 [LV] Fix scalar cost for tail predicated loops
When it comes to the scalar cost of any predicated block, the loop
vectorizer by default regards this predication as a sign that it is
looking at an if-conversion and divides the scalar cost of the block by
2, assuming it would only be executed half the time. This however makes
no sense if the predication has been introduced to tail predicate the
loop.

Original patch by Anna Welker

Differential Revision: https://reviews.llvm.org/D86452
2020-12-12 14:21:40 +00:00