
17117 Commits

Author SHA1 Message Date
Arthur Eubanks 85af1d6257 [test] Fix pr45360.ll under NPM
The IR is the same under the NPM, but some basic block labels and value
names are different.
2020-12-28 14:42:52 -08:00
Andrew Litteken e6ae623314 [IROutliner] Adding support for consolidating functions with different output arguments.
Certain regions can have values introduced inside the region that are
used outside of the region. These may not be the same for each similar
region, so we must create one overarching set of arguments for the
consolidated function.

We do this by iterating over the outputs for each extracted function
and creating as many arguments as needed to encapsulate the different
output sets. For each output set, we create a different block with the
necessary stores from the value to the output register. A single switch
statement, controlled by an argument to the function, then selects
which block to use.
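
To make the output-set dispatch concrete, here is a small C++ sketch. It assumes two hypothetical similar regions that both compute `sum` and `prod` but export different values; `outlinedConsolidated`, its parameters, and the integer encoding of output sets are illustrative only, not what the IROutliner actually emits (it works on LLVM IR, not C++).

```cpp
#include <cassert>

// Hypothetical consolidated function: region A exported `sum`, region B
// exported `prod`. Both output pointers are arguments, and `outputSet`
// selects which block of stores runs, via a single switch.
void outlinedConsolidated(int a, int b, int* outSum, int* outProd, int outputSet) {
  int sum = a + b;
  int prod = a * b;
  switch (outputSet) {
    case 0:  // output block for region A
      *outSum = sum;
      break;
    case 1:  // output block for region B
      *outProd = prod;
      break;
    default:
      assert(false && "unknown output set");
  }
}
```

Region A's call site would pass `outputSet = 0` and region B's would pass `1`; in this sketch, the pointer belonging to an output set that is not selected is never dereferenced.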

Changed Tests for consistency:
llvm/test/Transforms/IROutliner/extraction.ll
llvm/test/Transforms/IROutliner/illegal-assumes.ll
llvm/test/Transforms/IROutliner/illegal-memcpy.ll
llvm/test/Transforms/IROutliner/illegal-memmove.ll
llvm/test/Transforms/IROutliner/illegal-vaarg.ll

Tests to test new functionality:
llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll
llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll
llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87296
2020-12-28 16:17:07 -06:00
Nikita Popov 4a16c507cb [InstCombine] Disable unsafe select transform behind a flag
This disables the poison-unsafe select -> and/or transform behind
a flag (we continue to perform the fold by default). This is intended
to simplify evaluation and testing while we teach various passes
to directly recognize the select pattern.
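
Why the fold is poison-unsafe can be shown with a toy C++ model in which `std::nullopt` stands for poison (a modeling assumption, not LLVM code). `select` only propagates poison from the arm it picks, while `and` propagates it from either operand, so `a ? b : false` and `a & b` differ when `a` is false and `b` is poison.

```cpp
#include <cassert>
#include <optional>

// Toy model: std::nullopt stands for poison.
using Bool = std::optional<bool>;

// select c, t, f: poison only if the condition or the chosen arm is poison.
Bool selectOp(Bool c, Bool t, Bool f) {
  if (!c) return std::nullopt;
  return *c ? t : f;
}

// and a, b: poison if either operand is poison.
Bool andOp(Bool a, Bool b) {
  if (!a || !b) return std::nullopt;
  return *a && *b;
}
```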

This only disables the main select -> and/or transform. A number of
related ones are instead changed to canonicalize to the a ? b : false
and a ? true : b forms which represent and/or respectively. This
requires a bit of care to avoid infinite loops, as we do not want
!a ? b : false to be converted into a ? false : b.

The basic idea here is the same as D93065, but keeps the change
behind a flag for now.

Differential Revision: https://reviews.llvm.org/D93840
2020-12-28 22:43:52 +01:00
Sanjay Patel 236c4524a7 [InstSimplify] remove ctpop of 1 (low) bit
https://llvm.org/PR48608

As noted in the test comment, we could handle a more general
case in instcombine and remove this, but I don't have evidence
that we need to do that.
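
A quick C++ check of the identity behind the fold, assuming the operand's only possibly-set bit is bit 0 (modeled here with `x & 1`; the exact IR pattern matched by the commit is not reproduced). The population count of such a value is the value itself.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// ctpop of a value whose only possibly-set bit is bit 0.
uint32_t ctpopOfLowBit(uint32_t x) {
  uint32_t low = x & 1u;  // at most one bit can be set
  return static_cast<uint32_t>(std::bitset<32>(low).count());
}
```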

https://alive2.llvm.org/ce/z/MRW9gD
2020-12-28 16:06:20 -05:00
Sanjay Patel 1351f719d4 [InstSimplify] add tests for ctpop; NFC (PR48608) 2020-12-28 16:06:19 -05:00
Roman Lebedev ef93f7a11c
[SimplifyCFG] FoldBranchToCommonDest: gracefully handle unreachable code ()
We might be dealing with unreachable code,
so the bonus instruction we clone might be self-referencing.

There is a sanity check that all uses of bonus instructions
that are not in the original block with said bonus instructions
are PHI nodes, and that is obviously not the case
for self-referencing instructions.

So if we find such a use, just rewrite it.

Thanks to Mikael Holmén for the reproducer!

Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8
2020-12-28 23:31:19 +03:00
Philip Reames 4b33b23877 Reapply "[LV] Vectorize (some) early and multiple exit loops" w/fix for builder
This reverts commit 4ffcd4fe9a thus restoring e4df6a40da.

The only change from the original patch is to add "llvm::" before the call to empty(iterator_range).  This is a speculative fix for the ambiguity reported on some builders.
2020-12-28 10:13:28 -08:00
Arthur Eubanks 4ffcd4fe9a Revert "[LV] Vectorize (some) early and multiple exit loops"
This reverts commit e4df6a40da.

Breaks Windows bots, e.g. http://45.33.8.238/win/30472/step_4.txt
and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio
2020-12-28 10:05:41 -08:00
Philip Reames e4df6a40da [LV] Vectorize (some) early and multiple exit loops
This patch is a major step towards supporting multiple exit loops in the vectorizer. On its own, this patch extends the allowed loop forms in two ways:

    single exit loops which are not bottom tested
    multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later)

The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts.

The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit is taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration.

The existing code already had the notion of requiring one iteration in the scalar epilogue; this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical).
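
The iteration split can be illustrated with a simplified C++ model. It assumes a hypothetical top-tested loop with two analyzable exits and uses a made-up `vectorTripCount` that always leaves between 1 and VF iterations for the epilogue; this is a sketch of the idea, not the vectorizer's code or its exact trip-count computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Iterations the vector body may run when at least one iteration must be
// left for the scalar epilogue (tc = header executions, i.e. BTC + 1).
std::size_t vectorTripCount(std::size_t tc, std::size_t vf) {
  assert(vf > 0);
  if (tc <= vf) return 0;             // too short: everything runs in scalar code
  std::size_t rem = tc % vf;
  return tc - (rem == 0 ? vf : rem);  // a multiple of vf, leaving 1..vf iterations
}

// Hypothetical top-tested loop with two analyzable exits:
//   for (i = 0;; ++i) { if (i == n) break; if (i == m) break; out[i] = 2 * i; }
// Its header runs min(n, m) + 1 times, and only the last run takes an exit.
std::vector<int> twoExitLoop(std::size_t n, std::size_t m, std::size_t vf) {
  std::vector<int> out(std::min(n, m), 0);
  std::size_t vec = vectorTripCount(std::min(n, m) + 1, vf);
  // "Vector" body: whole chunks of vf iterations, none of which can exit,
  // so no exit checks are needed here.
  for (std::size_t i = 0; i < vec; i += vf)
    for (std::size_t lane = 0; lane < vf; ++lane)
      out[i + lane] = 2 * static_cast<int>(i + lane);
  // Scalar epilogue: runs the remaining iterations and decides which exit is taken.
  for (std::size_t i = vec;; ++i) {
    if (i == n) break;
    if (i == m) break;
    out[i] = 2 * static_cast<int>(i);
  }
  return out;
}
```

Because `n` and `m` are known before the loop starts, the iteration that takes an exit always lands in the epilogue in this model, so the chunked part never needs per-iteration exit checks.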

Differential Revision: https://reviews.llvm.org/D93317
2020-12-28 09:40:42 -08:00
Roman Lebedev d4ccef38d0
[InstCombine] 'hoist xor-by-constant from xor-by-value': ignore constantexprs
As reported in post-commit review in
https://reviews.llvm.org/D93857
this fold (as I expected, but failed to come up with test coverage for
despite trying) has issues with constant expressions.
Since we only care about true constants, which constantexprs are not,
don't perform such hoisting for constant expressions.
2020-12-28 20:15:20 +03:00
Juneyoung Lee 9d70dbdc2b [InstCombine] use poison as placeholder for undemanded elems
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement instructions.
However, this is problematic because undef isn’t undefined enough.
In particular, a sequence of insertelement instructions can be optimized to a shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison.
This makes a few straightforward optimizations incorrect, such as:

```
;  https://bugs.llvm.org/show_bug.cgi?id=44185

define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
  %xv = insertelement <4 x float> %q, float %x, i32 2
  %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef }
  ret <4 x float> %r ; %r[3] is undef
}
=>
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
  %r = insertelement <4 x float> %y, float %x, i32 1
  ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison
}

Transformation doesn't verify!
ERROR: Target is more poisonous than source
```

I’d like to suggest
1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too)
2. Updating shufflevector’s semantics to return poison element if mask is undef
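
The effect of the second proposal on the Alive2 counterexample above can be sketched with a toy C++ model of `<4 x float>`. Here `std::nullopt` stands for a poison lane and `kUndefMask` is a made-up encoding of an `undef` mask element; the helper names are hypothetical. Under the proposed semantics, lane 3 of the source becomes poison, so the target's `%y[3]` refines it.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Toy model of <4 x float>: std::nullopt stands for a poison lane.
using Lane = std::optional<float>;
using Vec4 = std::array<Lane, 4>;
constexpr int kUndefMask = -1;  // hypothetical encoding of an `undef` mask element

Vec4 insertElement(Vec4 v, Lane x, int idx) {
  v[idx] = x;
  return v;
}

// Proposed semantics: an undef mask element yields a poison lane.
Vec4 shuffleVector(const Vec4& a, const Vec4& b, const std::array<int, 4>& mask) {
  Vec4 r;
  for (int i = 0; i < 4; ++i) {
    int m = mask[i];
    if (m == kUndefMask) r[i] = std::nullopt;
    else r[i] = m < 4 ? a[m] : b[m - 4];
  }
  return r;
}

// `tgt` refines `src` if every non-poison lane of `src` is unchanged in `tgt`.
bool refines(const Vec4& tgt, const Vec4& src) {
  for (int i = 0; i < 4; ++i)
    if (src[i].has_value() && tgt[i] != src[i]) return false;
  return true;
}
```

Modeling the old semantics by giving source lane 3 any concrete value makes `refines` fail, since a poison lane never refines a non-poison one.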

Note that poison is currently lowered into UNDEF in SelDag, so the codegen part is okay.
m_Undef() matches PoisonValue as well, so existing optimizations will still fire.

The only concern is hidden miscompilations that will surface when a poison constant is given.
A conservative way is copying all tests that have `insertelement undef`, replacing it with `insertelement poison`, and running Alive2 on them, but it will create many tests and people won’t like it. :(

Instead, I’ll simply locally maintain the tests and run Alive2.
If there is any bug found, I’ll report it.

Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93586
2020-12-28 08:58:15 +09:00
Florian Hahn 4ad41902e8
[GVN] Correctly set modified status when doing PRE on indices.
This patch updates GVN to correctly return the modified status if PRE
is performed on indices. It fixes a crash when building the test-suite
with EXPENSIVE_CHECKS and LTO.
2020-12-27 21:58:31 +00:00
Juneyoung Lee d3f1f7b6bc [EarlyCSE] Use m_LogicalAnd/Or matchers to handle branch conditions
EarlyCSE's handleBranchCondition says:

```
// If the condition is AND operation, we can propagate its operands into the
// true branch. If it is OR operation, we can propagate them into the false
// branch.
```

This holds for the corresponding select patterns as well.

This is part of ongoing work to disable buggy select->and/or transformations.
See llvm.org/pr48353 and D93065 for more context.

Proof:
and: https://alive2.llvm.org/ce/z/MQWodU
or: https://alive2.llvm.org/ce/z/9GLbB_

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93842
2020-12-28 05:36:26 +09:00
Juneyoung Lee f1d648b973 [GVN] Use m_LogicalAnd/Or to propagate equality from branch conditions
This patch makes GVN recognize `select c1, c2, false` as well as `select c1, true, c2`
as branch conditions and propagate equalities from them.

See llvm.org/pr48353, D93065

Differential Revision: https://reviews.llvm.org/D93841
2020-12-28 05:28:38 +09:00
Juneyoung Lee 0060f10134 [EarlyCSE] Add tests for select form of and/or (NFC) 2020-12-28 04:19:22 +09:00
Juneyoung Lee 0d05c1f40d [GVN] Add tests for select form of and/or (NFC) 2020-12-28 03:39:57 +09:00
Florian Hahn 0ea3749b3c
[LV] Set up branch from middle block earlier.
Previously the branch from the middle block to the scalar preheader & exit
was being set up at the end of skeleton creation in completeLoopSkeleton.
Inserting SCEV or runtime checks may result in LCSSA phis being created,
if they are required. Adjusting branches afterwards may break those
PHIs.

To avoid this, we can instead create the branch from the middle block
to the exit after we created the middle block, so we have the final CFG
before potentially adjusting/creating PHIs.

This fixes a crash for the included test case. For the non-crashing
case, this is almost NFC with respect to the generated code. The
only change is the order of the predecessors of the involved branch
targets.

Note an assertion was moved from LoopVersioning() to
LoopVersioning::versionLoop. Adjusting the branches means loop-simplify
form may be broken before constructing LoopVersioning. But LV only uses
LoopVersioning to annotate the loop instructions with !noalias metadata,
which does not require loop-simplify form.

This is a fix for an existing issue uncovered by D93317.
2020-12-27 18:21:12 +00:00
Nikita Popov 0af42d3dc7 [PatternMatch][LVI] Handle select-form and/or in LVI
Following the discussion in D93065, this adds m_LogicalAnd() and
m_LogicalOr() matchers, that match A && B and A || B logical
operations, either as bitwise operations or select expressions.
As an example usage, LVI is adapted to use these matchers for its
condition reasoning.
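
As a rough illustration of what such matchers accept, here is a toy C++ version over a hypothetical `Expr` node. It is not LLVM's `PatternMatch` implementation or signature; it only shows that `A && B` is recognized both as `and A, B` and as `select A, B, false`, and `A || B` as `or A, B` or `select A, true, B`.

```cpp
#include <cassert>

// Toy expression node; a hypothetical stand-in for LLVM's Value hierarchy.
struct Expr {
  enum class Kind { Var, Const, And, Or, Select };
  Kind kind;
  bool constVal = false;  // meaningful only when kind == Kind::Const
  const Expr* ops[3] = {nullptr, nullptr, nullptr};
};

bool isConst(const Expr* e, bool v) {
  return e && e->kind == Expr::Kind::Const && e->constVal == v;
}

// Matches `and A, B` or `select A, B, false`, binding A and B on success.
bool matchLogicalAnd(const Expr* e, const Expr*& a, const Expr*& b) {
  bool ok = e->kind == Expr::Kind::And ||
            (e->kind == Expr::Kind::Select && isConst(e->ops[2], false));
  if (ok) { a = e->ops[0]; b = e->ops[1]; }
  return ok;
}

// Matches `or A, B` or `select A, true, B`, binding A and B on success.
bool matchLogicalOr(const Expr* e, const Expr*& a, const Expr*& b) {
  if (e->kind == Expr::Kind::Or) { a = e->ops[0]; b = e->ops[1]; return true; }
  if (e->kind == Expr::Kind::Select && isConst(e->ops[1], true)) {
    a = e->ops[0]; b = e->ops[2]; return true;
  }
  return false;
}
```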

The plan here is to switch other parts of LLVM that reason about
and/or of conditions to also support the select forms, and then
merge D93065 (or a variant thereof) to disable the poison-unsafe
select to and/or transform.

Differential Revision: https://reviews.llvm.org/D93827
2020-12-27 17:39:02 +01:00
Arthur Eubanks 8791949f55 [test] Pin some tests to legacy PM
These all have NPM RUN lines.
2020-12-26 13:46:02 -08:00
Nikita Popov 5bc5c016c4 [CVP] Add tests for select form of and/or (NFC)
This tests their handling inside LVI. See D93065 for wider context.
2020-12-26 21:48:24 +01:00
Nikita Popov b218407512 [ValueTracking] Handle more non-trivial conditions in isKnownNonZero()
In 35676a4f9a I've added handling for
non-trivial dominating conditions that imply non-zero on the true
branch. This adds the same support for the false branch.

The changes in pr45360.ll change block ordering and naming, but
don't change the control flow. The urem is still guarded by a
non-zero check correctly.
2020-12-26 15:48:04 +01:00
Nikita Popov e8c7e7cdbb [ValueTracking] Add more known non zero tests (NFC)
Add tests for non-trivial conditions that imply non-zero on the
false branch rather than the true branch.

The last case already folds due to canonicalization.
2020-12-26 15:48:04 +01:00
Nikita Popov 35676a4f9a [InstCombine] Generalize icmp handling in isKnownNonZero()
The dominating condition handling in isKnownNonZero() currently
only takes into account conditions of the form "x != 0" or "x == 0".
However, there are plenty of other conditions that imply non-zero,
a common one being "x s> 0".

Peculiarly, the handling for assumes was already dealing with more
general non-zero-ness conditions, so this just reuses the same
logic for the dominating condition case.
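
One way to see which icmp conditions imply non-zero is to ask whether 0 itself satisfies the condition. The C++ sketch below (hypothetical names, 32-bit only) encodes that rule for both the true and false edges; LLVM's actual implementation may reason differently, for example via constant ranges.

```cpp
#include <cassert>
#include <cstdint>

enum class Pred { EQ, NE, SGT, SGE, SLT, SLE, UGT, UGE, ULT, ULE };

// Evaluates `lhs pred rhs` for 32-bit integers.
bool evalICmp(Pred p, int32_t lhs, int32_t rhs) {
  uint32_t ul = static_cast<uint32_t>(lhs), ur = static_cast<uint32_t>(rhs);
  switch (p) {
    case Pred::EQ:  return lhs == rhs;
    case Pred::NE:  return lhs != rhs;
    case Pred::SGT: return lhs > rhs;
    case Pred::SGE: return lhs >= rhs;
    case Pred::SLT: return lhs < rhs;
    case Pred::SLE: return lhs <= rhs;
    case Pred::UGT: return ul > ur;
    case Pred::UGE: return ul >= ur;
    case Pred::ULT: return ul < ur;
    case Pred::ULE: return ul <= ur;
  }
  return false;
}

// Does knowing the outcome of `x pred c` prove x != 0? On the true edge x is in
// {x : x pred c}, which excludes 0 exactly when `0 pred c` is false; the false
// edge is the mirror image.
bool impliesNonZero(Pred p, int32_t c, bool onTrueEdge) {
  bool zeroSatisfies = evalICmp(p, 0, c);
  return onTrueEdge ? !zeroSatisfies : zeroSatisfies;
}
```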
2020-12-25 16:49:23 +01:00
Nikita Popov b0e6007c82 [InstCombine] Add additional tests for known non zero (NFC)
Check conditions that imply non-zero, even if they are not literally
"x != 0".

Using ctlz for testing, as explicit comparison might get folded by
other reasoning.
2020-12-25 16:28:30 +01:00
Roman Lebedev 25aebe2ccf
[LoopIdiom] 'left-shift-until-bittest': keep no-wrap flags on shift, fix edge-case miscompilation for %x.next
`%x.curr` is always safe to compute, because `LoopBackedgeTakenCount`
will always be smaller than `bitwidth(X)`, i.e. we never get poison.
Rewriting `%x.next` is more complicated, however, because `X << LoopTripCount`
will be poison iff `LoopTripCount == bitwidth(X)` (which will happen
iff `BitPos` is `bitwidth(X) - 1` and `X` is `1`).

So unless we know that isn't the case (as alive2 notes, we know it's safe
to do iff the shift had no-wrap flags, or bitpos does not indicate the signbit,
or we know that %x is never `1`), we'll need to emit alternative,
safe IR, by either just shifting the `%x.curr`, or conditionally selecting
between the computed `%x.next` and `0`.
The former IR looks better, so let's do that.
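
The following C++ sketch compares the literal loop with a countable closed form, assuming 32-bit values and a variable `bitPos`; `runLoop`, `closedForm`, and `highestSetBit` are hypothetical names. In C++, shifting a 32-bit value by 32 is undefined behavior, a rough analogue of the poison discussed above, which is why `xNext` is computed by shifting `xCurr` once more.

```cpp
#include <cassert>
#include <cstdint>

// Index of the most significant set bit; requires v != 0.
unsigned highestSetBit(uint32_t v) {
  assert(v != 0);
  unsigned pos = 0;
  while (v >>= 1) ++pos;
  return pos;
}

struct Result { uint32_t xCurr, xNext, backedgeTakenCount; };

// Reference: run the loop literally. Requires a set bit at or below bitPos.
Result runLoop(uint32_t x, unsigned bitPos) {
  uint32_t cur = x, count = 0;
  while (!(cur & (1u << bitPos))) { cur <<= 1; ++count; }
  return {cur, cur << 1, count};
}

// Countable form: the count comes from the highest set bit at or below bitPos.
Result closedForm(uint32_t x, unsigned bitPos) {
  uint32_t mask = 1u << bitPos;
  uint32_t lowBits = mask | (mask - 1);  // bits 0..bitPos
  uint32_t btc = bitPos - highestSetBit(x & lowBits);
  uint32_t xCurr = x << btc;  // btc <= 31, always a valid shift
  uint32_t xNext = xCurr << 1;  // x << (btc + 1) would be UB when btc == 31
  return {xCurr, xNext, btc};
}
```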

While there, ensure that we don't drop no-wrap flags from said shift.
2020-12-24 21:20:52 +03:00
Roman Lebedev 6e074a8324
[NFC][LoopIdiom] Improve test coverage for 'left-shift-until-bittest' pattern
In particular, add tests with no-wrap flags on the shift,
a test where %x is not `1`, and ensure that the tests where %bit
is the constant bitwidth-1, and those where it is not,
test both live-out values.
2020-12-24 21:20:51 +03:00
Roman Lebedev d9ebaeeb46
[InstCombine] Hoist xor-by-constant from xor-by-value
This is one of the deficiencies that can be observed in
https://godbolt.org/z/YPczsG after D91038 patch set.

This exposed two missing folds: one was fixed by the previous commit,
the other is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`.

`-early-cse` will catch it: https://godbolt.org/z/4n1T1v,
but it isn't worthwhile to fix it in InstCombine,
because we'd need to essentially do our own CSE,
and we can't even rely on `Instruction::isIdenticalTo()`,
because there are no guarantees that the order of operands matches.
So let's just accept it as a loss.
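
A small C++ check of the two identities, exhaustive over 8-bit values for speed (both are bitwise, so width does not matter). Writing `~(b ^ a)` with swapped operands is an illustrative choice reflecting the operand-order remark; the helper names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Before the fold: (x ^ C) ^ y. After hoisting the constant: (x ^ y) ^ C.
uint8_t beforeHoist(uint8_t x, uint8_t y, uint8_t c) {
  return static_cast<uint8_t>((x ^ c) ^ y);
}
uint8_t afterHoist(uint8_t x, uint8_t y, uint8_t c) {
  return static_cast<uint8_t>((x ^ y) ^ c);
}

// (A ^ B) | ~(B ^ A) is all-ones: both xors compute the same value even
// though their operands appear in a different order.
uint8_t orWithInvertedSelf(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a ^ b) | ~(b ^ a));
}
```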
2020-12-24 21:20:50 +03:00
Roman Lebedev 8001dcbd50
[NFC][InstCombine] Add test coverage for `(x ^ C) ^ y` pattern 2020-12-24 21:20:50 +03:00
Roman Lebedev 5b78303433
[InstCombine] Fold `a & ~(a ^ b)` to `a & b`
```
----------------------------------------
define i32 @and_xor_not_common_op(i32 %a, i32 %b) {
%0:
  %b2 = xor i32 %b, 4294967295
  %t2 = xor i32 %a, %b2
  %t4 = and i32 %t2, %a
  ret i32 %t4
}
=>
define i32 @and_xor_not_common_op(i32 %a, i32 %b) {
%0:
  %t4 = and i32 %a, %b
  ret i32 %t4
}
Transformation seems to be correct!
```
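
The Alive2 result can be cross-checked with a brute-force C++ sketch over all 8-bit inputs (the identity is bitwise, so the width is only chosen to keep the check fast). Note that `a ^ ~b` in the proof equals `~(a ^ b)` from the title; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Source pattern from the Alive2 proof: %t2 = a ^ ~b, %t4 = %t2 & a.
uint8_t sourceForm(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a ^ static_cast<uint8_t>(~b)) & a);
}

// Folded form.
uint8_t foldedForm(uint8_t a, uint8_t b) { return static_cast<uint8_t>(a & b); }
```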
2020-12-24 21:20:49 +03:00
Roman Lebedev 1fda23367d
[NFC][InstCombine] Add test for `a & ~(a ^ b)` pattern
... which is a variation of `a & (a ^ ~b) --> a & b`.
A follow-up patch exposes this missing fold, so we need to fix it first.
2020-12-24 21:20:48 +03:00
Roman Lebedev da4c7e15df
[NFC][InstCombine] Autogenerate check lines in vec_shuffle.ll test 2020-12-24 21:20:48 +03:00
Nikita Popov ef2f843347 Revert "[InstCombine] Check inbounds in load/store of gep null transform (PR48577)"
This reverts commit 899faa50f2.

Upon further consideration, this does not fix the right issue.
Doing this fold for non-inbounds GEPs is legal, because the
resulting pointer is still based-on null, which has no associated
address range, and as such any access to it is UB.

https://bugs.llvm.org/show_bug.cgi?id=48577#c3
2020-12-24 12:36:56 +01:00
Nikita Popov 90177912a4 Revert "[InstCombine] Fold gep inbounds of null to null"
This reverts commit eb79fd3c92.

This causes stage2 crashes, possibly due to StringMap being
miscompiled. Reverting for now.
2020-12-24 10:20:31 +01:00
Juneyoung Lee db7a2f347f Precommit transform tests that have poison as insertelement's placeholder
This commit copies existing tests at llvm/Transforms and replaces
'insertelement undef' in those files with 'insertelement poison'.
(see https://reviews.llvm.org/D93586)

Tests listed using this script:

grep -R -E '^[^;]*insertelement <.*> undef,' . | cut -d":" -f1 | uniq | wc -l

Tests updated:

file_org=llvm/test/Transforms/$1
file=${file_org%.ll}-inseltpoison.ll
cp "$file_org" "$file"
sed -i -E 's/^([^;]*)insertelement <(.*)> undef/\1insertelement <\2> poison/g' "$file"
if ! head -1 "$file" | grep -q "Assertions have been autogenerated by utils/update_test_checks.py"; then
  echo "$file : should be manually updated"
  # I manually updated the script
  exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt "$file"
2020-12-24 11:46:17 +09:00
Andrew Litteken 48ad8194a5 [IRSim] Adding support for isomorphic predicates
Some predicates can be considered the same as long as the operands are
flipped. For example, a > b gives the same result as b < a. This maps
instructions in a greater-than form to their appropriate less-than
form, swapping the operands in the IRInstructionData only, allowing for
more flexible matching.
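
A C++ sketch of the canonicalization, assuming a hypothetical `CmpData` record with signed predicates only and operands identified by integer IDs; the real IRInstructionData differs. Greater-than forms are rewritten to less-than forms with swapped operands, so `a > b` and `b < a` produce the same record.

```cpp
#include <cassert>

enum class Pred { SGT, SGE, SLT, SLE };

// A compare as recorded for matching: predicate plus operand IDs.
struct CmpData { Pred pred; int lhs; int rhs; };

// Rewrite greater-than forms as the equivalent less-than form with swapped operands.
CmpData canonicalize(CmpData c) {
  switch (c.pred) {
    case Pred::SGT: return {Pred::SLT, c.rhs, c.lhs};
    case Pred::SGE: return {Pred::SLE, c.rhs, c.lhs};
    default:        return c;
  }
}

bool sameShape(const CmpData& x, const CmpData& y) {
  CmpData cx = canonicalize(x), cy = canonicalize(y);
  return cx.pred == cy.pred && cx.lhs == cy.lhs && cx.rhs == cy.rhs;
}

// Evaluates the compare, treating the IDs as the operand values themselves.
bool eval(const CmpData& c) {
  switch (c.pred) {
    case Pred::SGT: return c.lhs > c.rhs;
    case Pred::SGE: return c.lhs >= c.rhs;
    case Pred::SLT: return c.lhs < c.rhs;
    case Pred::SLE: return c.lhs <= c.rhs;
  }
  return false;
}
```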

Tests:

llvm/test/Transforms/IROutliner/outlining-isomorphic-predicates.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp

Reviewers: jroelofs, paquette

Recommit of commit 0503926602

Differential Revision: https://reviews.llvm.org/D87310
2020-12-23 19:42:35 -06:00
Andrew Litteken 45a4f34bd1 Revert "[IRSim] Adding support for isomorphic predicates"
Reverting due to unit test errors between commits.

This reverts commit 0503926602.
2020-12-23 15:14:19 -06:00
Roman Lebedev f8079355c6
[InstCombine] canonicalizeAbsNabs(): don't propagate NSW flag for NABS pattern
As Nuno noted in post-commit review in
https://reviews.llvm.org/D87188#2467915
it is not correct to keep NSW for the negated abs pattern,
so don't do that.
2020-12-24 00:06:09 +03:00
Andrew Litteken 0503926602 [IRSim] Adding support for isomorphic predicates
Some predicates can be considered the same as long as the operands are
flipped. For example, a > b gives the same result as b < a. This maps
instructions in a greater-than form to their appropriate less-than
form, swapping the operands in the IRInstructionData only, allowing for
more flexible matching.

Tests:

llvm/test/Transforms/IROutliner/outlining-isomorphic-predicates.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87310
2020-12-23 15:02:00 -06:00
Andrew Litteken cce473e0c5 [IRSim] Adding commutativity matching to structure checking
Certain instructions, such as adds and multiplies, can have their operands
flipped and still be considered the same. When we are analyzing
structure, this gives slightly more flexibility to create a mapping from
one region to another. For each operand, we can record both operands of the
corresponding instruction as candidates rather than just the exact match. We
then try to eliminate items from the set until there is only one valid mapping
between the regions of code.

We do this for adds, multiplies, and equality checks. However, this is
not done for floating point instructions, since the order can still
matter in some cases.
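
A C++ sketch of the candidate-set idea, assuming hypothetical `Inst` records with integer operand IDs and only `+`/`*` as commutative (the commit also covers equality checks). It narrows each operand's candidates by intersection only; any further elimination step is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// One instruction in a region: an opcode and two operand value IDs (hypothetical).
struct Inst { char op; int lhs; int rhs; };

bool isCommutative(char op) { return op == '+' || op == '*'; }

// For each operand ID in region A, compute the operand IDs in region B it can
// still map to. Corresponding instructions are assumed to have equal opcodes.
std::map<int, std::set<int>> candidateMapping(const std::vector<Inst>& a,
                                              const std::vector<Inst>& b) {
  std::map<int, std::set<int>> cand;
  auto narrow = [&cand](int v, const std::set<int>& allowed) {
    auto it = cand.find(v);
    if (it == cand.end()) { cand[v] = allowed; return; }
    std::set<int> kept;
    for (int x : it->second)
      if (allowed.count(x)) kept.insert(x);
    it->second = kept;
  };
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    if (isCommutative(a[i].op)) {
      // Either operand of B's instruction is an acceptable match.
      narrow(a[i].lhs, {b[i].lhs, b[i].rhs});
      narrow(a[i].rhs, {b[i].lhs, b[i].rhs});
    } else {
      // Order matters: only the operand in the same position matches.
      narrow(a[i].lhs, {b[i].lhs});
      narrow(a[i].rhs, {b[i].rhs});
    }
  }
  return cand;
}
```

For region A `a + b; a - b` and region B `q + p; p - q`, the add alone leaves both operands ambiguous, and the non-commutative subtraction then pins `a` to `p` and `b` to `q`.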

Tests:

llvm/test/Transforms/IROutliner/outlining-commutative-fp.ll
llvm/test/Transforms/IROutliner/outlining-commutative.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87311
2020-12-23 15:02:00 -06:00
Nikita Popov 759b8c11c3 [InstCombine] Handle different pointer types when folding gep of null
The source pointer type is not necessarily the same as the result
pointer type, so we can't simply return the original null pointer,
as it might have a different type.
2020-12-23 21:58:26 +01:00
Nikita Popov eb79fd3c92 [InstCombine] Fold gep inbounds of null to null
Effectively, this is what we were previously already doing when
the GEP was used in conjunction with a load or store, but this
fold can also be applied more generally:

> The only in bounds address for a null pointer in the default
> address-space is the null pointer itself.
2020-12-23 21:41:53 +01:00
Nikita Popov 87087a02ae [InstCombine] Add tests for gep of null (NFC)
We were only considering the gep of null pattern in conjunction
with a load/store. Also test it independently.
2020-12-23 21:41:53 +01:00
Nikita Popov 899faa50f2 [InstCombine] Check inbounds in load/store of gep null transform (PR48577)
If the GEP isn't inbounds, then accessing a GEP of null location
is generally not UB.

While this is a minimal fix, the GEP of null handling should
probably be its own fold.
2020-12-23 21:03:22 +01:00
Nikita Popov de127d83d8 [InstCombine] Add tests for PR48577 (NFC) 2020-12-23 21:03:22 +01:00
Roman Lebedev 2b61e7c68c
[LoopIdiom] 'left-shift until bittest' idiom: support rewriting loop as countable, allow extra cruft
The current state of the transform is still not enough to support
my motivating pattern, because it has one more "induction variable".

I have delayed posting this patch, because originally even just rewriting
the loop as countable wasn't enough to nicely transform my motivating pattern:
I expected that extra IV to be rewritten afterwards,
but that wasn't happening until I fixed it in D91800.

So, this patch allows the 'left-shift until bittest' loop idiom
as long as the inserted ops are cheap,
and lifts any and all extra-use checks on the instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D92754
2020-12-23 22:28:10 +03:00
Roman Lebedev a0ddc61c5b
[LoopIdiom] 'left-shift until bittest' idiom: support canonical sign bit mask
If the bitmask is for sign bit, instcombine would have canonicalized
the pattern into a proper sign bit check. Supporting that is still
simple, but requires a bit of a roundtrip - we first have to use
`decomposeBitTestICmp()`, and the rest again just works.
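
For reference, a C++ check that the bit-test form and the canonical sign-bit comparison agree. It does not call `decomposeBitTestICmp()`; the function names are hypothetical, and the signed cast assumes two's complement, which C++20 guarantees and common compilers provide earlier.

```cpp
#include <cassert>
#include <cstdint>

// Bit-test form the idiom expects: is bit 31 clear?
bool signBitClearViaMask(uint32_t x) { return (x & 0x80000000u) == 0; }

// Sign-bit comparison form for the same question: x s> -1.
bool signBitClearViaCmp(uint32_t x) { return static_cast<int32_t>(x) > -1; }
```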

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91726
2020-12-23 22:28:09 +03:00
Roman Lebedev cb2e5980ba
[LoopIdiom] 'left-shift until bittest' idiom: support constant bit mask
The handling of the case where the mask is a constant is trivial:
if said constant is a power of two, the bit in question is log2(mask),
and the rest just works.
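
A C++ sketch of deriving the bit position from a constant mask; `bitPosFromMask` is a hypothetical helper, and returning -1 for non-power-of-two masks is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Returns the bit position tested by a constant mask, or -1 if the mask is
// not a power of two (which this sketch simply rejects).
int bitPosFromMask(uint32_t mask) {
  if (mask == 0 || (mask & (mask - 1)) != 0) return -1;
  int pos = 0;
  while ((mask >>= 1) != 0) ++pos;  // log2 of a power of two
  return pos;
}
```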

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91725
2020-12-23 22:28:09 +03:00
Roman Lebedev e124844709
[LoopIdiom] Introduce 'left-shift until bittest' idiom
The motivation here is the following inner loop in the fp16/fp24 -> fp32 expander
that runs as part of the floating-point DNG decompression in the RawSpeed library:
cd380bb9a2/src/librawspeed/decompressors/DeflateDecompressor.cpp (L112-L115)
```
      while (!(fp32_fraction & (1 << 23))) {
        fp32_exponent -= 1;
        fp32_fraction <<= 1;
      }
```
(https://godbolt.org/z/r13YMh)
As one might notice, that loop is currently uncountable, and that whole code stays scalar.
Yet, it is rather trivial to make that loop countable:
 https://godbolt.org/z/do8WMz
and we can prove that via alive2:
 https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?)
... and that allows the whole fp16->fp32 code to vectorize:
 https://godbolt.org/z/7hYr13
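
The countable form can be sketched in C++ next to the original loop. The integer types, the `1u` literal (the quoted code uses `1 << 23`), and the function names are assumptions; the rewrite computes the trip count from the highest set bit among bits 0..23 instead of iterating.

```cpp
#include <cassert>
#include <cstdint>

// The loop from the commit message. It terminates only if one of bits 0..23
// of fp32_fraction is set.
void normalizeLoop(int32_t& fp32_exponent, uint32_t& fp32_fraction) {
  while (!(fp32_fraction & (1u << 23))) {
    fp32_exponent -= 1;
    fp32_fraction <<= 1;
  }
}

// Countable version: the trip count is the distance from the highest set bit
// among bits 0..23 up to bit 23, so both updates can be done in one step.
void normalizeCountable(int32_t& fp32_exponent, uint32_t& fp32_fraction) {
  uint32_t low = fp32_fraction & ((1u << 24) - 1);
  assert(low != 0 && "the original loop would not terminate");
  unsigned highest = 0;
  for (uint32_t v = low; (v >>= 1) != 0;) ++highest;
  unsigned shifts = 23 - highest;
  fp32_exponent -= static_cast<int32_t>(shifts);
  fp32_fraction <<= shifts;
}
```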

Now, while I'd love to get there, I feel like I should take it in steps.

For now, this introduces support for the most basic case,
where the bit position is given as a variable,
and the loop *will* go away (has no live-outs other than the recurrence,
no extra instructions in the loop).

I have added sufficient (I believe) test coverage,
and alive2 is happy with those transforms.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91038
2020-12-23 22:28:09 +03:00
Andrew Litteken b1191c8438 [IROutliner] Adding support for elevating constants that are not the same in each region to arguments
When there are constants that have the same structural location, but not
the same value, between different regions, we cannot simply outline the
region. Instead, we find the constants that are not the same in each
location, and promote them to arguments to be passed into the respective
functions. At each call site, we pass the constant in as an argument
regardless of type.
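
A minimal C++ sketch of the promotion, assuming two hypothetical regions that differ only in one constant: the differing constant becomes a parameter of the outlined function, while the matching constant stays inside it.

```cpp
#include <cassert>

// Two hypothetical regions with the same structure but different constants.
int regionA(int x) { return x * 3 + 1; }
int regionB(int x) { return x * 5 + 1; }

// Outlined version: the constant that differs (3 vs 5) becomes a parameter;
// the constant that matches (1) stays inside the function.
int outlined(int x, int differingConstant) { return x * differingConstant + 1; }
```

The call sites would then become `outlined(x, 3)` and `outlined(x, 5)`.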

Added/Edited Tests:

llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll
llvm/test/Transforms/IROutliner/outlining-different-constants.ll
llvm/test/Transforms/IROutliner/outlining-different-globals.ll

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D87294
2020-12-23 13:03:05 -06:00
Evgeniy Brevnov 9fb074e7bb [BPI] Improve static heuristics for "cold" paths.
The current approach doesn't work well when multiple paths are predicted to be "cold". By "cold" paths I mean those containing an "unreachable" instruction, a call marked with the 'cold' attribute, or the 'unwind' handler of an 'invoke' instruction. The issue is that heuristics are applied one by one until the first match, which essentially ignores the relative hotness/coldness of other paths.

The new approach unifies processing of "cold" paths by assigning a predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down the IR similarly to the existing approach, and finally set up edge probabilities based on the estimated block weights.
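
The final step, turning estimated block weights into edge probabilities, can be sketched in C++. The weight constants are made up for illustration and are not LLVM's values; probabilities are simply proportional to successor weights.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical block weights: larger means "more likely to be executed".
// These constants are illustrative; they are not LLVM's actual values.
constexpr uint32_t kNormalWeight = 0xfffff;
constexpr uint32_t kColdWeight = 0xffff;  // e.g. a path ending in `unreachable`

// Turn the estimated weights of a branch's successor blocks into edge
// probabilities proportional to those weights.
std::vector<double> edgeProbabilities(const std::vector<uint32_t>& succWeights) {
  double total = 0;
  for (uint32_t w : succWeights) total += w;
  std::vector<double> probs;
  for (uint32_t w : succWeights)
    probs.push_back(total > 0 ? w / total : 1.0 / succWeights.size());
  return probs;
}
```

In this sketch, two cold successors get equal shares relative to each other, and a cold successor next to a normal one gets a share in proportion to the weight ratio, rather than the first matching heuristic deciding alone.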

One important difference is how we propagate weight up. The existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless, at least because it always gives a 50/50 distribution, which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting a more accurate probability. The new algorithm propagates the weight up only to the blocks that dominate and are post-dominated by a block with some "known" weight. In other words, those are the blocks that are always either executed together with it or not executed at all.

In addition, the new approach processes loops in a uniform way as well. Essentially, loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers.

Reviewed By: yrouban

Differential Revision: https://reviews.llvm.org/D79485
2020-12-23 22:47:36 +07:00