Commit Graph

808 Commits

Author SHA1 Message Date
Sander de Smalen 171d12489f [SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost.
This change also changes getReductionCost to return InstructionCost,
and it simplifies two expressions by removing a redundant 'isValid' check.
2021-01-25 12:27:01 +00:00
Sanjay Patel 77adbe6a8c [SLP] fix fast-math requirements for fmin/fmax reductions
a6f0221276 enabled intersection of FMF on reduction instructions,
so it is safe to ease the check here.

There is still some room to improve here - it looks like we
have nearly duplicate flags propagation logic inside of the
LoopUtils helper but it is limited targets that do not form
reduction intrinsics (they form the shuffle expansion).
2021-01-24 08:55:56 -05:00
Kazu Hirata 1238378f18 [llvm] Use pop_back_val (NFC) 2021-01-23 10:56:33 -08:00
Sanjay Patel a6f0221276 [SLP] fix fast-math-flag propagation on FP reductions
As shown in the test diffs, we could miscompile by
propagating flags that did not exist in the original
code.

The flags required for fmin/fmax reductions will be
fixed in a follow-up patch.
2021-01-23 11:17:20 -05:00
Anton Rapetov a4914dc1f2 [SLP] do not traverse constant uses
Walking the use list of a Constant (particularly, ConstantData)
is not scalable, since a given constant may be used by many
instructinos in many functions in many modules.

Differential Revision: https://reviews.llvm.org/D94713
2021-01-22 08:14:09 -05:00
Sanjay Patel 2f03528f5e [SLP] rename reduction variable to avoid shadowing; NFC
The code structure can likely be improved now that
'OperationData' is gone.
2021-01-21 16:02:38 -05:00
Sanjay Patel d77753381f [SLP] simplify reduction matching
This is NFC-intended and removes the "OperationData"
class which had become nothing more than a recurrence
(reduction) type.

I adjusted the matching logic to distinguish
instructions from non-instructions - that's all that
the "IsLeafValue" member was keeping track of.
2021-01-21 14:58:57 -05:00
Sanjay Patel c09be0d2a0 [SLP] reduce reduction code for checking vectorizable ops; NFC
This is another step towards removing `OperationData` and
fixing FMF matching/propagation bugs when forming reductions.
2021-01-20 11:14:48 -05:00
Sanjay Patel 1c54112a57 [SLP] refactor more reduction functions; NFC
We were able to remove almost all of the state from
OperationData, so these don't make sense as members
of that class - just pass the RecurKind in as a param.

More streamlining is possible, but I'm trying to avoid
logic/typo bugs while fixing this. Eventually, we should
not need the `OperationData` class.
2021-01-20 11:14:48 -05:00
Sanjay Patel 8590d24543 [SLP] move reduction createOp functions; NFC
We were able to remove almost all of the state from
OperationData, so these don't make sense as members
of that class - just pass the RecurKind in as a param.
2021-01-20 11:14:48 -05:00
Alexey Bataev e463bd53c0 Revert "[SLP]Merge reorder and reuse shuffles."
This reverts commit 438682de6a to fix the
bug with the reducing size of the resulting vector for the entry node
with multiple users.
2021-01-19 11:48:04 -08:00
Sanjay Patel 5b77ac32b1 [SLP] match maxnum/minnum intrinsics as FP reduction ops
After much refactoring over the last 2 weeks to the reduction
matching code, I think this change is finally ready.

We effectively broke fmax/fmin vector reduction optimization
when we started canonicalizing to intrinsics in instcombine,
so this should restore that functionality for SLP.

There are still FMF problems here as noted in the code comments,
but we should be avoiding miscompiles on those for fmax/fmin by
restricting to full 'fast' ops (negative tests are included).

Fixing FMF propagation is a planned follow-up.

Differential Revision: https://reviews.llvm.org/D94913
2021-01-18 17:37:16 -05:00
Sanjay Patel 3dbbadb8ef [SLP] rename reduction query for min/max ops; NFC
This will avoid confusion once we start matching
min/max intrinsics. All of these hacks to accomodate
cmp+sel idioms should disappear once we canonicalize
to min/max intrinsics.
2021-01-18 09:32:57 -05:00
Sanjay Patel d1c4e859ce [SLP] reduce opcode API dependency in reduction cost calc; NFC
The icmp opcode is now hard-coded in the cost model call.
This will make it easier to eventually remove all opcode
queries for min/max patterns as we transition to intrinsics.
2021-01-18 09:32:57 -05:00
Sanjay Patel 49b96cd9ef [SLP] remove opcode field from reduction data class
This is NFC-intended and another step towards supporting
intrinsics as reduction candidates.

The remaining bits of the OperationData class do not make
much sense as-is, so I will try to improve that, but I'm
trying to take minimal steps because it's still not clear
how this was intended to work.
2021-01-16 13:55:52 -05:00
Sanjay Patel fcfcc3cc6b [SLP] fix typos; NFC 2021-01-16 13:55:52 -05:00
Sanjay Patel 48dbac5b6b [SLP] remove unnecessary use of 'OperationData'
This is another NFC-intended patch to allow matching
intrinsics (example: maxnum) as candidates for reductions.

It's possible that the loop/if logic can be reduced now,
but it's still difficult to understand how this all works.
2021-01-16 13:55:52 -05:00
Sanjay Patel ceb3cdccd0 [SLP] remove dead code in reduction matching; NFC
To get into this block we had: !A || B || C
and we checked C in the first 'if' clause
leaving !A || B. But the 2nd 'if' is checking:
A && !B --> !(!A || B)
2021-01-15 17:03:26 -05:00
Sanjay Patel 1f21de535d [SLP] remove unused reduction functions; NFC
These were made obsolete by simplifying the code in recent patches.
2021-01-15 14:59:33 -05:00
Sanjay Patel b21905dfe3 [SLP] remove unnecessary state in matching reductions
This is NFC-intended. I'm still trying to figure out
how the loop where this is used works. It does not
seem like we require this data at all, but it's
hard to confirm given the complicated predicates.
2021-01-14 18:32:37 -05:00
Bjorn Pettersson d58512b2e3 [SLP] Don't vectorize stores of non-packed types (like i1, i2)
In the spirit of commit fc783e91e0 (llvm-svn: 248943) we
shouldn't vectorize stores of non-packed types (i.e. types that
has padding between consecutive variables in a scalar layout,
but being packed in a vector layout).

The problem was detected as a miscompile in a downstream test case.

Reviewed By: anton-afanasyev

Differential Revision: https://reviews.llvm.org/D94446
2021-01-14 11:30:33 +01:00
Sanjay Patel 123674a816 [SLP] simplify type check for reductions
This is NFC-intended. The 'valid' call allows int/FP/pointers
for other parts of SLP. The difference here is that we can't
reduce pointers.
2021-01-13 13:30:46 -05:00
Sanjay Patel 9e7895a868 [SLP] reduce code duplication while processing reductions; NFC 2021-01-12 16:03:57 -05:00
Sanjay Patel 92fb5c49e8 [SLP] rename variable to improve readability; NFC
The OperationData in the 2nd block (visiting the operands)
is completely independent of the 1st block.
2021-01-12 16:03:57 -05:00
Sanjay Patel 554be30a42 [SLP] reduce code duplication in processing reductions; NFC 2021-01-12 16:03:57 -05:00
Sanjay Patel 46507a96fc [SLP] reduce code duplication while matching reductions; NFC 2021-01-12 16:03:57 -05:00
David Sherwood b7ccaca537 [NFC] Remove min/max functions from InstructionCost
Removed the InstructionCost::min/max functions because it's
fine to use std::min/max instead.

Differential Revision: https://reviews.llvm.org/D94301
2021-01-11 09:00:12 +00:00
Sanjay Patel 3f09c77d33 [SLP] fix typo in assert
This snuck into 0aa75fb12f , but I didn't catch it locally.
2021-01-10 13:15:04 -05:00
Sanjay Patel 0aa75fb12f [SLP] put verifyFunction call behind EXPENSIVE_CHECKS
A severe compile-time slowdown from this call is noted in:
https://llvm.org/PR48689
My naive fix was to put it under LLVM_DEBUG ( 267ff79 ),
but that's not limiting in the way we want.
This is a quick fix (or we could just remove the call completely
and rely on some later pass to discover potentially wrong IR?).
A bigger/better fix would be to improve/limit verifyFunction()
as noted in:
https://llvm.org/PR47712

Differential Revision: https://reviews.llvm.org/D94328
2021-01-10 12:32:21 -05:00
Alexander Belyaev bcbdeafa9c Revert "[SLP]Need shrink the load vector after reordering."
This reverts commit 4284afdf94.

This changes computed values in fused_batchnorm_test_cpu.

Not equal to tolerance rtol=1e-06, atol=0.001
Mismatched value: a is different from b.
not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3,
       4, 5]))
not close lhs = [-0.6636615  -0.9804948  -1.148275   -0.68193716 -0.8572368  -0.65046215
 -0.6993756  -1.2244141  -1.0938729  -0.50369143 -0.51830524 -0.738452
 -0.7214286  -0.48115745 -0.9380924  -0.9341769  -0.5916775  -1.2896856
 -0.7264182  -0.9746917  -0.783249   -0.7659018  -0.86214024 -0.47784212]
not close rhs = [ 0.44102234  0.12418899 -0.04359123  0.42274666  0.24744703  0.45422167
  0.40530816 -0.11973029  0.01081094  0.6009924   0.5863786   0.3662318
  0.38325527  0.62352633  0.1665914   0.1705069   0.5130063  -0.18500176
  0.37826565  0.12999213  0.3214348   0.338782    0.24254355  0.62684166]
not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838
 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838
 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839
 1.1046839 1.1046838 1.1046838]
not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045
 0.00100041 0.00100012 0.00100001 0.0010006  0.00100059 0.00100037
 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019
 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]
2021-01-08 14:42:26 +01:00
Sanjay Patel 267ff7901c [SLP] limit verifyFunction to debug build (PR48689)
As noted in PR48689, the verifier may have some kind
of exponential behavior that should be addressed
separately. For now, only run it in debug mode to
prevent problems for release+asserts.
That limit is what we had before D80401, and I'm
not sure if there was a reason to change it in that
patch.
2021-01-08 08:10:17 -05:00
Kazu Hirata 33bf1cad75 [llvm] Use *Set::contains (NFC) 2021-01-07 20:29:34 -08:00
Sanjay Patel 4c7148d75c [SLP] remove opcode identifier for reduction; NFC
Another step towards allowing intrinsics in reduction matching.
2021-01-07 14:07:27 -05:00
Alexey Bataev 4284afdf94 [SLP]Need shrink the load vector after reordering.
After merging the shuffles, we cannot rely on the previous shuffle
anymore and need to shrink the final shuffle, if it is required.

Reported in D92668

Differential Revision: https://reviews.llvm.org/D93967
2021-01-07 04:50:48 -08:00
Oliver Stannard 76f6b125ce Revert "[llvm] Use BasicBlock::phis() (NFC)"
Reverting because this causes crashes on the 2-stage buildbots, for
example http://lab.llvm.org:8011/#/builders/7/builds/1140.

This reverts commit 9b228f107d.
2021-01-07 09:43:33 +00:00
Kazu Hirata cfeecdf7b6 [llvm] Use llvm::all_of (NFC) 2021-01-06 18:27:36 -08:00
Kazu Hirata 9b228f107d [llvm] Use BasicBlock::phis() (NFC) 2021-01-06 18:27:35 -08:00
Sanjay Patel 4c022b5a41 [SLP] use reduction kind's opcode to create new instructions; NFC
Similar to 5a1d31a28 -
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
2021-01-06 14:37:44 -05:00
Sanjay Patel 5d24089a70 [SLP] reduce code for propagating flags on reductions; NFC
If we add/change to match intrinsics, this might get more
wordy, but there's no need to list each kind currently.
2021-01-06 14:37:44 -05:00
Juneyoung Lee 4a8e6ed2f7 [SLP,LV] Use poison constant vector for shufflevector/initial insertelement
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94061
2021-01-06 11:22:50 +09:00
Sanjay Patel 6a03f8ab62 [SLP] reduce code for finding reduction costs; NFC
We can get both (vector/scalar) costs in a single switch
instead of sequentially.
2021-01-05 17:35:54 -05:00
Sanjay Patel 5a1d31a284 [SLP] use reduction kind's opcode for cost model queries; NFC
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
2021-01-05 15:12:40 -05:00
Sanjay Patel d4a999b453 [SLP] reduce code duplication; NFC 2021-01-05 15:12:40 -05:00
Sanjay Patel 3b8b2c7da2 [SLP] delete unused pairwise reduction option
SLP tries to model 2 forms of vector reductions: pairwise and splitting.
From the cost model code comments, those are defined using an example as:

  /// Pairwise:
  ///  (v0, v1, v2, v3)
  ///  ((v0+v1), (v2+v3), undef, undef)
  /// Split:
  ///  (v0, v1, v2, v3)
  ///  ((v0+v2), (v1+v3), undef, undef)

I don't know the full history of this functionality, but it was partly
added back in D29402. There are apparently no users at this point (no
regression tests change). X86 might have managed to work-around the need
for this through cost model and codegen improvements.

Removing this code makes it easier to continue the work that was started
in D87416 / D88193. The alternative -- if there is some target that is
silently using this option -- is to move this logic into LoopUtils. We
have related/duplicate functionality there via llvm::createTargetReduction().

Differential Revision: https://reviews.llvm.org/D93860
2021-01-05 13:23:07 -05:00
Sanjay Patel 36263a7ccc [LoopUtils] remove redundant opcode parameter; NFC
While here, rename the inaccurate getRecurrenceBinOp()
because that was also used to get CmpInst opcodes.

The recurrence/reduction kind should always refer to the
expected opcode for a reduction. SLP appears to be the
only direct caller of createSimpleTargetReduction(), and
that calling code ideally should not be carrying around
both an opcode and a reduction kind.

This should allow us to generalize reduction matching to
use intrinsics instead of only binops.
2021-01-04 17:05:28 -05:00
Sanjay Patel c74e8539ff [Analysis] flatten enums for recurrence types
This is almost all mechanical search-and-replace and
no-functional-change-intended (NFC). Having a single
enum makes it easier to match/reason about the
reduction cases.

The goal is to remove `Opcode` from reduction matching
code in the vectorizers because that makes it harder to
adapt the code to handle intrinsics.

The code in RecurrenceDescriptor::AddReductionVar() is
the only place that required closer inspection. It uses
a RecurrenceDescriptor and a second InstDesc to sometimes
overwrite part of the struct. It seem like we should be
able to simplify that logic, but it's not clear exactly
which cmp+sel patterns that we are trying to handle/avoid.
2021-01-01 12:20:16 -05:00
Sanjay Patel 8ca60db40b [LoopUtils] reduce FMF and min/max complexity when forming reductions
I don't know if there's some way this changes what the vectorizers
may produce for reductions, but I have added test coverage with
3567908 and 5ced712 to show that both passes already have bugs in
this area. Hopefully this does not make things worse before we can
really fix it.
2020-12-30 15:22:26 -05:00
Sanjay Patel 21a3a0225d [SLP] replace local reduction enum with RecurrenceKind; NFCI
I'm not sure if the SLP enum was created before the IVDescriptor
RecurrenceDescriptor / RecurrenceKind existed, but the code in
SLP is now redundant with that class, so it just makes things
more complicated to have both. We eventually call LoopUtils
createSimpleTargetReduction() to create reduction ops, so we
might as well standardize on those enum names.

There's still a question of whether we need to use TTI::ReductionFlags
vs. MinMaxRecurrenceKind, but that can be another clean-up step.

Another option would just be to flatten the enums in RecurrenceDescriptor
into a single enum. There isn't much benefit (smaller switches?) to
having a min/max subset.
2020-12-29 14:52:11 -05:00
Kazu Hirata 789d250613 [CodeGen, Transforms] Use *Map::lookup (NFC) 2020-12-27 09:57:27 -08:00
Sanjay Patel badf0f20f3 [SLP] rename reduction variables for readability; NFC
I am hoping to extend the reduction matching code, and it is
hard to distinguish "ReductionData" from "ReducedValueData".
So extend the tree/root metaphor to include leaves.

Another problem is that the name "OperationData" does not
provide insight into its purpose. I'm not sure if we can alter
that underlying data structure to make the code clearer.
2020-12-26 11:20:25 -05:00