llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	e0568ef2c5	[VectorCombine] add tests for possible extract->shuffle; NFC	2020-02-25 08:41:59 -05:00
Sanjay Patel	10ea01d80d	[VectorCombine] make cost calc consistent for binops and cmps Code duplication (subsequently removed by refactoring) allowed a logic discrepancy to creep in here. We were being conservative about creating a vector binop -- but not a vector cmp -- in the case where a vector op has the same estimated cost as the scalar op. We want to be more aggressive here because that can allow other combines based on reduced instruction count/uses. We can reverse the transform in DAGCombiner (potentially with a more accurate cost model) if this causes regressions. AFAIK, this does not conflict with InstCombine. We have a scalarize transform there, but it relies on finding a constant operand or a matching insertelement, so that means it eliminates an extractelement from the sequence (so we won't have 2 extracts by the time we get here if InstCombine succeeds). Differential Revision: https://reviews.llvm.org/D75062	2020-02-25 08:41:59 -05:00
Sanjay Patel	62dd44d76d	[VectorCombine] fix cost calc for extract-cmp getOperationCost() is not the cost we wanted; that's not the throughput value that the rest of the calculation uses. We may want to switch everything in this code to use the getInstructionThroughput() wrapper to avoid these kinds of problems, but I'll look at that as a follow-up because that can create other logical diffs via using optional parameters (we'd need to speculatively create the vector instruction to make a fair(er) comparison).	2020-02-16 10:40:28 -05:00
Sanjay Patel	19b62b79db	[VectorCombine] try to form vector binop to eliminate an extract element binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C This is a transform that has been considered for canonicalization (instcombine) in the past because it reduces instruction count. But as shown in the x86 tests, it's impossible to know if it's profitable without a cost model. There are many potential target constraints to consider. We have implemented similar transforms in the backend (DAGCombiner and target-specific), but I don't think we have this exact fold there either (and if we did it in SDAG, it wouldn't work across blocks). Note: this patch was intended to handle the more general case where the extract indexes do not match, but it got too big, so I scaled it back to this pattern for now. Differential Revision: https://reviews.llvm.org/D74495	2020-02-13 17:23:27 -05:00
Sanjay Patel	d3551516ee	[VectorCombine] adjust tests for extract-binop; NFC We want the extra-use tests to be consistent with the earlier single-use tests and be as cheap as possible in vector form to show cost model edge cases. So use i8 and extract from element 0 since that should be cheap for all x86 targets.	2020-02-13 10:51:01 -05:00
Sanjay Patel	5b08eaf2b7	[VectorCombine] add more extract-binop tests; NFC See D74495.	2020-02-13 10:07:20 -05:00
Sanjay Patel	2452f85b57	[VectorCombine] add tests for extract-binop; NFC	2020-02-12 11:33:21 -05:00
Sanjay Patel	a17f03bd93	[VectorCombine] new IR transform pass for partial vector ops We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html ...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes. There are 4 alternate options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws: InstCombine - can't happen without TTI, but we don't want target-specific folds there. SDAG - too late to assist other vectorization passes; TLI is not equipped for these kinds of cost queries; limited to a single basic block. CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine. SLP - doesn't fit with existing transforms; limited to a single basic block. This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable. We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with the most basic case and evolve from there, so I didn't try to do anything fancy with this initial implementation. Differential Revision: https://reviews.llvm.org/D73480	2020-02-09 10:04:41 -05:00
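
The fold described in 19b62b79db, together with the tie-breaking rule from 10ea01d80d, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the pass's actual code: vectors are plain Python lists, the function names and the `extra_uses` parameter are hypothetical, and the cost numbers are made-up inputs rather than values from TTI.

```python
import operator

def scalar_form(x, y, c, binop):
    """binop (extelt X, C), (extelt Y, C): two extracts, then a scalar op."""
    return binop(x[c], y[c])

def vector_form(x, y, c, binop):
    """extelt (binop X, Y), C: one vector op, then a single extract."""
    return [binop(a, b) for a, b in zip(x, y)][c]

def prefer_vector(scalar_op_cost, vector_op_cost, extract_cost, extra_uses=0):
    """Hypothetical cost comparison (assumed formula, not LLVM's code).

    old: scalar op plus the two extracts it consumes.
    new: vector op plus one extract, plus any original extracts that
         must stay because they have other users (extra_uses).
    A tie favors the vector form, matching the "more aggressive"
    behavior the 10ea01d80d message describes.
    """
    old_cost = scalar_op_cost + 2 * extract_cost
    new_cost = vector_op_cost + extract_cost + extra_uses * extract_cost
    return new_cost <= old_cost

x, y = [1, 2, 3, 4], [10, 20, 30, 40]
# Both forms compute the same value for the same lane.
same = scalar_form(x, y, 2, operator.add) == vector_form(x, y, 2, operator.add)
```

With these made-up costs, `prefer_vector(1, 2, 1)` is a tie (3 vs. 3) and picks the vector form, while `prefer_vector(1, 1, 1, extra_uses=2)` does not, because both original extracts would survive the transform.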


58 Commits