llvm-project

Commit Graph

Author	SHA1	Message	Date
Caroline Concatto	36710c38c1	[NFC]Migrate VectorCombine.cpp to use InstructionCost This patch changes these functions: vectorizeLoadInsert isExtractExtractCheap foldExtractedCmps scalarizeBinopOrCmp getShuffleExtract foldBitcastShuf to use the class InstructionCost when calling TTI.get<something>Cost(). This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 ps.:This patch adds the test \|\| !NewCost.isValid(), because we want to return false when: !NewCost.isValid && !OldCost.isValid()->the cost to transform it expensive and !NewCost.isValid() && OldCost.isValid() Therefore for simplication we only add test for !NewCost.isValid() Differential Revision: https://reviews.llvm.org/D94069	2021-01-18 13:37:21 +00:00
Sanjay Patel	47aaa99c0e	[VectorCombine] allow peeking through GEPs when creating a vector load This is an enhancement motivated by https://llvm.org/PR16739 (see D92858 for another). We can look through a GEP to find a base pointer that may be safe to use for a vector load. If so, then we shuffle (shift) the necessary vector element over to index 0. Alive2 proof based on 1 of the regression tests: https://alive2.llvm.org/ce/z/yPJLkh The vector translation is independent of endian (verify by changing to leading 'E' in the datalayout string). Differential Revision: https://reviews.llvm.org/D93229	2020-12-18 09:25:03 -05:00
Sanjay Patel	38ebc1a13d	[VectorCombine] optimize alignment for load transform Here's another minimal step suggested by D93229 / D93397 . (I'm trying to be extra careful in these changes because load transforms are easy to get wrong.) We can optimistically choose the greater alignment of a load and its pointer operand. As the test diffs show, this can improve what would have been unaligned vector loads into aligned loads. When we enhance with gep offsets, we will need to adjust the alignment calculation to include that offset. Differential Revision: https://reviews.llvm.org/D93406	2020-12-16 15:25:45 -05:00
Sanjay Patel	aaaf0ec72b	[VectorCombine] loosen alignment constraint for load transform As discussed in D93229, we only need a minimal alignment constraint when querying whether a hypothetical vector load is safe. We still pass/use the potentially stronger alignment attribute when checking costs and creating the new load. There's already a test that changes with the minimum code change, so splitting this off as a preliminary commit independent of any gep/offset enhancements. Differential Revision: https://reviews.llvm.org/D93397	2020-12-16 12:25:18 -05:00
Sanjay Patel	d399f870b5	[VectorCombine] make load transform poison-safe As noted in D93229, the transform from scalar load to vector load potentially leaks poison from the extra vector elements that are being loaded. We could use freeze here (and x86 codegen at least appears to be the same either way), but we already have a shuffle in this logic to optionally change the vector size, so let's allow that instruction to serve both purposes. Differential Revision: https://reviews.llvm.org/D93238	2020-12-14 17:42:01 -05:00
Sanjay Patel	12b684ae02	[VectorCombine] improve readability; NFC If we are going to allow adjusting the pointer for GEPs, rearranging the code a bit will make it easier to follow.	2020-12-10 13:10:26 -05:00
Sanjay Patel	b2ef264096	[VectorCombine] allow peeking through an extractelt when creating a vector load This is an enhancement to load vectorization that is motivated by a pattern in https://llvm.org/PR16739. Unfortunately, it's still not enough to make a difference there. We will have to handle multi-use cases in some better way to avoid creating multiple overlapping loads. Differential Revision: https://reviews.llvm.org/D92858	2020-12-09 10:36:14 -05:00
Artem Belevich	c36c0fabd1	[VectorCombine] Avoid crossing address space boundaries. We can not bitcast pointers across different address spaces, and VectorCombine should be careful when it attempts to find the original source of the loaded data. Differential Revision: https://reviews.llvm.org/D89577	2020-10-16 13:19:31 -07:00
Sanjay Patel	1e6b240d7d	[IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC This reduces code duplication for common construct. Follow-ups can use this in SLP, LoopVectorizer, and other passes.	2020-09-21 13:47:01 -04:00
Sanjay Patel	48a23bccf3	[VectorCombine] limit load+insert transform to one-use As discussed in: https://llvm.org/PR47558 ...there are several potential fixes/follow-ups visible in the test case, but this is the quickest and safest fix of the perf regression.	2020-09-17 14:29:15 -04:00
Sanjay Patel	ddd9575d15	[VectorCombine] rearrange bailouts for load insert for efficiency; NFC	2020-09-17 13:50:37 -04:00
Fangrui Song	4452cc4086	[VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan Similar to the tsan suppression in `Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN), the D81766 optimization should be suppressed under tsan due to potential spurious data race reports: struct A { int i; const short s; // the load cannot be vectorized because int modify; // it overlaps with bytes being concurrently modified long pad1, pad2; }; // __tsan_read16 does not know that some bytes are undef and accessing is safe Similarly, under asan, users can mark memory regions with `__asan_poison_memory_region`. A widened load can lead to a spurious use-after-poison error. hwasan/memtag should be similarly suppressed. `mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so we need to exclude memtag in `vectorizeLoadInsert`. Note, memtag suppression can be relaxed if the load is aligned to the its granule (usually 16), but that is out of scope of this patch. Reviewed By: spatel, vitalybuka Differential Revision: https://reviews.llvm.org/D87538	2020-09-15 09:47:21 -07:00
Huihui Zhang	b4f04d7135	[VectorCombine][SVE] Do not fold bitcast shuffle for scalable type. First, shuffle cost for scalable type is not known for scalable type; Second, we cannot reason if the narrowed shuffle mask for scalable type is a splat or not. E.g., Bitcast splat vector from type <vscale x 4 x i32> to <vscale x 8 x i16> will involve narrowing shuffle mask <vscale x 4 x i32> zeroinitializer to <vscale x 8 x i32> with element sequence of <0, 1, 0, 1, ...>, which cannot be reasoned if it's a valid splat or not. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86995	2020-09-02 15:02:16 -07:00
Sanjay Patel	8fb055932c	[VectorCombine] allow vector loads with mismatched insert type This is an enhancement to D81766 to allow loading the minimum target vector type into an IR vector with a different number of elements. In one of the motivating tests from PR16739, SLP creates <2 x float> load ops mixed with <4 x float> insert ops, so we want to handle that pattern in addition to potential oversized vectors created by the vectorizers. For now, we are assuming the insert/extract subvector with undef is free because there is no exact corresponding TTI modeling for that. Differential Revision: https://reviews.llvm.org/D86160	2020-09-02 08:11:36 -04:00
Christopher Tetreault	5e63083435	[SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D82056	2020-08-27 12:02:20 -07:00
Bjorn Pettersson	11446b02c7	[VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load This is a fixup to commit `43bdac2906`, to make sure the address space from the original load pointer is retained in the vector pointer. Resolves problem with Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. due to address space mismatch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85912	2020-08-13 18:25:32 +02:00
Sanjay Patel	cc892fd9f4	[VectorCombine] early exit if target has no vector registers Based on post-commit discussion in: D81766 Other vectorization passes (SLP and Loop) use this TTI API similarly.	2020-08-12 09:22:31 -04:00
Sanjay Patel	b0b95dab1c	[VectorCombine] add safety check for 0-width register Based on post-commit discussion in D81766, Hexagon sets this to "0". I'll see if I can come up with a test, but making the obvious code fix first to unblock that target.	2020-08-11 20:30:02 -04:00
Sanjay Patel	43bdac2906	[VectorCombine] try to create vector loads from scalar loads This patch was adjusted to match the most basic pattern that starts with an insertelement (so there's no extract created here). Hopefully, that removes any concern about interfering with other passes. Ie, the transform should almost always be profitable. We could make an argument that this could be part of canonicalization, but we conservatively try not to create vector ops from scalar ops in passes like instcombine. If the transform is not profitable, the backend should be able to re-scalarize the load. Differential Revision: https://reviews.llvm.org/D81766	2020-08-09 09:05:06 -04:00
Benjamin Kramer	b44470547e	Make helpers static. NFC.	2020-07-09 13:48:56 +02:00
Sanjay Patel	b6315aee5b	[VectorCombine] try to form vector compare and binop to eliminate scalar ops binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1) --> vcmp = cmp Pred X, VecC ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0 This is a larger pattern than the existing extractelement folds because we can't reasonably vectorize the sub-patterns with constants based on cost model calcs (it doesn't usually make sense to replace a single extracted scalar op with constant operand with a vector op). I salvaged as much of the existing logic as I could, but there might be better ways to share and reduce code. The motivating case from PR43745: https://bugs.llvm.org/show_bug.cgi?id=43745 ...is the special case of a 2-way reduction. We tried to get SLP to handle that particular pattern in D59710, but that caused crashing and regressions. This patch is more general, but hopefully safer. The v2f64 test with SSE2 surprised me - the cost model accounting looks like this: OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4 NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5 Differential Revision: https://reviews.llvm.org/D82474	2020-06-29 10:38:52 -04:00
Sanjay Patel	3b95d8346d	[VectorCombine] refactor - make helper function for extract to shuffle logic; NFC Preliminary for D82474	2020-06-29 09:55:34 -04:00
Sanjay Patel	a0f967418f	[VectorCombine] give invalid index value a name; NFC	2020-06-24 11:10:36 -04:00
Sanjay Patel	54143e2bd5	[VectorCombine] do not use magic number for undef mask element; NFC	2020-06-22 20:47:09 -04:00
Sanjay Patel	9934cc544c	[VectorCombine] make helper function for shift-shuffle; NFC This will probably be useful for other extract patterns.	2020-06-22 12:23:52 -04:00
Sanjay Patel	98c2f4eea5	[VectorCombine] add helper to replace uses and rename The tests are regenerated to show a path that missed renaming, but there should be no functional difference from this patch.	2020-06-22 09:58:49 -04:00
Sanjay Patel	de65b356dc	[VectorCombine] add/use pass-level IRBuilder This saves creating/destroying a builder every time we perform some transform. The tests show instruction ordering diffs resulting from always inserting at the root instruction now, but those should be benign.	2020-06-22 09:01:29 -04:00
Sanjay Patel	cce625f73d	[VectorCombine] improve IR debugging by providing/salvaging value names The tests are regenerated to show the diffs, but there should be no functional change from this patch.	2020-06-22 08:35:47 -04:00
Sanjay Patel	6bdd531af5	[VectorCombine] create class for pass to hold analyses, etc; NFC This doesn't change anything currently, but it would make sense to create a class-level IRBuilder instead of recreating that everywhere. As we expand to more optimizations, we will probably also want to hold things like the DataLayout or other constant refs in here too.	2020-06-21 16:07:33 -04:00
Sanjay Patel	741e20f3d6	[VectorCombine] fix assert for type of compare operand As shown in the post-commit comment for D81661 - we need to loosen the type assertion to allow scalarization of a compare for vectors of pointers.	2020-06-20 15:20:17 -04:00
Sanjay Patel	216a37bb46	[VectorCombine] refactor extract-extract logic; NFCI	2020-06-19 14:52:27 -04:00
Sanjay Patel	6d864097a2	[VectorCombine] fix crash while transforming constants This is a variation of the proposal in D82049 with an extra test.	2020-06-19 12:30:32 -04:00
Sanjay Patel	46a285ad9e	[IRBuilder] add/use wrapper to create a generic compare based on predicate type; NFC The predicate can always be used to distinguish between icmp and fcmp, so we don't need to keep repeating this check in the callers.	2020-06-18 15:47:06 -04:00
Sanjay Patel	ed67f5e7ab	[VectorCombine] scalarize compares with insertelement operand(s) Generalize scalarization (recently enhanced with D80885) to allow compares as well as binops. Similar to binops, we are avoiding scalarization of a loaded value because that could avoid a register transfer in codegen. This requires 1 extra predicate that I am aware of: we do not want to scalarize the condition value of a vector select. That might also invert a transform that we do in instcombine that prefers a vector condition operand for a vector select. I think this is the final step in solving PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 Differential Revision: https://reviews.llvm.org/D81661	2020-06-16 13:48:10 -04:00
Roman Lebedev	7aeb41b3c8	[NFCI] VectorCombine: add statistic for bitcast(shuf()) -> shuf(bitcast()) xform	2020-06-12 23:10:53 +03:00
Sanjay Patel	039ff29ef6	[VectorCombine] remove unused parameters; NFC	2020-06-11 19:15:03 -04:00
Simon Pilgrim	5dc4e7c2b9	[VectorCombine] scalarizeBinop - support an all-constant src vector operand scalarizeBinop currently folds vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index)) -> inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index) This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense: vec_bo((inselt VecC0, V0, Index), VecC1) -> inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index) Fixes PR42174 Differential Revision: https://reviews.llvm.org/D80885	2020-06-09 19:02:05 +01:00
Simon Pilgrim	5006e551d3	LoopAnalysisManager.h - reduce includes to forward declarations. NFC. Move implicit include dependencies down to header/source files.	2020-06-06 14:06:46 +01:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Sanjay Patel	024098ae53	[VectorCombine] set preserve alias analysis As noted in D80236, moving the pass in the pipeline exposed this shortcoming. Extra work to recalculate the alias results showed up as a compile-time slowdown.	2020-05-22 16:25:16 -04:00
Sanjay Patel	81e9ede3a2	[VectorCombine] forward walk through instructions to improve chaining of transforms This is split off from D79799 - where I was proposing to fully iterate over a function until there are no more transforms. I suspect we are still going to want to do something like that eventually. But we can achieve the same gains much more efficiently on the current set of regression tests just by reversing the order that we visit the instructions. This may also reduce the motivation for D79078, but we are still not getting the optimal pattern for a reduction.	2020-05-16 13:08:01 -04:00
Sanjay Patel	5f730b645d	[VectorCombine] account for extra uses in scalarization cost Follow-up to D79452. Mimics the extra use cost formula for the inverse transform with extracts.	2020-05-11 15:20:57 -04:00
Sanjay Patel	0d2a0b44c8	[VectorCombine] scalarize binop of inserted elements into vector constants As with the extractelement patterns that are currently in vector-combine, there are going to be several possible variations on this theme. This should be the clearest, simplest example. Scalarization is the right direction for target-independent canonicalization, and InstCombine has some of those folds already, but it doesn't do this. I proposed a similar transform in D50992. Here in vector-combine, we can check the cost model to be sure it's profitable, so there should be less risk. Differential Revision: https://reviews.llvm.org/D79452	2020-05-08 16:31:12 -04:00
Sam Parker	e3056ae9a0	[NFC][TTI] Explicit use of VectorType The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357	2020-04-20 09:16:52 +01:00
Sanjay Patel	bef6e67e95	[VectorCombine] transform bitcasted shuffle to wider elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' This is the widen shuffle elements enhancement to D76727. It builds on the analysis and simplifications in D77881 and rG6a7e958a423e. The phase ordering tests show that we can simplify inverse shuffles across a binop in both directions (widen/narrow or narrow/widen) now. There's another potential transform visible in some of the remaining TODOs - move a bitcasted operand of a shuffle after the shuffle. Differential Revision: https://reviews.llvm.org/D78371	2020-04-19 08:24:38 -04:00
Benjamin Kramer	6f64daca8f	Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints No functionality change intended.	2020-04-15 12:51:38 +02:00
Christopher Tetreault	3297e9b7c3	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: rriddle, sdesmalen, efriedma Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77259	2020-04-13 12:29:43 -07:00
Sanjay Patel	1318ddbc14	[VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC As proposed in D77881, we'll have the related widening operation, so this name becomes too vague. While here, change the function signature to take an 'int' rather than 'size_t' for the scaling factor, add an assert for overflow of 32-bits, and improve the documentation comments.	2020-04-11 10:05:49 -04:00
Sanjay Patel	ce97ce3a5d	[VectorCombine] try to form a better extractelement Extracting to the same index that we are going to insert back into allows forming select ("blend") shuffles and enables further transforms. Admittedly, this is a quick-fix for a more general problem that I'm hoping to solve by adding transforms for patterns that start with an insertelement. But this might resolve some regressions known to be caused by the extract-extract transform (although I have not gotten more details on those yet). In the motivating case from PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 The combination of subsequent instcombine and codegen transforms gets us this improvement: vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3] vhaddps %xmm1, %xmm1, %xmm4 vmovshdup %xmm1, %xmm3 ## xmm3 = xmm1[1,1,3,3] vaddps %xmm0, %xmm2, %xmm0 vaddps %xmm1, %xmm3, %xmm1 vshufps $200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3] vinsertps $177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2] --> vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3] vhaddps %xmm1, %xmm1, %xmm1 vaddps %xmm0, %xmm2, %xmm0 vshufps $200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3] Differential Revision: https://reviews.llvm.org/D76623	2020-04-03 13:55:13 -04:00
Sanjay Patel	b6050ca181	[VectorCombine] transform bitcasted shuffle to narrower elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' We do not attempt this in InstCombine because we do not want to change types and create new shuffle ops that are potentially not lowered as well as the original code. Here, we can check the cost model to see if it is worthwhile. I've aggressively enabled this transform even if the types are the same size and/or equal cost because moving the bitcast allows InstCombine to make further simplifications. In the motivating cases from PR35454: https://bugs.llvm.org/show_bug.cgi?id=35454 ...this is enough to let instcombine and the backend eliminate the redundant shuffles, but we probably want to extend VectorCombine to handle the inverse pattern (shuffle-of-bitcast) to get that simplification directly in IR. Differential Revision: https://reviews.llvm.org/D76727	2020-04-02 13:30:22 -04:00

1 2

63 Commits