llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	af870e11ae	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 09:08:46 -07:00
Alexey Bataev	b82344a019	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `daf6e18c55` to fix the compiler crash.	2021-04-20 08:29:32 -07:00
Alexey Bataev	daf6e18c55	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 07:46:49 -07:00
Alexey Bataev	cf00cb8bed	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `b232771aca` to fix buildbots.	2021-04-20 07:16:11 -07:00
Alexey Bataev	b232771aca	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 06:55:55 -07:00
Alexey Bataev	8030481065	Revert "[SLP]Add detection of shuffled/perfect matching of tree entries." This reverts commit `d6fde91379` to fix compiler crashes.	2021-04-19 14:10:04 -07:00
Alexey Bataev	d6fde91379	[SLP]Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Differential Revision: https://reviews.llvm.org/D100495	2021-04-19 13:29:30 -07:00
Simon Pilgrim	b49c41afba	[SLP] createOp - fix null dereference warning. NFCI. Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.	2021-04-14 15:24:41 +01:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. Main reason is preparation to transform AliasResult to class that contains offset for PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Alexey Bataev	ab124bbe2a	[SLP]Fix PR49898: Infinite loop in SLP vectorizer. We should not retry finding the consecutive store chain if it was tried before. Differential Revision: https://reviews.llvm.org/D100131	2021-04-08 14:18:06 -07:00
Alexey Bataev	a78e86e6be	[SLP]Avoid multiple attempts to vectorize CmpInsts. No need to lookup through and/or try to vectorize operands of the CmpInst instructions during attempts to find/vectorize min/max reductions. Compiler implements postanalysis of the CmpInsts so we can skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save compile time. Differential Revision: https://reviews.llvm.org/D99950	2021-04-07 06:15:42 -07:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization better to postpone the vectorization of the CmpInst instructions till the end of the basic block. Otherwise we may vectorize it too early and may miss some vectorization patterns, like reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is min/max reduction and based on SelectInstruction, the condition of this select instruction must have only single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extractelement instructions in cases where they are going to be free, because the source vector can be used directly. A simple example is: `%v1.lane.0 = extractelement <2 x double> %v.1, i32 0` `%v1.lane.1 = extractelement <2 x double> %v.1, i32 1` `%a.lane.0 = fmul double %v1.lane.0, %x` `%a.lane.1 = fmul double %v1.lane.1, %y` Currently we only consider the extracts free if there are no other users. In this particular case, on AArch64 which can fit <2 x double> in a vector register, the extracts should be free, independently of other users, because the source vector of the extracts will be in a vector register directly, so it should be free to use the vector directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to cleanup InstrElementSize map for each new tree, otherwise might use sizes from the previous run of the vectorization attempt. 2. No need to include into analysis the instructions from the different basic blocks to save compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Sanjay Patel	b0797e0c12	[SLP] use dyn_cast instead of isa + cast; NFC	2021-03-26 13:52:31 -04:00
Sanjay Patel	a26312f9d4	Revert "[SLP] allow matching integer min/max intrinsics as reduction ops" This reverts commit `3c8473ba53` and includes test diffs to maintain testing status. There's at least 1 place that was not updated with `7202f47508` , so we can crash mismatching select and intrinsics as shown in PR49730.	2021-03-26 09:59:14 -04:00
Yevgeny Rouban	f7ef26ef0b	[SLP] Fix crash in reduction for integer min/max The SCEV commit `b46c085d2b` [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions seems to reveal a new crash in SLPVectorizer. SLP crashes expecting a SelectInst as an externally used value but umin() call is found. The patch relaxes the assumption to make the IR flag propagation safe. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D99328	2021-03-25 21:44:21 +07:00
Alexey Bataev	568c874117	[SLP]Improve and simplify extendSchedulingRegion. We do not need to scan further if the upper end or lower end of the basic block is reached already and the instruction is not found. It means that the instruction is definitely in the lower part of basic block or in the upper block relatively. This should improve compile time for the very big basic blocks. Differential Revision: https://reviews.llvm.org/D99266	2021-03-25 05:31:58 -07:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Alexey Bataev	99203f2004	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 14:25:36 -07:00
Alexey Bataev	f1b47ad278	Revert "[Analysis]Add getPointersDiff function to improve compile time." This reverts commit `065a14a12d` to investigate and fix crash in SLP vectorizer.	2021-03-23 13:17:54 -07:00
Alexey Bataev	065a14a12d	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 12:58:42 -07:00
Sanjay Patel	3c8473ba53	[SLP] allow matching integer min/max intrinsics as reduction ops As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-23 08:56:44 -04:00
Bjorn Pettersson	688cdddafb	[SLP] Honor min/max regsize and min/max VF in vectorizeStores Make sure we use PowerOf2Floor instead of PowerOf2Ceil when calculating max number of elements that fits inside a vector register (otherwise we could end up creating vectors larger than the maximum vector register size). Also make sure we honor the min/max VF (as given by TTI or cmd line parameters) when doing vectorizeStores. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D97691	2021-03-22 17:29:35 +01:00
Alexey Bataev	b3ced9852c	[SLP]Fix crash on extending scheduling region. If SLP vectorizer tries to extend the scheduling region and runs out of the budget too early, but still extends the region to the new ending instructions (i.e., it was able to extend the region for the first instruction in the bundle, but not for the second), the compiler needs to recalculate dependencies in full, just as if the extension had been successful. Without it, the schedule data chunks may end up with the wrong number of (unscheduled) dependencies and it may end up with an incorrect function, where the vectorized instruction does not dominate the extractelement instruction. Differential Revision: https://reviews.llvm.org/D98531	2021-03-18 06:11:08 -07:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
Bu Le	9abe500473	[SLP] Fix the trunc instruction insertion problem Current SLP pass has this piece of code that inserts a trunc instruction after the vectorized instruction. In the case that the vectorized instruction is a phi node and not the last phi node in the BB, the trunc instruction will be inserted between two phi nodes, which will trigger verify problem in debug version or unpredictable error in another pass. This patch changes the algorithm to 'if the last vectorized instruction is a phi, insert it after the last phi node in current BB' to fix this problem.	2021-03-17 13:51:08 +03:00
Sanjay Patel	7202f47508	[SLP] separate min/max matching from its instruction-level implementation; NFC The motivation is to handle integer min/max reductions independently of whether they are in the current cmp+sel form or the planned intrinsic form. We assumed that min/max included a select instruction, but we can decouple that implementation detail by checking the instructions themselves rather than relying on the recurrence (reduction) type.	2021-03-16 17:16:11 -04:00
Sanjay Patel	40fdb43d30	[SLP] improve readability in reduction logic; NFC We had 2 different and ambiguously-named 'I' variables.	2021-03-16 07:35:13 -04:00
Valery N Dmitriev	73f94969b2	[SLP] Fix crash when matching associative reduction for integer min/max. Associative reduction matcher in SLP begins with select instruction but when it reached call to llvm.umax (or alike) via def-use chain the latter also matched as UMax kind. The routine's later code assumes matched instruction to be a select and thus it merely died on the first encountered cast that did not fit. Differential Revision: https://reviews.llvm.org/D98432	2021-03-11 11:52:57 -08:00
Sanjay Patel	23fd647cc6	[SLP] remove dead null check; NFC We cast<> to Instruction (not dyn_cast<>), so we already required/assumed that Cmp is not null.	2021-03-09 17:43:07 -05:00
Alexey Bataev	a054e94e9e	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D94992	2021-03-02 06:39:47 -08:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Alexey Bataev	9a4dd4de9d	[SLP]No need to mark scatter load pointer as scalar as it gets vectorized. Pointer operand of scatter loads does not remain scalar in the tree (it gets vectorized) and thus must not be marked as the scalar that remains scalar in vectorized form. Differential Revision: https://reviews.llvm.org/D96818	2021-02-22 11:58:28 -08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in the test-suite build on PowerPC; revert to unblock the buildbot first. Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Sander de Smalen	171d12489f	[SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost. This change also changes getReductionCost to return InstructionCost, and it simplifies two expressions by removing a redundant 'isValid' check.	2021-01-25 12:27:01 +00:00
Sanjay Patel	77adbe6a8c	[SLP] fix fast-math requirements for fmin/fmax reductions `a6f0221276` enabled intersection of FMF on reduction instructions, so it is safe to ease the check here. There is still some room to improve here - it looks like we have nearly duplicate flags propagation logic inside of the LoopUtils helper but it is limited to targets that do not form reduction intrinsics (they form the shuffle expansion).	2021-01-24 08:55:56 -05:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Sanjay Patel	a6f0221276	[SLP] fix fast-math-flag propagation on FP reductions As shown in the test diffs, we could miscompile by propagating flags that did not exist in the original code. The flags required for fmin/fmax reductions will be fixed in a follow-up patch.	2021-01-23 11:17:20 -05:00
Anton Rapetov	a4914dc1f2	[SLP] do not traverse constant uses Walking the use list of a Constant (particularly, ConstantData) is not scalable, since a given constant may be used by many instructions in many functions in many modules. Differential Revision: https://reviews.llvm.org/D94713	2021-01-22 08:14:09 -05:00
Sanjay Patel	2f03528f5e	[SLP] rename reduction variable to avoid shadowing; NFC The code structure can likely be improved now that 'OperationData' is gone.	2021-01-21 16:02:38 -05:00
Sanjay Patel	d77753381f	[SLP] simplify reduction matching This is NFC-intended and removes the "OperationData" class which had become nothing more than a recurrence (reduction) type. I adjusted the matching logic to distinguish instructions from non-instructions - that's all that the "IsLeafValue" member was keeping track of.	2021-01-21 14:58:57 -05:00
Sanjay Patel	c09be0d2a0	[SLP] reduce reduction code for checking vectorizable ops; NFC This is another step towards removing `OperationData` and fixing FMF matching/propagation bugs when forming reductions.	2021-01-20 11:14:48 -05:00
Sanjay Patel	1c54112a57	[SLP] refactor more reduction functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param. More streamlining is possible, but I'm trying to avoid logic/typo bugs while fixing this. Eventually, we should not need the `OperationData` class.	2021-01-20 11:14:48 -05:00
Sanjay Patel	8590d24543	[SLP] move reduction createOp functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param.	2021-01-20 11:14:48 -05:00
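
The rounding issue described in `688cdddafb` (PowerOf2Floor vs. PowerOf2Ceil in vectorizeStores) can be illustrated with a small standalone sketch. This is not the SLP vectorizer code: `powerOf2Floor`, `powerOf2Ceil`, and `maxElementsPerRegister` below are hypothetical local helpers written for this example, and the register and element widths are assumed values chosen only to show the effect.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest power of two that is <= n (0 when n == 0).
inline uint64_t powerOf2Floor(uint64_t n) {
  if (n == 0)
    return 0;
  uint64_t p = 1;
  while (p <= n / 2)
    p *= 2;
  return p;
}

// Hypothetical helper: smallest power of two that is >= n.
// Only meant for small n; very large inputs would overflow.
inline uint64_t powerOf2Ceil(uint64_t n) {
  uint64_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

// Hypothetical helper: number of elements of width elemBits that fit in a
// register of width regBits, rounded DOWN to a power of two so the resulting
// vector never exceeds the register size.
inline uint64_t maxElementsPerRegister(uint64_t regBits, uint64_t elemBits) {
  assert(elemBits != 0 && "element width must be non-zero");
  return powerOf2Floor(regBits / elemBits);
}
```

With an assumed 128-bit register and 24-bit elements, 128 / 24 is 5. Rounding down gives 4 elements (96 bits, which fits), whereas rounding up would give 8 elements (192 bits, which is larger than the register). That is the kind of oversized vector the commit message says it avoids.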

848 Commits