llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	ac80b0e84f	[LV] Mark Instr as const in scalarizeInstruction. (NFC). This is to reduce the diff in follow-up changes.	2022-09-13 09:10:02 +01:00
Florian Hahn	69d9bb2aad	[VPlan] Check recipe uses instead of type of underlying instr (NFC). Suggested by @Ayal post-commit, to reduce the dependence on the underlying instruction in favor of information available directly for the recipe.	2022-09-11 12:24:44 +01:00
Florian Hahn	da734473fa	[LV] Remove now dead variable after `2a78890b7b` (NFC).	2022-09-09 20:25:55 +01:00
Florian Hahn	2a78890b7b	[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC). Use VPExpandSCEVRecipe to expand the step of pointer inductions. This cleanup addresses a corresponding FIXME. It should be NFC, as steps for pointer induction must be constants, which makes expansion trivial.	2022-09-09 19:20:13 +01:00
Philip Reames	a33d98e20a	[LV] Pull out common expression [nfc]	2022-09-09 07:31:46 -07:00
Philip Reames	edb26268ce	[VPlan] Only generate single instr for stores uniform across all parts. Extend the approach taken by D133019 to store instructions. Differential Revision: https://reviews.llvm.org/D133497	2022-09-09 07:15:12 -07:00
Philip Reames	4c4c0d2c06	[LV] Use safe-divisor lowering for fixed vectors if profitable This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well. Differential Revision: https://reviews.llvm.org/D132591	2022-09-08 09:15:54 -07:00
Florian Hahn	422cf99161	[VPlan] Only generate single instr for loads uniform across all parts. VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a scalar instruction is generated per-part. This is a potential alternative to D132892. For now the current patch only catches cases where the address is trivially invariant (defined outside VPlan), while D132892 catches any address that is considered invariant by SCEV AFAICT. It should be possible to hoist fully invariant recipes feeding loads out of the vector loop region as well, but in practice LICM should do that already. This version of the patch artificially limits this to loads to make it easier to compare, but this restriction should be easily liftable. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133019	2022-09-08 14:27:58 +01:00
Florian Hahn	408ebe5e3a	[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC). Depends on D132585. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132586	2022-09-05 10:48:29 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Philip Reames	8936d86469	[LV] Add debug output for force scalar tracing [nfc] I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.	2022-08-29 15:17:51 -07:00
Florian Hahn	c78696813f	[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC). Suggested as independent fix during the review of D132585.	2022-08-29 10:19:47 +01:00
Kazu Hirata	56ea4f9bd3	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-27 21:21:02 -07:00
Kazu Hirata	a33ef8f2b7	Use llvm::all_equal (NFC)	2022-08-27 09:53:10 -07:00
Philip Reames	3dcec5e29f	[LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc] I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them. For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPlan) is explicitly handled above the uniform check. On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism. It would be correct to consider values defined outside of VPlan uniform here.	2022-08-26 11:09:17 -07:00
Philip Reames	2d5f025779	[LV] Extract utility for checking if VPValue is uniform [nfc]	2022-08-26 09:56:13 -07:00
Daniil Fukalov	9c710ebbdb	[TTI] NFC: Reduce InstructionCost::getValue() usage... in order to propagate `InstructionCost` value upper. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103406	2022-08-26 16:37:32 +03:00
Philip Reames	23245a914b	[LV] Simplify code given isPredicatedInst doesn't depend on VF any more [nfc]	2022-08-24 11:42:10 -07:00
Philip Reames	3ab00cfca9	[LV] Adjust code added in `f79214d1` for `531dd3634` [nfc] When rebasing the review which became `f79214d1`, I forgot to adjust for the changed semantics introduced by `531dd3634`. Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst. I noticed this while doing a follow up change.	2022-08-24 10:38:17 -07:00
Philip Reames	f79214d1e1	[LV] Support predicated div/rem operations via safe-divisor select idiom This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors. The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditionally select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straightforward, and none of the others seemed obviously better. I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches. Differential Revision: https://reviews.llvm.org/D130164	2022-08-24 10:07:59 -07:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as it takes two vector inputs and extracts from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In-tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
Florian Hahn	ff34432649	[LoopUtils] Remove unused Loop arg from addDiffRuntimeChecks (NFC). The argument is no longer used, remove it.	2022-08-23 10:15:28 +01:00
Philip Reames	27d3321c4f	[TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc] This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allows use of the same properties in store costing as arithmetic costing.	2022-08-22 11:26:31 -07:00
Philip Reames	c42a5f1cc2	[TTI] Migrate getOperandInfo to OperandValueInfo [nfc] This is part of merging OperandValueKind and OperandValueProperties.	2022-08-22 10:19:02 -07:00
Philip Reames	5cd427106d	[TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc] OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling. We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so. This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works. Target TTI implementations still use the split flags. I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.	2022-08-22 09:48:15 -07:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Philip Reames	b0a2c48e9f	[tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc]	2022-08-19 16:22:22 -07:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Florian Hahn	b8709a9d03	[LV] Support fixed order recurrences. If the incoming previous value of a fixed-order recurrence is a phi in the header, go through incoming values from the latch until we find a non-phi value. Use this as the new Previous, all uses in the header will be dominated by the original phi, but need to be moved after the non-phi previous value. At the moment, fixed-order recurrences are modeled as a chain of first-order recurrences. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D119661	2022-08-18 19:15:52 +01:00
Philip Reames	531dd3634d	[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops) This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is not an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predication. Specifically, for the case of a uniform memory operation we were previously considering it not to be a predicated instruction, but were considering it to be scalar with predication. As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowered via naive scalarization or, if scalarization is infeasible (i.e. scalable vectors), aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalarizable. I haven't found a concrete example here, but I am suspicious. Differential Revision: https://reviews.llvm.org/D131093	2022-08-18 07:14:04 -07:00
Kazu Hirata	50724716cd	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-14 12:51:58 -07:00
Kazu Hirata	109df7f9a4	[llvm] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-13 12:55:42 -07:00
Dinar Temirbulatov	cab6cd6834	[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding. After D121595 was committed, I noticed regressions associated with small trip counts when vectorising by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value. Differential Revision: https://reviews.llvm.org/D130755	2022-08-09 22:10:17 +01:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Philip Reames	569a7f6aa3	[LV] Move definition of isPredicatedInst out of line and make it const [nfc]	2022-08-03 08:53:11 -07:00
Philip Reames	a1cab0daae	[LV] Use cost-based decision for uniform mem op strategy [nfc-ish] This is mostly a stylistic change to make the uniform memop widening cost code fit more naturally with the surrounding code. It's not strictly speaking NFC as I added in the store with invariant value case, and we could in theory have a target where a gather/scatter is cheaper than a single load/store... but it's probably NFC in practice. Note that the scatter/gather result can still be overridden later if the result is uniform-by-parts.	2022-08-03 07:47:24 -07:00
Philip Reames	0b47615fcf	[LV] Recognize store of invariant value to invariant address as uniform This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result. For context, the basic structure of the existing code and how the change fits in: * First, we select a widening strategy. (The result is irrelevant for this patch.) * Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.) * If it is, we overrule the widening strategy, and unconditionally scalarize. * VPReplicateRecipe - which is what actually does the scalarization - knows how to handle uniform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.) An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions. Differential Revision: https://reviews.llvm.org/D130364	2022-08-02 08:09:49 -07:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
jacquesguan	e38af7ba95	[LV] Refactor getExtendedAddReductionCost to support extended reductions other than Add. Currently the API getExtendedAddReductionCost is used to determine the cost of an extended Add reduction with an optional Mul. For Arm, this covers the relevant cases. But other targets, for example RISCV, support other kinds of extended reduction, such as FAdd. This patch makes the following changes: 1. Split getExtendedAddReductionCost into 2 new APIs: getExtendedReductionCost, which handles the extended reduction with an additional Opcode input; and getMulAccReductionCost, which handles the MLA cases of getExtendedAddReductionCost. 2. Refactor getReductionPatternCost, adding a constraint to make sure getMulAccReductionCost only handles the reduction of Add + Mul. Differential Revision: https://reviews.llvm.org/D130868	2022-08-02 16:02:38 +08:00
Philip Reames	82c1b136db	[LV] Don't predicate uniform mem op stores unnecessarily We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need to be predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible. Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the latter. It would not be legal to make this same change for merely "uniform" operations. Differential Revision: https://reviews.llvm.org/D130637	2022-07-28 08:55:52 -07:00
Kazu Hirata	95a932fb15	Remove redundant override specifiers (NFC) Identified with modernize-use-override.	2022-07-24 22:28:11 -07:00
Kazu Hirata	2d2e2e7ea9	[Vectorize] Remove isConsecutiveLoadOrStore (NFC) The last use was removed on Jan 4, 2022 in commit `95a93722db`.	2022-07-23 13:01:14 -07:00
Philip Reames	b5c7213647	[LV] Use early return to simplify code structure	2022-07-22 12:15:14 -07:00
Benjamin Kramer	5a445395e4	[LV] Remove unused variable. NFC.	2022-07-22 17:43:58 +02:00
Philip Reames	d7bf81fd51	[LV] Rework widening cost of uniform memory ops for clarity [nfc] Reorganize the code to make it clear what is and isn't handled, and why. Restructure bailout to remove (false and confusing) dependence on CM_Scalarize; just return invalid cost and propagate, that's what it is for.	2022-07-22 08:35:45 -07:00
Philip Reames	bd75350180	[LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the common name, these are different. * LV's notion means that only the first lane of each unrolled part is required. That is, lanes within a single unroll factor are considered uniform. This allows e.g. widenable memory ops to be considered uses of uniform computations. * LVL and LAI's notion refers to all lanes across all unrollings. IsUniformMem is in turn defined in terms of LAI's notion. Thus a UniformMemOp is a memory operation with a loop invariant address. This means the same address is accessed in every iteration. The tweaked piece of code was trying to match a uniform mem op (i.e. fully loop invariant address), but instead checked for LV's notion of uniformity. In theory, this meant with UF > 1, we could speculate a load which wasn't safe to execute. This ends up being mostly silent in current code as it is nearly impossible to create the case where this difference is visible. The closest I've come is the test case from 54cb87, but even then, the incorrect result is only visible in the vplan debug output; before this change we sink the unsafely speculated load back into the user's predicate blocks before emitting IR. Both before and after IR are correct so the differences aren't "interesting". The other test changes are uninteresting. They're cases where LV's uniform analysis is slightly weaker than SCEV isLoopInvariant.	2022-07-21 15:44:34 -07:00
David Sherwood	f15b6b2907	[AArch64] Add target hook for preferPredicateOverEpilogue This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met: 1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences. Currently the default option is "disabled", but this will be changed in a later patch. I've added new tests to show the options behave as expected here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D129560	2022-07-21 17:20:06 +01:00
Philip Reames	523a526a02	[LV] Fix miscompile due to srem/sdiv speculation safety condition An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block. Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose. Differential Revision: https://reviews.llvm.org/D130106	2022-07-20 05:35:23 -07:00
Florian Hahn	a75760a269	[LV] Remove unnecessary cast in widenCallInstruction. (NFC)	2022-07-19 11:23:24 +01:00
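
The safe-divisor select idiom described in commit f79214d1e1 (and relied on for the two sdiv/srem hazards mentioned in 523a526a02) can be illustrated with a scalar model of one vector division. This is only a sketch, not LLVM code: the helper name safeDivisorDiv and the array-based "lanes" are assumptions made for illustration. Inactive lanes divide by a constant 1, so neither division by zero nor INT64_MIN / -1 can occur on them; the values they produce are placeholders that a real predicated loop would never use.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of LLVM): models a widened signed division
// over N lanes, guarded by an active-lane mask.
template <std::size_t N>
std::array<std::int64_t, N>
safeDivisorDiv(const std::array<std::int64_t, N> &Num,
               const std::array<std::int64_t, N> &Den,
               const std::array<bool, N> &Mask) {
  std::array<std::int64_t, N> Result{};
  for (std::size_t I = 0; I < N; ++I) {
    // The "select": the real divisor for active lanes, constant 1 otherwise.
    const std::int64_t SafeDen = Mask[I] ? Den[I] : 1;
    // The division now runs unconditionally on every lane. Dividing by 1
    // cannot trap or overflow, whatever the inactive lane's operands are.
    Result[I] = Num[I] / SafeDen;
  }
  return Result;
}
```

Calling it with a divisor of 0 or -1 in a masked-off lane is safe, because that lane's divisor is replaced before the division executes. Active lanes still use their original divisor, so the caller's mask must already exclude any lane whose division would be undefined.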


1725 Commits