llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	efd4938723	[VPlan] Handle IV vector splat using VPWidenCanonicalIV. This patch tries to use an existing VPWidenCanonicalIVRecipe instead of creating another step-vector for canonical induction recipes in widenIntOrFpInduction. This has the following benefits: 1. First step to avoid setting both vector and scalar values for the same induction def. 2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan 3. Only need to splat the vector IV for block in masks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116123	2022-01-29 16:25:27 +00:00
Florian Hahn	d4a8fc3a87	[VPlan] Introduce and use BranchOnCount VPInstruction. This patch adds a new BranchOnCount VPInstruction opcode with 2 operands. It first compares its 2 operands (increment of canonical induction and vector trip count), followed by a branch to either the exit block or back to the vector header. It must be the last recipe in the exit block of the topmost vector loop region. This extracts parts from D113224 and was discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116479	2022-01-12 13:42:13 +00:00
Florian Hahn	86d113a8b8	[SCEVExpand] Do not create redundant 'or false' for pred expansion. This patch updates SCEVExpander::expandUnionPredicate to not create redundant 'or false, x' instructions. While those are trivially foldable, they can be easily avoided and hinder code that checks the size/cost of the generated checks before further folds. I am planning on looking into a few other similar improvements to code generated by SCEVExpander. I remember a while ago @lebedev.ri working on doing some trivial folds like that in IRBuilder itself, but there were concerns that such changes may subtly break existing code. Reviewed By: reames, lebedev.ri Differential Revision: https://reviews.llvm.org/D116696	2022-01-06 11:52:19 +00:00
Florian Hahn	65c4d6191f	[VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable. At the moment, the primary induction variable for the vector loop is created as part of the skeleton creation. This is tied to creating the vector loop latch outside of VPlan. This prevents from modeling the whole vector loop in VPlan, which in turn is required to model preheader and exit blocks in VPlan as well. This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for VPInstruction to model the increment. This allows us to partly retire createInductionVariable. At the moment, a bit of patching up is done after executing all blocks in the plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113223	2022-01-05 10:46:06 +00:00
Philip Reames	e6ad9ef4e7	[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order. I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions. We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause. Differential Revision: https://reviews.llvm.org/D115387	2021-12-13 16:56:22 -08:00
David Green	fed3041863	[LV][ARM] Improve reduction costmodel for mismatching extension types. Given a MLA reduction from two different types (say i8 and i16), we were previously failing to find the reduction pattern, often making us choose the lower vector factor. This improves that by using the largest of the two extension types, allowing us to use the larger VF as the type of the reduction. As per https://godbolt.org/z/KP549EEYM the backend handles this valiantly, leading to better performance. Differential Revision: https://reviews.llvm.org/D115432	2021-12-10 15:40:58 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatter and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in a way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Huihui Zhang	9cd7c534e2	[InstCombine] Enable fold select into operand for FAdd, FMul, FSub and FDiv. For FAdd, FMul, FSub and FDiv, fold select into one of the operands to enable further optimizations, i.e., floating-point reduction detection. Turn code: %C = fadd %A, %B %D = select %cond, %C, %A into: %C = select %cond, %B, -0.000000e+00 %D = fadd %A, %C Alive2 verification (with --disable-undef-input), timed out otherwise. FAdd - https://alive2.llvm.org/ce/z/eUxN4Y FMul - https://alive2.llvm.org/ce/z/5SWZz4 FSub - https://alive2.llvm.org/ce/z/Dhj8dU FDiv - https://alive2.llvm.org/ce/z/Yj_NA2 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113442	2021-11-22 15:10:10 -08:00
David Green	309f1e4ac8	[ARM] Add datalayout to costmodel tests. NFC This adds a sensible datalayout to the ARM cost model tests, to prevent the costs reported being incorrect for the size of pointers.	2021-11-16 09:49:42 +00:00
Roman Lebedev	b291597112	Revert rest of `IRBuilderBase`'s short-circuiting folds Upon further investigation and discussion, this is actually the opposite direction from what we should be taking, and this direction wouldn't solve the motivational problem anyway. Additionally, some more (polly) tests have escaped being updated. So, let's just take a step back here. This reverts commit `f3190dedee`. This reverts commit `749581d21f`. This reverts commit `f3df87d57e`. This reverts commit `ab1dbcecd6`.	2021-10-28 02:15:14 +03:00
Roman Lebedev	101aaf62ef	Revert "[NFC] `IRBuilderBase::CreateAdd()`: place constant onto RHS" Clang OpenMP codegen tests are failing, will recommit afterwards. This reverts commit `4723c9b3c6`.	2021-10-27 22:21:37 +03:00
Roman Lebedev	42712698fd	Revert "[IR] `IRBuilderBase::CreateAdd()`: short-circuit `x + 0` --> `x`" Clang OpenMP codegen tests are failing. This reverts commit `288f1f8abe`. This reverts commit `cb90e5356a`.	2021-10-27 22:21:37 +03:00
Roman Lebedev	cb90e5356a	[IR] `IRBuilderBase::CreateAdd()`: short-circuit `x + 0` --> `x` There's precedent for that in `CreateOr()`/`CreateAnd()`. The motivation here is to avoid bloating the run-time check's IR in `SCEVExpander::generateOverflowCheck()`. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 21:34:38 +03:00
Roman Lebedev	4723c9b3c6	[NFC] `IRBuilderBase::CreateAdd()`: place constant onto RHS	2021-10-27 21:34:38 +03:00
Roman Lebedev	f3df87d57e	[IR] `IRBuilderBase::CreateOr()`: fix short-circuiting for constant on LHS There is no guarantee that the constant is on RHS here, we have to handle both cases. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 18:01:06 +03:00
Roman Lebedev	ab1dbcecd6	[IR] `IRBuilderBase::CreateSelect()`: if cond is a constant i1, short-circuit While we could emit such a tautological `select`, it will stick around until the next instsimplify invocation, which may happen after we count the cost of this redundant `select`. Which is precisely what happens with loop vectorization legality checks, and that artificially increases the cost of said checks, which is bad. There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 18:01:05 +03:00
Craig Topper	765348298c	[CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted. I believe the cost model was also incorrect for the old expansion. The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result to calculate theirs signs. Then 2 icmps to compare the signs. Followed by an And. The previous cost model was using 3 icmps and 2 selects. Digging back through git blame, those 2 selects in the cost model used to be 2 icmps, but were changed in https://reviews.llvm.org/D90681 Differential Revision: https://reviews.llvm.org/D110739	2021-09-30 09:41:14 -07:00
Simon Pilgrim	10c982e0b3	Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069) Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead. I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst. https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!) Differential Revision: https://reviews.llvm.org/D106450	2021-08-23 21:09:26 +01:00
David Green	41cedb1c9a	[LV][ARM] Tighten up MLA reduction costing This makes a couple of changes to the costing of MLA reduction patterns, to more accurately cost various patterns that can come up from vectorization. - The Arm implementation of getExtendedAddReductionCost is altered to only provide costs for legal or smaller types. Larger than legal types need to be split, which currently does not work very well, especially for predicated reductions where the predicate may be legal but needs to be split. Currently we limit it to legal or smaller input types. - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)) is a pattern that can come up, and can be treated the same as reduce(mul(ext, ext)) providing the extension types match. - And it has been adjusted to not count the ext in reduce(mul(ext, ext)) as part of a reduce(mul) pattern. Together these changes help to more accurately cost the mla reductions in cases such as where the extend types don't match or the extend opcodes are different, picking better vector factors that don't result in expanded reductions. Differential Revision: https://reviews.llvm.org/D106166	2021-07-28 12:50:58 +01:00
David Green	037b7715dd	[ARM] Extra MVE reduction vectorizer tests. NFC	2021-07-28 10:55:06 +01:00
Dylan Fleming	20b0fa91c9	[SVE] Add support for folding for select + masked loads Add folds to instcombine to support the removal of select instruction when the masked_load is guaranteed to zero the same lanes, i.e. select(mask, mload(,,mask,0), 0) -> mload(,,mask,0). Patch originally authored by @paulwalker-arm Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106376	2021-07-26 11:58:41 +01:00
Simon Pilgrim	1c9bec727a	[InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069) As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead. I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst. https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!) Differential Revision: https://reviews.llvm.org/D106450	2021-07-22 10:58:51 +01:00
Philip Reames	723144665b	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4) Resubmit after the following changes: * Fix a latent bug related to unrolling with required epilogue (see `e49d65f`). I believe this is the cause of the prior PPC buildbot failure. * Disable non-latch exits for epilogue vectorization to be safe (`9ffa90d`) * Split out assert movement (`600624a`) to reduce churn if this gets reverted again. Previous commit message (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-07-07 07:44:35 -07:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attribute is equivalent to setting attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Philip Reames	ed9d70781b	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)" This reverts commit `6d3e3ae8a9`. Still seeing PPC build bot failures, and one arm self host bot failing. I'm officially stumped, and need help from a bot owner to reduce.	2021-05-17 20:53:28 -07:00
Philip Reames	6d3e3ae8a9	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:59:25 -07:00
David Green	4979c90458	[LV] Account for tripcount when calculating vectorization profitability The loop vectorizer will currently assume a large trip count when calculating which of several vectorization factors are more profitable. That is often not a terrible assumption to make as small trip count loops will usually have been fully unrolled. There are cases however where we will try to vectorize them, and especially when folding the tail by masking can incorrectly choose to vectorize loops that are not beneficial, due to the folded tail rounding the iteration count up for the vectorized loop. The motivating example here has a trip count of 5, so either performs 5 scalar iterations or 2 vector iterations (with VF=4). At a high enough trip count the vectorization becomes profitable, but the rounding up to 2 vector iterations vs only 5 scalar makes it unprofitable. This adds an alternative cost calculation when we know the max trip count and are folding tail by masking, rounding the iteration count up to the correct number for the vector width. We still do not account for anything like setup cost or the mixture of vector and scalar loops, but this is at least an improvement in a few cases that we have had reported. Differential Revision: https://reviews.llvm.org/D101726	2021-05-06 12:36:46 +01:00
David Green	8675ef100f	[LV] Logical and/or select costs D99674 stopped the folding of certain select operations into and/or, due to incorrect folding in the presence of poison. D97360 added some costs to attempt to account for the change, but only worked at the getUserCost level, not the getCmpSelInstrCost that the vectorizer will use directly. This adds similar logic into the vectorizer to handle these logical and/or selects, treating them like and/or directly. This fixes 60% performance regressions from code like the attached test case. Differential Revision: https://reviews.llvm.org/D99884	2021-04-08 10:39:47 +01:00
David Green	1a4d3d0bca	[LV] Add a logical and/or select cost test. NFC	2021-04-08 10:27:06 +01:00
David Green	a2e0312cda	[ARM] Tone down the MVE scalarization overhead The scalarization overhead was set deliberately high for MVE, whilst the codegen was new. It helps protect us against the negative ramifications of mixing scalar and vector instructions. This decreases that, especially for floating point where the cost of extracting/inserting lane elements can be low. For integer the cost is still fairly high due to the cross-register-bank copy, but is no longer n^2 in the length of the vector. In general, this will decrease the cost of scalarizing floats and long integer vectors. i64 increases in cost, having a high cost before and after this patch. For floats this allows us to start doing things like vectorizing fdiv instructions, even if they are scalarized. Differential Revision: https://reviews.llvm.org/D98245	2021-03-19 18:30:11 +00:00
David Green	fa450e98c5	[ARM] Test for predicated scalar memops. NFC This test shows a case where we can potentially scalarize the store in a predicated loop, creating a lot of instructions that would be much slower than scalar.	2021-03-09 21:57:18 +00:00
David Green	a1c34a9d6a	[ARM] Correct vector predicate type in MVE getCmpSelInstrCost	2021-02-19 14:43:51 +00:00
David Green	1a6744e3dc	[ARM] Add larger than legal ICmp costs A v8i32 compare will produce a v8i1 predicate, but during codegen the v8i32 will be split into two v4i32, potentially requiring two v4i1 predicates to be merged into a single v8i1. Because this merging of two v4i1's into a v8i1 is very expensive, we need to make the cost of the compare equally high. This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost. Because we don't know whether the user of the predicate can be split, and the cost model is mostly pre-instruction, we may be pessimistic but that should only be for larger and legal types. This also adds min/max detection to the costmodel where it can be detected, to keep those in line with the cost of simple min/max instructions. Otherwise for the most part, costs that were already expensive have become more expensive. Differential Revision: https://reviews.llvm.org/D96692	2021-02-18 11:42:17 +00:00
David Green	1fbb3287fc	[ARM] MVE ICmp costing tests. NFC	2021-02-18 10:50:34 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
Simon Pilgrim	22302b2be0	[LoopVectorize][ARM] Regenerate mve-gather-scatter-tailpred.ll test Fix codegen after rG7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c	2021-02-05 12:32:45 +00:00
Philip Reames	4cb7d03481	Add missing test update from `3e5ce49` Sorry for the build break, apparently forgot to build ARM target.	2021-02-04 18:04:24 -08:00
David Green	4cc94b7313	[CostModel] Tests for showing the cost of intrinsics from the vectorizer. NFC	2021-01-24 14:47:15 +00:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to work OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempts to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loops under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
David Green	dfac521da1	[ARM] Fix vector saddsat costs. It turns out the vectorizer calls the getIntrinsicInstrCost functions with a scalar return type and vector VF. This updates the costmodel to handle that, still producing the correct vector costs. A vectorizer test is added to show it vectorizing at the correct factor again.	2021-01-21 15:30:39 +00:00
David Green	a36a2864c0	[ARM][LV] Additional loop invariant reduction test. NFC	2021-01-08 15:15:08 +00:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that are currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Florian Hahn	8a47e6252a	[VPlan] Re-add interleave group members to plan. Creating in-loop reductions relies on IR references to map IR values to VPValues after interleave group creation. Make sure we re-add the updated member to the plan, so the look-ups still work as expected This fixes a crash reported after D90562.	2021-01-05 15:06:47 +00:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Juneyoung Lee	278aa65cc4	[IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93793	2020-12-30 04:21:04 +09:00
Florian Hahn	a74941da71	Revert "[BasicAA] Handle two unknown sizes for GEPs" Temporarily revert commit `8b1c4e310c`. After `8b1c4e310c` the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame` dramatically increases with -O3 & LTO, causing issues for builders with that configuration. I filed PR48553 with a smallish reproducer that shows a 10-100x compile time increase.	2020-12-18 17:59:12 +00:00
Roman Lebedev	5cce4aff18	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
David Green	ab97c9bdb7	[LV] Fix scalar cost for tail predicated loops When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop. Original patch by Anna Welker Differential Revision: https://reviews.llvm.org/D86452	2020-12-12 14:21:40 +00:00
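
The trip-count argument in commit 4979c90458 ([LV] Account for tripcount ...) can be checked with a little arithmetic. The sketch below is not the vectorizer's cost model: the function names and the per-iteration costs (2 for a scalar iteration, 6 for a VF=4 vector iteration) are assumptions chosen only for illustration. It shows how rounding the iteration count up for a tail-folded loop can make a loop with trip count 5 unprofitable even though the per-lane cost looks better.

```python
def vector_iterations(trip_count: int, vf: int) -> int:
    """Iterations of a tail-folded vector loop: trip_count / vf, rounded up."""
    return -(-trip_count // vf)  # integer ceiling division


def total_costs(trip_count: int, vf: int,
                scalar_iter_cost: int, vector_iter_cost: int) -> tuple[int, int]:
    """Return (scalar_total, vector_total) for a known trip count.

    Both per-iteration costs are hypothetical inputs, not values taken
    from any real target's cost model.
    """
    scalar_total = trip_count * scalar_iter_cost
    vector_total = vector_iterations(trip_count, vf) * vector_iter_cost
    return scalar_total, vector_total


def is_profitable(trip_count: int, vf: int,
                  scalar_iter_cost: int, vector_iter_cost: int) -> bool:
    """Vectorizing pays off only if the whole vector loop is cheaper."""
    scalar_total, vector_total = total_costs(
        trip_count, vf, scalar_iter_cost, vector_iter_cost)
    return vector_total < scalar_total


if __name__ == "__main__":
    # Trip count 5 with VF=4: 5 scalar iterations vs 2 vector iterations.
    print(total_costs(5, 4, scalar_iter_cost=2, vector_iter_cost=6))
    print(is_profitable(5, 4, 2, 6))
    # A larger trip count amortizes the rounding.
    print(is_profitable(64, 4, 2, 6))
```

With these assumed costs, a large-trip-count estimate (6 / 4 = 1.5 per lane against 2 per scalar iteration) favors vectorizing, while the exact count for trip count 5 does not. As the commit message notes, setup cost and the mix of vector and scalar loops are still ignored.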
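
The FAdd case of the fold in commit 9cd7c534e2 ([InstCombine] Enable fold select into operand ...) relies on -0.0 being the identity for floating-point addition. The following sketch checks that on ordinary IEEE-754 doubles in Python. It is only an illustration with hypothetical helper names; it does not model LLVM's undef/poison semantics or fast-math flags, which the Alive2 links in the commit message cover.

```python
import struct


def bits(x: float) -> bytes:
    """Raw IEEE-754 encoding, so that +0.0 and -0.0 compare as different."""
    return struct.pack(">d", x)


def before_fold(cond: bool, a: float, b: float) -> float:
    # %C = fadd %A, %B ; %D = select %cond, %C, %A
    c = a + b
    return c if cond else a


def after_fold(cond: bool, a: float, b: float) -> float:
    # %C = select %cond, %B, -0.0 ; %D = fadd %A, %C
    c = b if cond else -0.0
    return a + c


if __name__ == "__main__":
    values = [0.0, -0.0, 1.5, -3.25]
    for cond in (True, False):
        for a in values:
            for b in values:
                assert bits(before_fold(cond, a, b)) == bits(after_fold(cond, a, b))
    # +0.0 would not work as the placeholder: -0.0 + 0.0 yields +0.0.
    print(bits(-0.0 + 0.0) == bits(-0.0))
```

The last line shows why the constant in the folded form is -0.0 rather than 0.0: adding +0.0 to -0.0 changes the sign of the result, so it is not an identity for every input.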


166 Commits