llvm-project

Commit Graph

Author	SHA1	Message	Date
Anna Thomas	72750f0012	[TrivialDeadness] Introduce API separating two different usages The earlier usage of wouldInstructionBeTriviallyDead is based on the assumption that the use_count of that instruction being checked will be zero. This patch separates the API into two different ones: 1. The strictly conservative one where the instruction is trivially dead iff the uses are dead. 2. The slightly relaxed form, where an instruction is dead along paths where it is not used. The second form can be used in identifying instructions that are valid to sink down to uses (D109917). Reviewed-By: reames Differential Revision: https://reviews.llvm.org/D114647	2021-12-03 10:09:52 -05:00
spupyrev	93a2c2919f	profi - a flow-based profile inference algorithm: Part III (out of 3) This is a continuation of D109860 and D109903. An important challenge for profile inference is caused by the fact that the sample profile is collected on a fully optimized binary, while the block and edge frequencies are consumed on an early stage of the compilation that operates with a non-optimized IR. As a result, some of the basic blocks may not have associated sample counts, and it is up to the algorithm to deduce missing frequencies. The problem is illustrated in the figure where three basic blocks are not present in the optimized binary and hence, receive no samples during profiling. We found that it is beneficial to treat all such blocks equally. Otherwise the compiler may decide that some blocks are “cold” and apply undesirable optimizations (e.g., hot-cold splitting) regressing the performance. Therefore, we want to distribute the counts evenly along the blocks with missing samples. This is achieved by a post-processing step that identifies "dangling" subgraphs consisting of basic blocks with no sampled counts; once the subgraphs are found, we rebalance the flow so as every branch probability is 50:50 within the subgraphs. Our experiments indicate up to 1% performance win using the optimization on some binaries and a significant improvement in the quality of profile counts (when compared to ground-truth instrumentation-based counts) {F19093045} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109980	2021-12-02 12:01:30 -08:00
spupyrev	98dd2f9ed3	profi - a flow-based profile inference algorithm: Part II (out of 3) This is a continuation of D109860. Traditional flow-based algorithms cannot guarantee that the resulting edge frequencies correspond to a connected flow in the control-flow graph. For example, for an instance in the attached figure, a flow-based (or any other) inference algorithm may produce an output in which the hot loop is disconnected from the entry block (refer to the rightmost graph in the figure). Furthermore, creating a connected minimum-cost maximum flow is a computationally NP-hard problem. Hence, we apply post-processing adjustments to the computed flow by connecting all isolated flow components ("islands"). This feature helps to keep all blocks with sample counts connected and results in significant performance wins for some binaries. {F19077343} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109903	2021-12-02 11:04:21 -08:00
Kazu Hirata	22d82949b0	[llvm] Fix "unused variable" warnings	2021-12-02 09:20:17 -08:00
Djordje Todorovic	2cdc6f2ca6	Reland "[LICM] Hoist LOAD without sinking the STORE" When doing load/store promotion within LICM, if we cannot prove that it is safe to sink the store we won't hoist the load, even though we can prove the load could be dereferenced and moved outside the loop. This patch implements the load promotion by moving it in the loop preheader by inserting proper PHI in the loop. The store is kept as is in the loop. By doing this, we avoid doing the load from a memory location in each iteration. Please consider this small example: loop { var = ptr; if (var) break; ptr= var + 1; } After this patch, it will be: var0 = ptr; loop { var1 = phi (var0, var2); if (var1) break; var2 = var1 + 1; ptr = var2; } This addresses some problems from [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=51193 Differential revision: https://reviews.llvm.org/D113289	2021-12-02 03:53:50 -08:00
Florian Hahn	2de5f39e54	[BuildLibCalls] Add support for memset_pattern{4,8}. Add support for memset_pattern{4,8} similar to the existing memset_pattern16 handling. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114883	2021-12-02 11:04:25 +00:00
Nikita Popov	a0ff26e08c	[GlobalOpt] Fix assertion failure during instruction deletion This fixes the assertion failure reported in https://reviews.llvm.org/D114889#3166417, by making RecursivelyDeleteTriviallyDeadInstructionsPermissive() more permissive. As the function accepts a WeakTrackingVH, even if originally only Instructions were inserted, we may end up with different Value types after a RAUW operation. As such, we should not assume that the vector only contains instructions. Notably this matches the behavior of the RecursivelyDeleteTriviallyDeadInstructions() function variant which accepts a single value rather than vector.	2021-12-02 11:58:39 +01:00
Florian Hahn	0496edad49	[BuildLibCalls] Add additional attrs to memcpy_chk. `memcpy_chk` can be treated like `memcpy`, with the exception that it may not return (if it aborts the program). See D114793 for a similar patch for `memset_chk`. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D114863	2021-12-02 09:50:14 +00:00
Arthur Eubanks	512534bc16	[Cloning] Clone metadata on function declarations Previously we missed cloning metadata on function declarations because we don't call CloneFunctionInto() on declarations in CloneModule(). Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D113812	2021-12-01 15:40:05 -08:00
spupyrev	7cc2493daa	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. UPD Dec 1st 2021: - synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`; - added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-12-01 15:30:38 -08:00
Djordje Todorovic	72f9f066df	Revert "[LICM] Hoist LOAD without sinking the STORE" This reverts commit `ecb9d8e4e3`. I'll reland this as soon as the failing tests are fixed/updated.	2021-12-01 04:39:26 -08:00
Djordje Todorovic	ecb9d8e4e3	[LICM] Hoist LOAD without sinking the STORE When doing load/store promotion within LICM, if we cannot prove that it is safe to sink the store we won't hoist the load, even though we can prove the load could be dereferenced and moved outside the loop. This patch implements the load promotion by moving it in the loop preheader by inserting proper PHI in the loop. The store is kept as is in the loop. By doing this, we avoid doing the load from a memory location in each iteration. Please consider this small example: loop { var = ptr; if (var) break; ptr= var + 1; } After this patch, it will be: var0 = ptr; loop { var1 = phi (var0, var2); if (var1) break; var2 = var1 + 1; ptr = var2; } This addresses some problems from [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=51193 Differential revision: https://reviews.llvm.org/D113289	2021-12-01 04:27:50 -08:00
Florian Hahn	6a5e29d13f	[BuildLibCalls] Add argmemonly, writeonly, nounwind to memset_chk. The memset_chk library function should match memset's attributes with respect of memory effects (argmemonly, writeonly). It also does not raise exceptions. It may not return, in case it aborts the program. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D114793	2021-12-01 10:09:52 +00:00
Nikita Popov	84b978da3b	[LoopUnrollRuntime] Remove unnecessary pointer BECount check (NFC) BECounts are guaranteed to be integers nowadays.	2021-12-01 10:32:37 +01:00
Philip Reames	8906a0fe64	[SCEVExpander] Drop poison generating flags when reusing instructions The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct. This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid. In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled. On the test changes, most are pretty straightforward. We lose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE. The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB. Differential Revision: https://reviews.llvm.org/D112734	2021-11-29 15:23:34 -08:00
Bjorn Pettersson	297fb66484	Use a deterministic order when updating the DominatorTree This solves a problem with non-deterministic output from opt due to not performing dominator tree updates in a deterministic order. The problem that was analysed indicated that JumpThreading was using the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When preparing the list of updates to send to DomTreeUpdater::applyUpdates we iterated over a SmallPtrSet, which didn't give a well-defined order of updates to perform. The added domtree-updates.ll test case is an example that would result in non-deterministic printouts of the domtree. Semantically those domtree:s are equivalent, but it shows the fact that when we use the domtree iterator the order in which nodes are visited depends on the order in which dominator tree updates are performed. Since some passes (at least EarlyCSE) are iterating over nodes in the dominator tree in a similar fashion as the domtree printer, then the order in which transforms are applied by such passes, transitively, also depends on the order in which dominator tree updates are performed. And taking EarlyCSE as an example the end result could be different depending on in which order the transforms are applied. Reviewed By: nikic, kuhar Differential Revision: https://reviews.llvm.org/D110292	2021-11-29 13:14:50 +01:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Jun Ma	07333810ca	Revert "Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""""" This reverts commit `c93f93b2e3`.	2021-11-24 10:26:37 +08:00
Mehdi Amini	1392b654ff	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `884b6dd311`. The windows build is broken with a linker error.	2021-11-23 20:10:36 +00:00
spupyrev	884b6dd311	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 11:02:40 -08:00
Zarko Todorovski	0d3add216f	[llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts Reworded some comments and asserts to avoid usage of `sanity check/test` Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114372	2021-11-23 13:22:55 -05:00
Philip Reames	065f777d27	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `b00fc19822`. This change fails to build (link) on ubuntu x86,	2021-11-23 09:18:28 -08:00
spupyrev	b00fc19822	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 09:08:30 -08:00
Kazu Hirata	d1abf481da	[llvm] Use range-based for loops (NFC)	2021-11-19 21:12:13 -08:00
ksyx	97b9e8438e	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. The values are converted to signed types to avoid unsigned operation with negative offsets. Part of revision D100179 Reapply commit `c35e8185d8` with fixing problem reported by mstorsjo	2021-11-19 20:24:36 -05:00
Senran Zhang	0425ea4621	[NFC][OpaquePtr][Evaluator] Remove call to PointerType::getElementType There are still another 2 uses of PointerType::getElementType in Evaluator when evaluating BitCast's on pointers. BitCast's on pointers should be removed when opaque ptr is ready, so I just keep them as is. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D114131	2021-11-19 10:32:55 +08:00
Philip Reames	8f95e915cd	[unroll-runtime] Relax two profitability limitations on multi-exit unrolling This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change. If anyone sees actually interesting code differences out of this, please let me know. I'm not expecting this to have much impact at all. Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block. We can only hit this with a poorly documented debug flag. Case 2 - Why should we treat epilog cases differently from prolog cases? Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled? Sorry for the lack of tests here. These are both exceedingly narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags. Note that the legality cases for each are already exercised with force flags.	2021-11-15 13:00:14 -08:00
Philip Reames	423da61835	[runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC] All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller. The assert use has little value with the restructured code and is simply dropped.	2021-11-15 11:39:07 -08:00
Philip Reames	e99902a872	[runtime-unroll] Restructure if-clause to improve readability [NFC]	2021-11-15 11:13:27 -08:00
ksyx	72b5138d37	Revert "[GVN][NFC] Remove redundant check" This reverts commit `c35e8185d8`. mstorsjo reported in the revision thread that one VNCoercion assertion is violated and seemingly related to this commit. As per "If a test case that demonstrates a problem is reported in the commit thread, please revert and investigate offline", this commit is reverted.	2021-11-15 09:14:13 -05:00
Mircea Trofin	a32c2c3808	[NFC] Use Optional<ProfileCount> to model invalid counts ProfileCount could model invalid values, but a user had no indication that the getCount method could return bogus data. Optional<ProfileCount> addresses that, because the user must dereference the optional. In addition, the patch removes concept duplication. Differential Revision: https://reviews.llvm.org/D113839	2021-11-14 19:03:30 -08:00
Kazu Hirata	098e935174	[llvm] Use range-based for loops with CallBase::args (NFC)	2021-11-14 09:32:36 -08:00
Mircea Trofin	0662a3612c	[NFC][InlineFunction] Renamed some vars to conform to coding style	2021-11-14 07:26:44 -08:00
ksyx	c35e8185d8	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. Part of revision D100179 Reviewed By: nikic	2021-11-13 15:59:43 -05:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. It's readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00
Florian Hahn	2ead34716a	[SimplifyCFG] Add early bailout if Use is not in same BB. Without this patch, passingValueIsAlwaysUndefined will iterate over all instructions from I to the end of the basic block, even if the use is outside the block. This patch adds an early bail out, if the use instruction is outside I's BB. This can greatly reduce compile-time in cases where very large basic blocks are involved, with a large number of PHI nodes and incoming values. Note that the refactoring makes the handling of the case where I is a phi and Use is in PHI more explicit as well: for phi nodes, we can also directly bail out. In the existing code, we would iterate until we reach the end and return false. Based on an earlier patch by Matt Wala. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D113293	2021-11-09 12:57:03 +00:00
Max Kazantsev	cb728cb8a9	[NFC] Get rid of hardcoded magical constant and use Optionals instead Refactor calculateIterationsToInvariance so that it doesn't need a magical constant to signify unknown answer.	2021-11-09 18:13:19 +07:00
Dmitry Makogon	ae14fae0ff	[SCEVExpander] Use stable_sort to sort loop Phis in SCEVExpander::replaceCongruentIVs This is a fix for test failures on expensive checks build caused by `db289340c8`. With LLVM_ENABLE_EXPENSIVE_CHECKS enabled the llvm::sort shuffles the given container. However, the sort is only called when the TTI is passed to replaceCongruentIVs. In the mentioned patch we pass it TTI, so the sort happens. But due to shuffling equivalent Phis may appear in different order from run to run. With the stable_sort instead of sort this is impossible - the order of sorted Phis is preserved.	2021-11-09 16:29:57 +07:00
Anton Afanasyev	ce4fa93db8	[SCCP] Tune cast instruction handling for overdefined operand Extended value is known to be inside range smaller than full one. Prevent SCCP to mark such value as overdefined. Fixes PR52253 Differential Revision: https://reviews.llvm.org/D112721	2021-11-08 18:34:30 +03:00
Kazu Hirata	0d182d9d1e	[Transforms] Use make_early_inc_range (NFC)	2021-11-07 17:03:15 -08:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Kazu Hirata	843d1eda18	[llvm] Use llvm::reverse (NFC)	2021-11-06 19:31:18 -07:00
Kazu Hirata	1b108ab975	[Transforms] Use make_early_inc_range (NFC)	2021-11-02 18:13:23 -07:00
Dmitry Makogon	e09958d5eb	[LoopPeel] Peel loops with exits followed by an unreachable or deopt block Added support for peeling loops with exits that are followed either by an unreachable-terminated block or block that has a terminating deoptimize call. All blocks in the sequence must have a unique successor, maybe except for the last one. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D110922	2021-11-02 23:12:04 +07:00
Jun Ma	c93f93b2e3	Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""" This reverts commit `3a998c06a8`.	2021-11-01 15:31:59 +08:00
Kazu Hirata	c714da2ceb	[Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC)	2021-10-31 07:57:32 -07:00
Roman Lebedev	156f10c840	[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ While generally i don't like such hacks, we have a very good reason to do this: here we are expanding a run-time correctness check for the vectorization, and said `umul_with_overflow` will not be optimized out before we query the cost of the checks we've generated. Which means, the cost of run-time checks would be artificially inflated, and after https://reviews.llvm.org/D109368 that will affect the minimal trip count for which these checks are even evaluated. And if they aren't even evaluated, then the vectorized code certainly won't be run. We could consider doing this in IRBuilder, but then we'd need to also teach `CreateExtractValue()` to look into chain of `insertvalue`'s, and i'm not sure there's precedent for that. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 19:45:55 +03:00
Nikita Popov	11a8423dab	[SCEV] Use reverse() (NFC)	2021-10-26 11:08:58 +02:00
Max Kazantsev	9bbfe0f72c	[NFC] Remove obsolete simplifyOnceImpl function The function simplifyOnce only calls simplifyOnceImpl and does nothing else. Having this separate helper makes no sense. Removing it. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D112517 Reviewed By: mkazantsev	2021-10-26 13:51:42 +07:00
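
The load-hoisting transformation described in the "[LICM] Hoist LOAD without sinking the STORE" entries above can be illustrated with a small source-level sketch. This is not LLVM code: `CountedCell`, `beforeHoist`, and `afterHoist` are hypothetical names introduced only for illustration, and the PHI node from the commit message is modelled by an ordinary loop-carried variable. The sketch assumes the memory location is not modified by anything else while the loop runs.

```cpp
#include <cassert>

// Hypothetical stand-in for the memory location `ptr` in the commit message.
// It counts loads so the two loop shapes can be compared.
struct CountedCell {
    int value;
    int loads;
    explicit CountedCell(int v) : value(v), loads(0) {}
    int load() { ++loads; return value; }
    void store(int v) { value = v; }
};

// Shape before the patch: the location is loaded on every iteration.
int beforeHoist(CountedCell &ptr) {
    for (;;) {
        int var = ptr.load();
        if (var) return var;
        ptr.store(var + 1);
    }
}

// Shape after the patch: one load in the "preheader"; the store stays in
// the loop, and the loop-carried variable plays the role of the PHI.
int afterHoist(CountedCell &ptr) {
    int var1 = ptr.load();      // var0 = ptr (hoisted load)
    for (;;) {
        if (var1) return var1;
        int var2 = var1 + 1;
        ptr.store(var2);        // store is kept in the loop
        var1 = var2;            // var1 = phi(var0, var2)
    }
}
```

Both functions leave the same value in memory and return the same result; the second performs a single load regardless of how many iterations run.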


6025 Commits