llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	e302fa89aa	[LoopUnroll] Forget exit values when making changes. When unrolling, the exit values in LCSSA phis will get updated. Invalidate cached SCEV values for those phis in case SCEV looked through a exit phi. Fixes #58340.	2022-10-18 15:12:24 +01:00
Florian Hahn	b0ded70ebf	[LoopUnroll] Add test for mis-compile due to missing SCEV invalidation. Test for #58340.	2022-10-18 14:56:44 +01:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Florian Hahn	ec86e9a99b	[LoopUnroll] Add test for crash exposed by `9e931439`.	2022-10-07 20:02:58 +01:00
Nikita Popov	b43a4d0850	[LoopPeeling] Support peeling loops with non-latch exits Loop peeling currently requires that a) the latch is exiting b) a branch and c) other exits are unreachable/deopt. This patch removes all of these limitations, and adds the necessary branch weight updating support. It essentially works the same way as before with latch -> exiting terminator and loop trip count -> per exit trip count. It's worth noting that there are still other limitations in profitability heuristics: This patch enables peeling of loops to make conditions invariant (which is pretty much always highly profitable if possible), while peeling to make loads dereferenceable still checks that non-latch exits are unreachable and PGO-based peeling has even more conditions. Those checks could be relaxed later if we consider those cases profitable. The motivation for this change is that loops using iterator adaptors in Rust often optimize very badly, and end up with a loop phi of the form phi(true, false) in the final result. Peeling eliminates that phi and conditions based on it, which enables a lot of follow-on simplification. Differential Revision: https://reviews.llvm.org/D134803	2022-10-07 12:35:52 +02:00
Florian Hahn	7c0ff64b0f	[LAA] Change to function analysis for new PM. At the moment, LoopAccessAnalysis is a loop analysis for the new pass manager. The issue with that is that LAI caches SCEV expressions and modifications in a loop may impact SCEV expressions in other loops, but we do not have a convenient way to invalidate LAI for other loops withing a loop pipeline. To avoid this issue, turn it into a function analysis which returns a manager object that keeps track of the individual LAI objects per loop. Fixes #50940. Fixes #51669. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D134606	2022-10-01 15:44:27 +01:00
Florian Hahn	27330882a1	[LoopUnroll] Add cache verification failure test case. Test case for D134612.	2022-09-26 14:25:37 +01:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Nikita Popov	dd61726d5b	Revert "[SimplifyCFG] accumulate bonus insts cost" This reverts commit `e5581df60a`. This causes major compile-time regressions, about 2-3% end-to-end on CTMark.	2022-09-19 14:46:43 +02:00
Yaxun (Sam) Liu	e5581df60a	[SimplifyCFG] accumulate bonus insts cost SimplifyCFG folds bool foo() { if (cond1) return false; if (cond2) return false; return true; } as bool foo() { if (cond1 \| cond2) return false return true; } 'cond2' is called 'bonus insts' in branch folding since they introduce overhead since the original CFG could do early exit but the folded CFG always executes them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into its predecessor BB which shares the destination. If it is below bonus-inst-threshold, SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed. When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts in the current BB to be considered for folding. This causes issue for unrolled loops which share destinations, e.g. bool foo(int a) { for (int i = 0; i < 32; i++) if (a[i] > 0) return false; return true; } After unrolling, it becomes bool foo(int a) { if(a[0]>0) return false if(a[1]>0) return false; //... if(a[31]>0) return false; return true; } SimplifyCFG will merge each BB with its predecessor BB, and ends up with 32 'bonus insts' which are always executed, which is much slower than the original CFG. The root cause is that SimplifyCFG does not consider the accumulated cost of 'bonus insts' which are folded from different BB's. This patch fixes that by introducing a ValueMap to track costs of 'bonus insts' coming from different BB's into the same BB, and cuts off if the accumulated cost exceeds a threshold. Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault Differential Revision: https://reviews.llvm.org/D132408	2022-09-18 20:21:14 -04:00
Jamie Schmeiser	5e3ac79690	Loop names used in reporting can grow very large Summary: The code for generating a name for loops for various reporting scenarios created a name by serializing the loop into a string. This may result in a very large name for a loop containing many blocks. Use the getName() function on the loop instead. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D133587	2022-09-09 13:45:14 -04:00
Florian Hahn	555e09c2b0	[LAA] Rename printing pass to print<access-info>. This updates the naming for the LAA printing pass to be in line with most other analysis printing passes. The old name has come up as confusing multiple times already, e.g. in D131924.	2022-08-26 11:00:09 +01:00
Craig Topper	37c47b2cac	[RISCV] Change how mtune aliases are implemented. The previous implementation translated from names like sifive-7-series to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32 and sifive-7-rv64 to be valid CPU names. As those are not real CPUs it doesn't make sense to accept them in -mcpu. This patch does away with the translation and adds sifive-7-series directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64. sifive-7-series is only allowed in -mtune. I've also added "rocket" to RISCV.td but have not removed rocket-rv32 or rocket-rv64. To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc, I've added a Feature32Bit to all rv32 CPUs. And made it an error to have an rv32 triple without Feature32Bit. sifive-7-series and rocket do not have Feature32Bit or Feature64Bit set so the user would need to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to avoid the error. SiFive no longer names their newer products with 3, 5, or 7 series. Instead we have p200 series, x200 series, p500 series, and p600 series. Following the previous behavior would require a sifive-p500-rv32 and sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There is currently no p500 product, but it could start getting confusing if there was in the future. I'm open to hearing alternatives for how to achieve my main goal of removing sifive-7-rv32/rv64 as a CPU name. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131708	2022-08-18 16:22:25 -07:00
Max Kazantsev	ebabd6bf18	Return "[SCEV] Use context to strengthen flags of BinOps" This reverts commit `354fa0b480`. Returning as is. The patch was reverted due to a miscompile, but this patch is not causing it. This patch made it possible to infer some nuw flags in code guarded by `false` condition, and then someone else to managed to propagate the flag from dead code outside. Returning the patch to be able to reproduce the issue.	2022-08-16 14:12:36 +07:00
Max Kazantsev	354fa0b480	Revert "[SCEV] Use context to strengthen flags of BinOps" This reverts commit `34ae308c73`. Our internal testing found a miscompile. Not sure if it's caused by this patch or it revealed something else. Reverting while investigating.	2022-08-15 18:51:59 +07:00
Martin Sebor	0dcfe7aa35	[InstCombine] Tighten up known library function signature tests (PR #56463 ) Replace a switch statement used to validate arguments to known library functions with a more consistent table-driven approach and tighten it up.	2022-08-10 14:15:46 -06:00
Max Kazantsev	34ae308c73	[SCEV] Use context to strengthen flags of BinOps Sometimes SCEV cannot infer nuw/nsw from something as simple as ``` len in [0, MAX_INT] ... iv = phi(0, iv.next) guard(iv <s len) guard(iv <u len) iv.next = iv + 1 ``` just because flag strenthening only relies on definition and does not use local facts. This patch adds support for the simplest case: inference of flags of `add(x, constant)` if we can contextually prove that `x <= max_int - constant`. In case if it has negative CT impact, we can add an option to switch it off. I woudln't expect that though. Differential Revision: https://reviews.llvm.org/D129643 Reviewed By: apilipenko	2022-08-03 14:08:57 +07:00
Nikita Popov	534b9246a2	[LoopInfo] Allow cloning of callbr After D129288, callbr is safe to clone without special handling. This permits optimizations like loop unroll and loop unswitch on loops containing callbrs. Fixes https://github.com/llvm/llvm-project/issues/41834. Differential Revision: https://reviews.llvm.org/D129993	2022-07-19 09:57:28 +02:00
Nikita Popov	118d8fe46b	[LoopUnroll] Regenerate test checks (NFC)	2022-07-18 10:37:22 +02:00
Nikita Popov	2a721374ae	[IR] Don't use blockaddresses as callbr arguments Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations: ; Before: %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo)) to label %asm.fallthrough [label %foo] ; After: %res = callbr i8* asm "", "=r,r,!i"(i8* %x) to label %asm.fallthrough [label %foo] The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations: * Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834) * Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248) This is just the IR representation change though, I will follow up with patches to remove limtations in various transformation passes that are no longer needed. Differential Revision: https://reviews.llvm.org/D129288	2022-07-15 10:18:17 +02:00
Florian Hahn	6d5f814357	[LoopUnrollRuntime] Invalidate SCEV for exit phi in ConnectProlog. ConnectProlog adds new incoming values to exit phi nodes which can change the SCEV for the phi after `20d798bd47`. Fix is analog to `cfc741bc0e`. Fixes #56286.	2022-06-29 20:28:43 +01:00
Florian Hahn	9a35f19e3e	[UnrollRuntime] Invalidate SCEVs for modified phis in ConnectEpilog. ConnectEpilog adds new incoming values to exit phi nodes which can change the SCEV for the phi after `20d798bd47`. Fix is analog to `cfc741bc0e`. Fixes #56282.	2022-06-29 18:26:00 +01:00
Florian Hahn	cfc741bc0e	[LoopPeel] Forget SCEV for updated exit phi values. LoopPeel add new incoming values to exit phi nodes which can change the SCEV for the phi after `20d798bd47`. Forget SCEVs for such phis. Fixes #56044. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128164	2022-06-20 13:19:27 +02:00
Philip Reames	206f10d3f6	Plumb InstructionCost through unroll costing Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundemental limitation or an unimplemented cost model case. Differential Revision: https://reviews.llvm.org/D127305	2022-06-09 15:42:53 -07:00
Nuno Lopes	80b3dcc045	[Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace There are a few places where we use report_fatal_error when the input is broken. Currently, this function always crashes LLVM with an abort signal, which then triggers the backtrace printing code. I think this is excessive, as wrong input shouldn't give a link to LLVM's github issue URL and tell users to file a bug report. We shouldn't print a stack trace either. This patch changes report_fatal_error so it uses exit() rather than abort() when its argument GenCrashDiag=false. Reviewed by: nikic, MaskRay, RKSimon Differential Revision: https://reviews.llvm.org/D126550	2022-05-30 19:19:23 +01:00
Nikita Popov	81c648a3d9	[LoopUnroll] Freeze tripcount rather than condition This is a followup to D125754. We introduce two branches, one before the unrolled loop and one before the epilogue (and similar for the prologue case). The previous patch only froze the condition on the first branch. Rather than independently freezing the second condition, this patch instead freezes TripCount and bases BECount on it. These are the two quantities involved in the conditions, and this ensures that both work on a consistent, non-poisonous trip count. Differential Revision: https://reviews.llvm.org/D125896	2022-05-24 09:42:39 +02:00
Nikita Popov	e44fe27251	[LoopUnroll] Regenerate test checks (NFC)	2022-05-18 17:20:09 +02:00
Nikita Popov	323514de58	[LoopUnroll] Avoid branch on poison for runtime unroll with multiple exits When performing runtime unrolling with multiple exits, one of the earlier (non-latch) exits may exit the loop on the first iteration, such that we never branch on the latch exit condition. As such, we need to freeze the condition of the new branch that is introduced before the loop, as it now executes unconditionally. Differential Revision: https://reviews.llvm.org/D125754	2022-05-18 09:51:22 +02:00
Whitney Tsang	80304c5f88	[LoopUnroll] Always respect user unroll pragma IMO when user provide unroll pragma, compiler should always respect it. It is not clear to me why loop unroll pass currently ensure that the unrolled loop size is limited by PragmaUnrollThreshold. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119148	2022-04-11 14:33:24 -04:00
Andrew Wei	0af3e6a22d	[InstCombine] Sink instructions with multiple users in a successor block. This patch tries to sink instructions when they are only used in a successor block. This is a further enhancement patch based on Anna's commit: D109700, which allows sinking an instruction having multiple uses in a single user. In this patch, sink instructions with multiple users in a single successor block will be supported. It could fix a known issue from rust: https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610 Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D121585	2022-03-18 11:53:45 +08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information. This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Nick Desaulniers	79ebc3b0dd	[llvm][test] rewrite callbr to use i rather than X constraint NFC In D115311, we're looking to modify clang to emit i constraints rather than X constraints for callbr's indirect destinations. Prior to doing so, update all of the existing tests in llvm/ to match. Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115410	2022-01-11 11:31:08 -08:00
Florian Hahn	9153c27385	[LoopUnroll] Make test more robust by removing undef. Replace a uses of undef in the tests. The undef causes runtime checks to be trivially fold/removeable, which does defeat the purpose of the test.	2022-01-08 15:44:23 +00:00
Nikita Popov	9e4aeb1f60	[LoopUnroll] Remove unrelated passes from test (NFC) Manually run these and use the result as the initial input for the test.	2022-01-07 09:20:23 +01:00
Philip Reames	7203140748	Revert "[unroll] Prune all but first copy of invariant exit" This reverts commit `9bd22595ba`. Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stablize. e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933	2022-01-03 11:57:35 -08:00
Philip Reames	9bd22595ba	[unroll] Prune all but first copy of invariant exit If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead. The change itself is pretty straight forward, but let me add two points of context: * I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying. * I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure. Differential Revision: https://reviews.llvm.org/D116496	2022-01-03 09:55:19 -08:00
Philip Reames	f6e90fac35	Remove loop invariant exit conditions from tests in advance of D116496 Reviewer suggested this was more in spirit of the original tests.	2022-01-03 09:44:28 -08:00
Philip Reames	840fa88741	autogen unroll test for ease of futre update	2022-01-02 09:25:29 -08:00
Nemanja Ivanovic	a3ea9052d6	[PowerPC] Do not increase cost for getUserCost with MMA types Commit `150681f` increases cost of producing MMA types (vector pair and quad). However, it increases the cost for getUserCost() which is used in unrolling. As a result, loops that contain these types already (from the user code) cannot be unrolled (even with the user's unroll pragma). This was an unintended sideeffect. Reverting that portion of the commit to allow unrolling such loops. Differential revision: https://reviews.llvm.org/D115424	2021-12-21 13:36:08 -06:00
Michael Berg	f95ee6074a	[RISCV] Add target specific loop unrolling and peeling preferences Both these preference helper functions have initial support with this change. The loop unrolling preferences are set with initial settings to control thresholds, size and attributes of loops to unroll with some tuning done. The peeling preferences may need some tuning as well as the initial support looks much like what other architectures utilize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113798	2021-12-18 12:54:50 -08:00
Philip Reames	2d31b02517	Compute estimated trip counts for multiple exit loops This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst. Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable. The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change. This was noticed when analyzing regressions on D113939. I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately. Differential Revision: https://reviews.llvm.org/D115362	2021-12-09 09:53:49 -08:00
Philip Reames	ad4644acce	[unroll] Add test coverage for loops with small estimated trip counts and multiple exits	2021-12-08 10:15:39 -08:00
Michael Berg	3e363f14e1	Revert "[RISCV] Add target specific loop unrolling and peeling preferences" This reverts commit `8487981a72`.	2021-12-07 15:13:42 -08:00
Michael Berg	8487981a72	[RISCV] Add target specific loop unrolling and peeling preferences Both these preference helper functions have initial support with this change. The loop unrolling preferences are set with initial settings to control thresholds, size and attributes of loops to unroll with some tuning done. The peeling preferences may need some tuning as well as the initial support looks much like what other architectures utilize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113798	2021-12-07 15:06:42 -08:00
Philip Reames	8906a0fe64	[SCEVExpander] Drop poison generating flags when reusing instructions The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct. This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid. In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled. On the test changes, most are pretty straight forward. We loose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE. The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB. Differential Revision: https://reviews.llvm.org/D112734	2021-11-29 15:23:34 -08:00
Philip Reames	8d85e945b2	[SCEV] Canonicalize X - urem X, Y patterns There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version. The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert the the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency. The ARM loop hardware interaction in the test diff is opague to me, but the comments in the review from others knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions. Differential Revision: https://reviews.llvm.org/D114018	2021-11-16 11:59:21 -08:00
Philip Reames	da327e7290	Fix a misleading FIXME in an unroll test	2021-11-15 12:20:10 -08:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00

1 2 3 4 5 ...

514 Commits