llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Lebedev	b79921a4a8	[NFC] Re-autogenerate checklines in a few tests being affected	2022-12-04 20:58:55 +03:00
Jamie Schmeiser	be1ff1fe58	[NFC] Refactor loop peeling code for calculating phi invariance. Summary: Refactor loop peeling code by moving code for calculating phi invariance into a separate class that does the calculation. Redescribe and rework the algorithm in preparation for adding increased functionality. Add test case that does not exhibit peeling that will be subsequently supported. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: mkazantsev (Max Kazantsev) Differential Revision: https://reviews.llvm.org/D138232	2022-11-25 09:07:14 -05:00
Alina Sbirlea	d1b19da854	[LoopPeeling] Add flag to disable support for peeling loops with non-latch exits Add a flag to allow disabling the changes in https://reviews.llvm.org/D134803. Differential Revision: https://reviews.llvm.org/D136643	2022-10-25 12:19:14 -07:00
Yaxun (Sam) Liu	9d5adc7e49	Revert "reland `e5581df60a` [SimplifyCFG] accumulate bonus insts cost" This reverts commit `bd7949bcd8`. Revert this patch since reviwers have different opinions regarding the approach in post-commit review. Will open RFC for further discussion. Differential Revision: https://reviews.llvm.org/D132408	2022-10-25 12:15:39 -04:00
Yaxun (Sam) Liu	bd7949bcd8	reland `e5581df60a` [SimplifyCFG] accumulate bonus insts cost Fixed compile time increase due to always constructing LocalCostTracker. Now only construct LocalCostTracker when needed.	2022-10-24 15:43:53 -04:00
Florian Hahn	e302fa89aa	[LoopUnroll] Forget exit values when making changes. When unrolling, the exit values in LCSSA phis will get updated. Invalidate cached SCEV values for those phis in case SCEV looked through a exit phi. Fixes #58340.	2022-10-18 15:12:24 +01:00
Florian Hahn	b0ded70ebf	[LoopUnroll] Add test for mis-compile due to missing SCEV invalidation. Test for #58340.	2022-10-18 14:56:44 +01:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Florian Hahn	ec86e9a99b	[LoopUnroll] Add test for crash exposed by `9e931439`.	2022-10-07 20:02:58 +01:00
Nikita Popov	b43a4d0850	[LoopPeeling] Support peeling loops with non-latch exits Loop peeling currently requires that a) the latch is exiting b) a branch and c) other exits are unreachable/deopt. This patch removes all of these limitations, and adds the necessary branch weight updating support. It essentially works the same way as before with latch -> exiting terminator and loop trip count -> per exit trip count. It's worth noting that there are still other limitations in profitability heuristics: This patch enables peeling of loops to make conditions invariant (which is pretty much always highly profitable if possible), while peeling to make loads dereferenceable still checks that non-latch exits are unreachable and PGO-based peeling has even more conditions. Those checks could be relaxed later if we consider those cases profitable. The motivation for this change is that loops using iterator adaptors in Rust often optimize very badly, and end up with a loop phi of the form phi(true, false) in the final result. Peeling eliminates that phi and conditions based on it, which enables a lot of follow-on simplification. Differential Revision: https://reviews.llvm.org/D134803	2022-10-07 12:35:52 +02:00
Florian Hahn	7c0ff64b0f	[LAA] Change to function analysis for new PM. At the moment, LoopAccessAnalysis is a loop analysis for the new pass manager. The issue with that is that LAI caches SCEV expressions and modifications in a loop may impact SCEV expressions in other loops, but we do not have a convenient way to invalidate LAI for other loops withing a loop pipeline. To avoid this issue, turn it into a function analysis which returns a manager object that keeps track of the individual LAI objects per loop. Fixes #50940. Fixes #51669. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D134606	2022-10-01 15:44:27 +01:00
Florian Hahn	27330882a1	[LoopUnroll] Add cache verification failure test case. Test case for D134612.	2022-09-26 14:25:37 +01:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Nikita Popov	dd61726d5b	Revert "[SimplifyCFG] accumulate bonus insts cost" This reverts commit `e5581df60a`. This causes major compile-time regressions, about 2-3% end-to-end on CTMark.	2022-09-19 14:46:43 +02:00
Yaxun (Sam) Liu	e5581df60a	[SimplifyCFG] accumulate bonus insts cost SimplifyCFG folds bool foo() { if (cond1) return false; if (cond2) return false; return true; } as bool foo() { if (cond1 \| cond2) return false return true; } 'cond2' is called 'bonus insts' in branch folding since they introduce overhead since the original CFG could do early exit but the folded CFG always executes them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into its predecessor BB which shares the destination. If it is below bonus-inst-threshold, SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed. When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts in the current BB to be considered for folding. This causes issue for unrolled loops which share destinations, e.g. bool foo(int a) { for (int i = 0; i < 32; i++) if (a[i] > 0) return false; return true; } After unrolling, it becomes bool foo(int a) { if(a[0]>0) return false if(a[1]>0) return false; //... if(a[31]>0) return false; return true; } SimplifyCFG will merge each BB with its predecessor BB, and ends up with 32 'bonus insts' which are always executed, which is much slower than the original CFG. The root cause is that SimplifyCFG does not consider the accumulated cost of 'bonus insts' which are folded from different BB's. This patch fixes that by introducing a ValueMap to track costs of 'bonus insts' coming from different BB's into the same BB, and cuts off if the accumulated cost exceeds a threshold. Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault Differential Revision: https://reviews.llvm.org/D132408	2022-09-18 20:21:14 -04:00
Jamie Schmeiser	5e3ac79690	Loop names used in reporting can grow very large Summary: The code for generating a name for loops for various reporting scenarios created a name by serializing the loop into a string. This may result in a very large name for a loop containing many blocks. Use the getName() function on the loop instead. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D133587	2022-09-09 13:45:14 -04:00
Florian Hahn	555e09c2b0	[LAA] Rename printing pass to print<access-info>. This updates the naming for the LAA printing pass to be in line with most other analysis printing passes. The old name has come up as confusing multiple times already, e.g. in D131924.	2022-08-26 11:00:09 +01:00
Craig Topper	37c47b2cac	[RISCV] Change how mtune aliases are implemented. The previous implementation translated from names like sifive-7-series to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32 and sifive-7-rv64 to be valid CPU names. As those are not real CPUs it doesn't make sense to accept them in -mcpu. This patch does away with the translation and adds sifive-7-series directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64. sifive-7-series is only allowed in -mtune. I've also added "rocket" to RISCV.td but have not removed rocket-rv32 or rocket-rv64. To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc, I've added a Feature32Bit to all rv32 CPUs. And made it an error to have an rv32 triple without Feature32Bit. sifive-7-series and rocket do not have Feature32Bit or Feature64Bit set so the user would need to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to avoid the error. SiFive no longer names their newer products with 3, 5, or 7 series. Instead we have p200 series, x200 series, p500 series, and p600 series. Following the previous behavior would require a sifive-p500-rv32 and sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There is currently no p500 product, but it could start getting confusing if there was in the future. I'm open to hearing alternatives for how to achieve my main goal of removing sifive-7-rv32/rv64 as a CPU name. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131708	2022-08-18 16:22:25 -07:00
Max Kazantsev	ebabd6bf18	Return "[SCEV] Use context to strengthen flags of BinOps" This reverts commit `354fa0b480`. Returning as is. The patch was reverted due to a miscompile, but this patch is not causing it. This patch made it possible to infer some nuw flags in code guarded by `false` condition, and then someone else to managed to propagate the flag from dead code outside. Returning the patch to be able to reproduce the issue.	2022-08-16 14:12:36 +07:00
Max Kazantsev	354fa0b480	Revert "[SCEV] Use context to strengthen flags of BinOps" This reverts commit `34ae308c73`. Our internal testing found a miscompile. Not sure if it's caused by this patch or it revealed something else. Reverting while investigating.	2022-08-15 18:51:59 +07:00
Martin Sebor	0dcfe7aa35	[InstCombine] Tighten up known library function signature tests (PR #56463 ) Replace a switch statement used to validate arguments to known library functions with a more consistent table-driven approach and tighten it up.	2022-08-10 14:15:46 -06:00
Max Kazantsev	34ae308c73	[SCEV] Use context to strengthen flags of BinOps Sometimes SCEV cannot infer nuw/nsw from something as simple as ``` len in [0, MAX_INT] ... iv = phi(0, iv.next) guard(iv <s len) guard(iv <u len) iv.next = iv + 1 ``` just because flag strenthening only relies on definition and does not use local facts. This patch adds support for the simplest case: inference of flags of `add(x, constant)` if we can contextually prove that `x <= max_int - constant`. In case if it has negative CT impact, we can add an option to switch it off. I woudln't expect that though. Differential Revision: https://reviews.llvm.org/D129643 Reviewed By: apilipenko	2022-08-03 14:08:57 +07:00
Nikita Popov	534b9246a2	[LoopInfo] Allow cloning of callbr After D129288, callbr is safe to clone without special handling. This permits optimizations like loop unroll and loop unswitch on loops containing callbrs. Fixes https://github.com/llvm/llvm-project/issues/41834. Differential Revision: https://reviews.llvm.org/D129993	2022-07-19 09:57:28 +02:00
Nikita Popov	118d8fe46b	[LoopUnroll] Regenerate test checks (NFC)	2022-07-18 10:37:22 +02:00
Nikita Popov	2a721374ae	[IR] Don't use blockaddresses as callbr arguments Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations: ; Before: %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo)) to label %asm.fallthrough [label %foo] ; After: %res = callbr i8* asm "", "=r,r,!i"(i8* %x) to label %asm.fallthrough [label %foo] The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations: * Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834) * Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248) This is just the IR representation change though, I will follow up with patches to remove limtations in various transformation passes that are no longer needed. Differential Revision: https://reviews.llvm.org/D129288	2022-07-15 10:18:17 +02:00
Florian Hahn	6d5f814357	[LoopUnrollRuntime] Invalidate SCEV for exit phi in ConnectProlog. ConnectProlog adds new incoming values to exit phi nodes which can change the SCEV for the phi after `20d798bd47`. Fix is analog to `cfc741bc0e`. Fixes #56286.	2022-06-29 20:28:43 +01:00
Florian Hahn	9a35f19e3e	[UnrollRuntime] Invalidate SCEVs for modified phis in ConnectEpilog. ConnectEpilog adds new incoming values to exit phi nodes which can change the SCEV for the phi after `20d798bd47`. Fix is analog to `cfc741bc0e`. Fixes #56282.	2022-06-29 18:26:00 +01:00
Florian Hahn	cfc741bc0e	[LoopPeel] Forget SCEV for updated exit phi values. LoopPeel add new incoming values to exit phi nodes which can change the SCEV for the phi after `20d798bd47`. Forget SCEVs for such phis. Fixes #56044. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128164	2022-06-20 13:19:27 +02:00
Philip Reames	206f10d3f6	Plumb InstructionCost through unroll costing Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundemental limitation or an unimplemented cost model case. Differential Revision: https://reviews.llvm.org/D127305	2022-06-09 15:42:53 -07:00
Nuno Lopes	80b3dcc045	[Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace There are a few places where we use report_fatal_error when the input is broken. Currently, this function always crashes LLVM with an abort signal, which then triggers the backtrace printing code. I think this is excessive, as wrong input shouldn't give a link to LLVM's github issue URL and tell users to file a bug report. We shouldn't print a stack trace either. This patch changes report_fatal_error so it uses exit() rather than abort() when its argument GenCrashDiag=false. Reviewed by: nikic, MaskRay, RKSimon Differential Revision: https://reviews.llvm.org/D126550	2022-05-30 19:19:23 +01:00
Nikita Popov	81c648a3d9	[LoopUnroll] Freeze tripcount rather than condition This is a followup to D125754. We introduce two branches, one before the unrolled loop and one before the epilogue (and similar for the prologue case). The previous patch only froze the condition on the first branch. Rather than independently freezing the second condition, this patch instead freezes TripCount and bases BECount on it. These are the two quantities involved in the conditions, and this ensures that both work on a consistent, non-poisonous trip count. Differential Revision: https://reviews.llvm.org/D125896	2022-05-24 09:42:39 +02:00
Nikita Popov	e44fe27251	[LoopUnroll] Regenerate test checks (NFC)	2022-05-18 17:20:09 +02:00
Nikita Popov	323514de58	[LoopUnroll] Avoid branch on poison for runtime unroll with multiple exits When performing runtime unrolling with multiple exits, one of the earlier (non-latch) exits may exit the loop on the first iteration, such that we never branch on the latch exit condition. As such, we need to freeze the condition of the new branch that is introduced before the loop, as it now executes unconditionally. Differential Revision: https://reviews.llvm.org/D125754	2022-05-18 09:51:22 +02:00
Whitney Tsang	80304c5f88	[LoopUnroll] Always respect user unroll pragma IMO when user provide unroll pragma, compiler should always respect it. It is not clear to me why loop unroll pass currently ensure that the unrolled loop size is limited by PragmaUnrollThreshold. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119148	2022-04-11 14:33:24 -04:00
Andrew Wei	0af3e6a22d	[InstCombine] Sink instructions with multiple users in a successor block. This patch tries to sink instructions when they are only used in a successor block. This is a further enhancement patch based on Anna's commit: D109700, which allows sinking an instruction having multiple uses in a single user. In this patch, sink instructions with multiple users in a single successor block will be supported. It could fix a known issue from rust: https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610 Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D121585	2022-03-18 11:53:45 +08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information. This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Nick Desaulniers	79ebc3b0dd	[llvm][test] rewrite callbr to use i rather than X constraint NFC In D115311, we're looking to modify clang to emit i constraints rather than X constraints for callbr's indirect destinations. Prior to doing so, update all of the existing tests in llvm/ to match. Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115410	2022-01-11 11:31:08 -08:00
Florian Hahn	9153c27385	[LoopUnroll] Make test more robust by removing undef. Replace a uses of undef in the tests. The undef causes runtime checks to be trivially fold/removeable, which does defeat the purpose of the test.	2022-01-08 15:44:23 +00:00
Nikita Popov	9e4aeb1f60	[LoopUnroll] Remove unrelated passes from test (NFC) Manually run these and use the result as the initial input for the test.	2022-01-07 09:20:23 +01:00
Philip Reames	7203140748	Revert "[unroll] Prune all but first copy of invariant exit" This reverts commit `9bd22595ba`. Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stablize. e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933	2022-01-03 11:57:35 -08:00
Philip Reames	9bd22595ba	[unroll] Prune all but first copy of invariant exit If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead. The change itself is pretty straight forward, but let me add two points of context: * I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying. * I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure. Differential Revision: https://reviews.llvm.org/D116496	2022-01-03 09:55:19 -08:00
Philip Reames	f6e90fac35	Remove loop invariant exit conditions from tests in advance of D116496 Reviewer suggested this was more in spirit of the original tests.	2022-01-03 09:44:28 -08:00
Philip Reames	840fa88741	autogen unroll test for ease of futre update	2022-01-02 09:25:29 -08:00
Nemanja Ivanovic	a3ea9052d6	[PowerPC] Do not increase cost for getUserCost with MMA types Commit `150681f` increases cost of producing MMA types (vector pair and quad). However, it increases the cost for getUserCost() which is used in unrolling. As a result, loops that contain these types already (from the user code) cannot be unrolled (even with the user's unroll pragma). This was an unintended sideeffect. Reverting that portion of the commit to allow unrolling such loops. Differential revision: https://reviews.llvm.org/D115424	2021-12-21 13:36:08 -06:00
Michael Berg	f95ee6074a	[RISCV] Add target specific loop unrolling and peeling preferences Both these preference helper functions have initial support with this change. The loop unrolling preferences are set with initial settings to control thresholds, size and attributes of loops to unroll with some tuning done. The peeling preferences may need some tuning as well as the initial support looks much like what other architectures utilize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113798	2021-12-18 12:54:50 -08:00
Philip Reames	2d31b02517	Compute estimated trip counts for multiple exit loops This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst. Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable. The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change. This was noticed when analyzing regressions on D113939. I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately. Differential Revision: https://reviews.llvm.org/D115362	2021-12-09 09:53:49 -08:00
Philip Reames	ad4644acce	[unroll] Add test coverage for loops with small estimated trip counts and multiple exits	2021-12-08 10:15:39 -08:00
Michael Berg	3e363f14e1	Revert "[RISCV] Add target specific loop unrolling and peeling preferences" This reverts commit `8487981a72`.	2021-12-07 15:13:42 -08:00
Michael Berg	8487981a72	[RISCV] Add target specific loop unrolling and peeling preferences Both these preference helper functions have initial support with this change. The loop unrolling preferences are set with initial settings to control thresholds, size and attributes of loops to unroll with some tuning done. The peeling preferences may need some tuning as well as the initial support looks much like what other architectures utilize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113798	2021-12-07 15:06:42 -08:00

1 2 3 4 5 ...

519 Commits