llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	4fa328074e	[NewPM][Pipeline] Add PipelineTuningOption to set inliner threshold The legacy PM allowed you to set a custom inliner threshold via builder.Inliner = llvm::createFunctionInliningPass(inline_threshold); This allows the same thing to be done with the new PM optimization pipelines. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D137038	2022-11-02 10:47:51 -07:00
Paul Walker	ab8257ca0e	[NFC] Fix a few whitespace inconsistencies.	2022-10-20 14:52:25 +00:00
Pavel Samolysov	1c530500ab	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-09-22 15:33:46 -07:00
Nuno Lopes	d953d01737	Introduce -enable-global-analyses to allow users to disable inter-procedural analyses Alive2 doesn't support verification of optimizations that use inter-procedural analyses. Right now, clang uses GlobalsAA by default and there's no way to disable it. This leads to Alive2 producing false positives. The added flag allows us to skip global analyses altogether. Differential Revision: https://reviews.llvm.org/D134139	2022-09-19 11:59:35 +01:00
Vitaly Buka	181d408186	[pipelines] OptimizerEarlyEPCallbacks for ThinLTO prelink Similar to OptimizerLastEPCallbacks workaround added D96320. Probably NFC as-is, I don't see anything hooked with this callbacks yet, but I we are looking to move sanitizers. Reviewed By: aeubanks, MaskRay Differential Revision: https://reviews.llvm.org/D133333	2022-09-06 15:54:04 -07:00
Arthur Eubanks	9599393eeb	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit `b10a341aa5`. This commit exposes the pre-existing https://github.com/llvm/llvm-project/issues/56503 in some edge cases. Will fix that and then reland this.	2022-09-01 08:52:19 -07:00
Pavel Samolysov	b10a341aa5	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-28 10:47:03 +03:00
Pavel Samolysov	f964417c32	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" The commit breaks the compiler when a function is used as a function parameter (hm... for a function from the standard C library?): ``` static float strtof(char , char ) {} void a() { strtof(a, 0); } ``` This reverts commit `879f5118fc`.	2022-08-26 13:43:09 +03:00
Pavel Samolysov	879f5118fc	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-25 10:55:47 +03:00
Pavel Samolysov	6703ad1e0c	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit `3f20dcbf70`.	2022-08-24 12:44:13 +03:00
Pavel Samolysov	3f20dcbf70	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-24 10:36:12 +03:00
Ellis Hoag	0f946a50a4	[InstrProf] Add option to disable loop opt after PGO Add the `-enable-post-pgo-loop-rotation` option to enable or disable the loop rotation transformation [1]. With some instrumentations, e.g., function entry coverage [2], loop rotation is not necessary and can lead to some surprise differences in codegen, even for functions where instrumentation is blocked with `noprofile` or `skipprofile`. The default value is `true` so the default behavior does not change. [1] https://www.llvm.org/docs/LoopTerminology.html#loop-terminology-loop-rotate [2] https://reviews.llvm.org/D116180 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D131817	2022-08-17 12:23:18 -07:00
Sanjay Patel	bfb9b8e075	[Passes] add a tail-call-elim pass near the end of the opt pipeline We call tail-call-elim near the beginning of the pipeline, but that is too early to annotate calls that get added later. In the motivating case from issue #47852, the missing 'tail' on memset leads to sub-optimal codegen. I experimented with removing the early instance of tail-call-elim instead of just adding another pass, but that appears to be slightly worse for compile-time: +0.15% vs. +0.08% time. "tailcall" shows adding the pass; "tailcall2" shows moving the pass to later, then adding the original early pass back (so 1596886802 is functionally equivalent to 180b0439dc ): https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright Note that there was an effort to split the tail call functionality into 2 passes - that could help reduce compile-time if we find that this change costs more in compile-time than expected based on the preliminary testing: D60031 Differential Revision: https://reviews.llvm.org/D130374	2022-07-25 15:25:47 -04:00
Alina Sbirlea	846d10f16a	Turn on flag to not re-run simplification pipeline. This patch turns on the flag `-enable-no-rerun-simplification-pipeline`, which means the simplification pipeline will not be rerun on unchanged functions in the CGSCCPass Manager. Compile time improvement: https://llvm-compile-time-tracker.com/compare.php?from=17457be1c393ff691cca032b04ea1698fedf0301&to=882301ebb893c8ef9f09fe1ea871f7995426fa07&stat=instructions No meaningful run time regressions observed in the llvm test suite and in additional internal workloads at this time. The example test in `test/Other/no-rerun-function-simplification-pipeline.ll` is a good means to understand the effect of this change: ``` define void @f1(void()* %p) alwaysinline { call void %p() ret void } define void @f2() #0 { call void @f1(void()* @f2) call void @f3() ret void } define void @f3() #0 { call void @f2() ret void } ``` There are two SCCs formed by the ModuleToPostOrderCGSCCAdaptor: (f1) and (f2, f3). The pass manager runs on the first SCC, leading to running the simplification pipeline (function and loop passes) on f1. With the flag on, after this, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f1`. Next, the pass manager runs on the second SCC: (f2, f3). Since f1() was inlined, f2() now calls itself, and also calls f3(), while f3() only calls f2(). So the pass manager for the SCC first runs the Inliner on (f2, f3), then the simplification pipeline on f2. With the flag on, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f2`; unless the inliner makes a change, this analysis remains preserved which means there's no reason to rerun the simplification pipeline. With the flag off, there is a second run of the simplification pipeline run on f2. Next, the same flow occurs for f3. The simplification pipeline is run on f3 a single time with the flag on, along with `ShouldNotRunFunctionPassesAnalysis on f3`, and twice with the flag off. The reruns occur only on f2 and f3 due to the additional ref edges.	2022-07-14 06:23:55 -07:00
Kazu Hirata	ec9a0e36d9	[IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC) The last uses were removed on Apr 15, 2022 in commit `2e6ac54cf4`. Differential Revision: https://reviews.llvm.org/D129460	2022-07-11 20:15:24 -07:00
Ben Dunbobbin	325e7e8b87	[LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO The CGProfilePass needs to be run during FullLTO compilation at link time to emit the .llvm.call-graph-profile section to the compiled LTO object file. Currently, it is being run only during the initial LTO-prelink compilation stage (to produce the bitcode files to be consumed by the linker) and so the section is not produced. ThinLTO is not affected because: - For ThinLTO-prelink compilation the CGProfilePass pass is not run because ThinLTO-prelink passes are added via buildThinLTOPreLinkDefaultPipeline. Normal and FullLTO-prelink passes are both added via buildPerModuleDefaultPipeline which uses the LTOPreLink parameter to customize its behavior for the FullLTO-prelink pass differences. - ThinLTO backend compilation phase adds the CGProfilePass (see: buildModuleOptimizationPipeline). Adjust when the pass is run so that the .llvm.call-graph-profile section is produced correctly for FullLTO. Fixes #56185 (https://github.com/llvm/llvm-project/issues/56185)	2022-07-01 13:57:36 +01:00
Mingming Liu	e0d069598b	[Inline] Annotate inline pass name with link phase information for analysis. The annotation is flag gated; flag is turned off by default. Differential Revision: https://reviews.llvm.org/D125495	2022-06-24 10:06:43 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Arthur Eubanks	36096c2b38	[NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter All callers pass true. select-unfold-freeze.ll is now a subset of select.ll so delete it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126501	2022-05-26 16:13:34 -07:00
Chuanqi Xu	405bf90235	[NFC] [Pipelines] Hoist CoroCleanup as Module Pass This is similar to previous patch https://reviews.llvm.org/D123925. It could also reduce the time we call declaresCoroCleanupIntrinsics. And it is helpful for further changes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124362	2022-05-05 15:15:09 +08:00
Chuanqi Xu	7d40f562e7	[Pipelines] Hoist CoroCleanup to avoid blocking optimizations CoroCleanup is designed to lowering all the remaining coroutine intrinsics. It is required to run after CoroSplit only. However, the position of CoroCleanup now is far too late. The downside here is that the unlowered coroutine instrincs might blocking other optimizations too. So it should be a pure win to hoist the position of CoroCleanup. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124360	2022-05-05 15:13:27 +08:00
Mingming Liu	408bb9a375	Add a regression test to guard the 0 hot-caller threshold in SamplePGO + ThinLTO. - Add a comment near where the threshold is set.	2022-04-25 18:29:56 +00:00
Chuanqi Xu	f9bee35689	[Pipelines] Hoist CoroEarly as a module pass This change could reduce the time we call `declaresCoroEarlyIntrinsics`. And it is helpful for future changes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D123925	2022-04-19 11:04:24 +08:00
Wenju He	0bda12b5bc	[NewPM] Add OptimizerEarly module extension point VectorizerStart extension is module callback in old PM, but is function callback in new PM. We lack a module extension point between end of buildModuleSimplificationPipeline and the function optimization (including vectorizer) pipeline. So this patch adds a new module extension point before the function optimization pipeline. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D122296	2022-03-31 08:22:27 -07:00
Arthur Eubanks	9bd66b312c	[PassManager][Coroutine] Run passes under -O0 conditionally and run GlobalDCE CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and CGSCC passes don't run on unreachable functions. Normally GlobalDCE will come along and delete unreachable functions, but we don't run GlobalDCE under -O0, so an unreachable function with coroutine intrinsics may never have CoroSplit run on it. This patch adds GlobalDCE when coroutines intrinsics are present. It also now runs all coroutine passes conditional when coroutine intrinsics are present. This should also solve the -O0 regression reported in D105877 due to LazyCallGraph construction. Fixes https://github.com/llvm/llvm-project/issues/54117 Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D122275	2022-03-23 11:03:26 -07:00
Arthur Eubanks	4fc7c55fff	[NewPM] Actually recompute GlobalsAA before module optimization pipeline RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA. GlobalsAA isn't invalidated (unless specifically invalidated) because it's self-updating via ValueHandles, but can be imprecise during the self-updates. Rather than invalidating GlobalsAA, which would invalidate AAManager and any analyses that use AAManager, create a new pass that recomputes GlobalsAA. Fixes #53131. Differential Revision: https://reviews.llvm.org/D121167	2022-03-14 09:42:34 -07:00
Elia Geretto	942efa5927	[NewPM] Add extension points to LTO pipeline in PassBuilder This PR adds two extension points to the default LTO pipeline in PassBuilder, one at the beginning and one at the end. These two extension points already existed in the old pass manager, the aim is to replicate the same functionality in the new one. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D120491	2022-02-25 14:48:54 -08:00
Arthur Eubanks	18da681034	[NFC] Remove unnecessary function pass managers	2022-02-25 10:03:17 -08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information. This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Joseph Huber	9d3a47576c	[PassBuilder] Add OpenMPOpt to default LTO pipeline The LTO support for OpenMP offloading allows us to run the OpenMPOpt pass during the LTO pipeline. This patch introduces an early run of the Module pass and a late run of the CGSCC pass. These are quick no-ops if there is no OpenMP in the module. Depends on D118198 Differential Revision: https://reviews.llvm.org/D118611	2022-01-31 23:11:43 -05:00
Sjoerd Meijer	f269ec230e	[LoopFlatten] Move it from LPM2 to LPM1 In D110057 we moved LoopFlatten to a LoopPassManager. This caused a performance regression for our 64-bit targets (the 32-bit were unaffected), the pass is no longer triggering for a motivating example. The reason is that the IR is just very different than expected; we try to match loop statements and particular uses of induction variables. The easiest is to just move LoopFlatten to a place in the pipeline where the IR is as expected, which is just before IndVarSimplify. This means we move it from LPM2 to LPM1, so that it actually runs just a bit earlier from where it was running before. IndVarSimplify is responsible for significant rewrites that are difficult to "look through" in LoopFlatten. Differential Revision: https://reviews.llvm.org/D116612	2022-01-19 14:38:05 +00:00
Sjoerd Meijer	016022e5da	Recommit "[LoopFlatten] Move it to a LoopPassManager" This was reverted because of a performance regression, which is fixed by D116612 that I will commit directly after this change. This reverts commit `e92d63b467`.	2022-01-19 14:38:05 +00:00
David Green	e92d63b467	Revert "[LoopFlatten] Move it to a LoopPassManager" This commit caused performance regressions due to differences in the expected code during loop flattening. Reverting it until the fix is ready, which hopefully wont take too long. This reverts commit `86825fc2fb`.	2022-01-10 11:03:49 +00:00
Sjoerd Meijer	86825fc2fb	[LoopFlatten] Move it to a LoopPassManager In D109958 it was noticed that we could optimise the pipeline and avoid rerunning LoopSimplify/LCSSA for LoopFlatten by moving it to a LoopPassManager. Differential Revision: https://reviews.llvm.org/D110057	2021-12-30 12:32:14 +00:00
Florian Hahn	acea6e9cfa	[Passes] Only run extra vector passes if loops have been vectorized. This patch uses a similar trick as in D113947 to only run the extra passes after vectorization on functions where loops have been vectorized. The reason for running the 'extra vector passes' is simplification/unswitching of the runtime checks created by LV, there should be no need to run them if nothing got vectorized To do that, a new dummy analysis ShouldRunExtraVectorPasses has been added. If loops have been vectorized for a function, LV will cache the analysis. At the moment it uses MadeCFGChanges as proxy for loop vectorized, which isn't perfect (it could be too aggressive, e.g. because no runtime checks have been added), but should be good enough for now. The extra passes are now managed by a new FunctionPassManager that runs its passes only if ShouldRunExtraVectorPasses has been cached. Without this patch, `-extra-vectorizer-passes` has the following compile-time impact: NewPM-O3: +4.86% NewPM-ReleaseThinLTO: +3.56% NewPM-ReleaseLTO-g: +7.17% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions With this patch, that gets reduced to NewPM-O3: +1.43% NewPM-ReleaseThinLTO: +1.00% NewPM-ReleaseLTO-g: +1.58% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions It is probably still too high to enable by default, but much better. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115052	2021-12-10 11:42:45 +00:00
Arthur Eubanks	5c7e783ebe	[NFC] Clarify comment about LoopDeletionPass in the optimization pipeline Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D115179	2021-12-07 09:58:12 -08:00
Nikita Popov	ae7f468073	[NewPM] Fix MergeFunctions scheduling MergeFunctions (as well as HotColdSplitting an IROutliner) are incorrectly scheduled under the new pass manager. The code makes it look like they run towards the end of the module optimization pipeline (as they should), while in reality the run at the start. This is because the OptimizePM populated around them is only scheduled later. I'm fixing this by moving these three passes until after OptimizePM to avoid splitting the function pass pipeline. It doesn't seem important to me that some of the function passes run after these late module passes. Differential Revision: https://reviews.llvm.org/D115098	2021-12-04 17:30:30 +01:00
Anton Afanasyev	c34d157fc7	[Passes] Move AggressiveInstCombine after InstCombine Swap AIC and IC neighbouring in pipeline. This looks more natural and even almost has no effect for now (three slightly touched tests of test-suite). Also this could be the first step towards merging AIC (or its part) to -O2 pipeline. After several changes in AIC (like D108091, D108201, D107766, D109515, D109236) there've been observed several regressions (like PR52078, PR52253, PR52289) that were fixed in different passes (see D111330, D112721) by extending their functionality, but these regressions were exposed since changed AIC prevents IC from making some of early optimizations. This is common problem and it should be fixed by just moving AIC after IC which looks more logically by itself: make aggressive instruction combining only after failed ordinary one. Fixes PR52289 Reviewed By: spatel, RKSimon Differential Revision: https://reviews.llvm.org/D113179	2021-12-04 14:22:43 +03:00
Nikita Popov	5b94037a30	[PhaseOrdering] Add test for incorrect merge function scheduling Add an -enable-merge-functions option to allow testing of function merging as it will actually happen in the optimization pipeline. Based on that add a test where we currently produce two identical functions without merging them due to incorrect pass scheduling under the new pass manager.	2021-12-04 10:12:04 +01:00
Liqiang Tao	7e8f9d6b38	[llvm][Inline] Add FunctionSimplificationPipeline to module inliner pipeline The FunctionSimplificationPipeline could effectively reduce the size of .text section when module inliner is enabled. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D114704	2021-12-03 12:13:31 +09:00
Florian Hahn	770a50b28c	[AnnotationRemarks] Support generating annotation remarks with -O0. This matches the legacy pass manager behavior. If remarks are not enabled the pass is effectively a no-op.	2021-12-02 15:01:02 +00:00
Arthur Eubanks	e3e25b5112	[NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor In a CGSCC pass manager, we may visit the same function multiple times due to SCC mutations. In the inliner pipeline, this results in running the function simplification pipeline on a function multiple times even if it hasn't been changed since the last function simplification pipeline run. We use a newly introduced analysis to keep track of whether or not a function has changed since the last time the function simplification pipeline has run on it. If we see this analysis available for a function in a CGSCCToFunctionPassAdaptor, we skip running the function passes on the function. The analysis is queried at the end of the function passes so that it's available after the first time the function simplification pipeline runs on a function. This is a per-adaptor option so it doesn't apply to every adaptor. The goal of this is to improve compile times. However, currently we can't turn this on by default at least for the higher optimization levels since the function simplification pipeline is not robust enough to be idempotent in many cases, resulting in performance regressions if we stop running the function simplification pipeline on a function multiple times. We may be able to turn this on for -O1 in the near future, but turning this on for higher optimization levels would require more investment in the function simplification pipeline. Heavily inspired by D98103. Example compile time improvements with flag turned on: https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113947	2021-11-17 09:06:46 -08:00
Arthur Eubanks	19867de9e7	[NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses Previously, any change in any function in an SCC would cause all analyses for all functions in the SCC to be invalidated. With this change, we now manually invalidate analyses for functions we modify, then let the pass manager know that all function analyses should be preserved since we've already handled function analysis invalidation. So far this only touches the inliner, argpromotion, function-attrs, and updateCGAndAnalysisManager(), since they are the most used. This is part of an effort to investigate running the function simplification pipeline less on functions we visit multiple times in the inliner pipeline. However, this causes major memory regressions especially on larger IR. To counteract this, turn on the option to eagerly invalidate function analyses. This invalidates analyses on functions immediately after they're processed in a module or scc to function adaptor for specific parts of the pipeline. Within an SCC, if a pass only modifies one function, other functions in the SCC do not have their analyses invalidated, so in later function passes in the SCC pass manager the analyses may still be cached. It is only after the function passes that the eager invalidation takes effect. For the default pipelines this makes sense because the inliner pipeline runs the function simplification pipeline after all other SCC passes (except CoroSplit which doesn't request any analyses). Overall this has mostly positive effects on compile time and positive effects on memory usage. https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss D113196 shows that we slightly regressed compile times in exchange for some memory improvements when turning on eager invalidation. D100917 shows that we slightly improved compile times in exchange for major memory regressions in some cases when invalidating less in SCC passes. Turning these on at the same time keeps the memory improvements while keeping compile times neutral/slightly positive. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113304	2021-11-15 14:44:53 -08:00
Arthur Eubanks	1d8750c3da	[NFC] Rename GVN -> GVNPass and SROA -> SROAPass To be more consistent with other pass struct names. There are still more passes that don't end with "Pass", but these are the important ones. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D112935	2021-11-09 10:35:58 -08:00
Liqiang Tao	6cad45d5c6	[llvm][Inline] Add a module level inliner Add module level inliner, which is a minimum viable product at this point. Also add some tests for it. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-August/152297.html Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D106448	2021-11-09 11:03:29 +08:00
Arthur Eubanks	7175886a0f	[NewPM] Make eager analysis invalidation per-adaptor Follow-up change to D111575. We don't need eager invalidation on every adaptor. Most notably, adaptors running passes that use very few analyses, or passes that purely invalidate specific analyses. Also allow testing of this via a pipeline string "function<eager-inv>()". The compile time/memory impact of this is very comparable to D111575. https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D113196	2021-11-04 17:16:11 -07:00
Sjoerd Meijer	3fd1902ad8	[FuncSpec] Enable it only with -O3 Function specialisation was running at all optimisation levels (if enabled on the command line, it is not on by default). That was an oversight and not something we want to do. Function specialisation duplicates functions when it triggers, so the backend is processing more functions/instructions resulting in compile-time increases, which seems more appropriate with -O3 and inline with GCC. Please note that since function specialisation is not enabled by default, this didn't require updating any pass manager tests. Differential Revision: https://reviews.llvm.org/D112129	2021-11-04 13:59:00 +00:00
Roman Lebedev	9c2469c1dd	[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/ We manage to deduce that the answer does not require looping, but we do that after the last `LoopDeletion` pass run, so we end up being stuck with a dead loop. Now, as with all things SCEV, this has a very expected ~`+0.12%` compile time performance regression: https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions (for comparison, doing that in function simplification pipeline would have been ~`+0.5` compile time performance regression, D112840) Looking at the transformation stats over vanilla test-suite, i think it's rather expected: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|--------------------------------------------------\|----------:\|----------:\|------:\|-------:\|-------:\| \| scalar-evolution.NumBruteForceTripCountsComputed \| 789 \| 888 \| 99 \| 12.55% \| 12.55% \| \| scalar-evolution.NumTripCountsNotComputed \| 105592 \| 117900 \| 12308 \| 11.66% \| 11.66% \| \| loop-delete.NumBackedgesBroken \| 542 \| 559 \| 17 \| 3.14% \| 3.14% \| \| regalloc.numExtends \| 81 \| 79 \| -2 \| -2.47% \| 2.47% \| \| indvars.NumFoldedUser \| 408 \| 400 \| -8 \| -1.96% \| 1.96% \| \| indvars.NumElimCmp \| 3831 \| 3758 \| -73 \| -1.91% \| 1.91% \| \| scalar-evolution.NumTripCountsComputed \| 299759 \| 304278 \| 4519 \| 1.51% \| 1.51% \| \| loop-delete.NumDeleted \| 8055 \| 8128 \| 73 \| 0.91% \| 0.91% \| \| machine-cse.NumCommutes \| 111 \| 110 \| -1 \| -0.90% \| 0.90% \| \| globaldce.NumFunctions \| 1187 \| 1192 \| 5 \| 0.42% \| 0.42% \| \| codegenprepare.NumSelectsExpanded \| 277 \| 278 \| 1 \| 0.36% \| 0.36% \| \| loop-unroll.NumRuntimeUnrolled \| 13841 \| 13791 \| -50 \| -0.36% \| 0.36% \| \| machinelicm.NumPostRAHoisted \| 1168 \| 1172 \| 4 \| 0.34% \| 0.34% \| \| phi-node-elimination.NumCriticalEdgesSplit \| 83054 \| 82879 \| -175 \| -0.21% \| 0.21% \| \| machine-cse.NumPREs \| 3085 \| 3079 \| -6 \| -0.19% \| 0.19% \| \| branch-folder.NumBranchOpts \| 108122 \| 107942 \| -180 \| -0.17% \| 0.17% \| \| loop-unroll.NumUnrolled \| 40136 \| 40067 \| -69 \| -0.17% \| 0.17% \| \| branch-folder.NumDeadBlocks \| 130818 \| 130607 \| -211 \| -0.16% \| 0.16% \| \| codegenprepare.NumBlocksElim \| 92856 \| 92714 \| -142 \| -0.15% \| 0.15% \| \| instsimplify.NumSimplified \| 103263 \| 103129 \| -134 \| -0.13% \| 0.13% \| \| instcombine.NumConstProp \| 26070 \| 26102 \| 32 \| 0.12% \| 0.12% \| \| instsimplify.NumExpand \| 1716 \| 1718 \| 2 \| 0.12% \| 0.12% \| \| loop-unroll.NumCompletelyUnrolled \| 9236 \| 9225 \| -11 \| -0.12% \| 0.12% \| \| branch-folder.NumHoist \| 2773 \| 2770 \| -3 \| -0.11% \| 0.11% \| \| regalloc.NumReloadsRemoved \| 10822 \| 10834 \| 12 \| 0.11% \| 0.11% \| \| regalloc.NumSnippets \| 11394 \| 11406 \| 12 \| 0.11% \| 0.11% \| \| machine-cse.NumCrossBBCSEs \| 1052 \| 1053 \| 1 \| 0.10% \| 0.10% \| \| machinelicm.NumCSEed \| 99887 \| 99784 \| -103 \| -0.10% \| 0.10% \| \| branch-folder.NumTailMerge \| 72501 \| 72435 \| -66 \| -0.09% \| 0.09% \| \| codegenprepare.NumExtUses \| 22007 \| 21987 \| -20 \| -0.09% \| 0.09% \| \| local.NumRemoved \| 68232 \| 68294 \| 62 \| 0.09% \| 0.09% \| \| loop-vectorize.LoopsAnalyzed \| 75483 \| 75413 \| -70 \| -0.09% \| 0.09% \| ``` Note that i'm only changing current PM, and not touching obsolete PM. This is an alternative to the function simplification pipeline variant of the same change, D112840. It has both less compile time impact (since the additional number of SCEV trip count calculations is way lass less than with the D112840), and it is much more powerful/impactful (almost 2x more loops deleted). I have checked, and doing this after loop rotation is favorable (more loops deleted). Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D112851	2021-11-03 19:24:49 +03:00

1 2

54 Commits