InstCombine can replace memcpy to an alloca with a pointer directly to the
source in certain cases. Unfortunately, it also did so for volatile memcpys.
This patch makes it stop doing that.
This was discovered in D136822.
Differential Revision: https://reviews.llvm.org/D137031
Replace custom code to check if only the first lane is used by generic
helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few
additional cases and was suggested in D133760.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D136368
X * ((1 << Z) + 1) --> (X << Z) + X
https://alive2.llvm.org/ce/z/P-7WK9
It's possible that we could do better with propagating
no-wrap, but this carries over the existing logic and
appears to be correct.
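For illustration, a minimal before/after sketch of the fold in IR (hypothetical function and value names, not taken from the actual tests):
```
define i8 @src(i8 %x, i8 %z) {
  %shl = shl i8 1, %z
  %add1 = add i8 %shl, 1
  %mul = mul i8 %x, %add1
  ret i8 %mul
}

define i8 @tgt(i8 %x, i8 %z) {
  %shl = shl i8 %x, %z
  %res = add i8 %shl, %x
  ret i8 %res
}
```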
The naming differences on the existing folds are a result
of using getName() to set the final value via Builder.
That makes it easier to transfer no-wrap rather than the
gymnastics required from the raw create instruction APIs.
The `FunctionSpecialization` pass needs loop analysis results for its
cost function. For this purpose, it computes the `DominatorTree` and
`LoopInfo` for a function in `getSpecializationBonus`. This function,
however, is called O(number of call sites x number of arguments) times,
but the DominatorTree/LoopInfo can be computed just once.
This patch plugs into the PassManager infrastructure to obtain
LoopInfo for a function and removes the ad-hoc computation from
`getSpecializationBonus`.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136332
[recommitting after recommitting a dependency]
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting`)
only once per argument, i.e. doing N-args checks instead of
N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls
lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
Change-Id: I7d21081a2479cbdb62deac15f903d6da4f3b8529
In InstCombine we treat i8/i16 as desirable, even if they are not legal.
The current logic in shouldChangeType will decide to convert from an
illegal but desirable type (such as an i8) to an illegal and undesirable
type (such as i3). This patch prevents changing the switch conditions to
an irregular type on targets like Arm/AArch64 where i8/i16 are not legal.
This is the same issue as https://reviews.llvm.org/D54115. In the case I
was looking at, it was converting an i32 switch to an i8 switch, which
then became an i3 switch.
Differential Revision: https://reviews.llvm.org/D136763
There's no reason to shrink a constant or simplify
an operand in 2 steps.
This matches what we currently do for 'add' (although that
seems like it should be altered to handle the commutative
case).
LSR may suggest a less profitable transformation for the loop. This
patch adds a check to prevent LSR from generating worse code than what
we already have.
Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution' and defaults to disabled for now.
The next step should be extending a TTI interface to allow target(s)
to enable this enhancement.
A debug log message is added to inform the user when the LSR solution
is skipped.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D126043
When calculating the specialization bonus for a given function argument,
we recursively traverse the chain of (certain) users, accumulating the
instruction costs. Then we exponentially increase the bonus to account
for loop nests. This is problematic for two reasons: (a) the users might
not themselves be inside the loop nest, (b) if they are, we are accounting
for it multiple times. Instead, we should adjust the bonus before
traversing the user chain.
This reduces the instruction count for CTMark (newPM-O3) when Function
Specialization is enabled without actually reducing the amount of
specializations performed (geomean: -0.001% non-LTO, -0.406% LTO).
Differential Revision: https://reviews.llvm.org/D136692
This is copying the code that was added for 'add' with D130075.
(That patch removed a fallthrough in the cases, but we can
probably still share at least some code again as a follow-up
cleanup; I didn't want to risk it here.)
The reasoning is similar to the carry propagation for 'add':
if we don't demand low bits of the subtraction and the
subtrahend (aka RHS or operand 1) is known zero in those low
bits, then there can't be any borrowing required from the
higher bits of operand 0, so the low bits don't matter.
Also, the no-wrap flags can be propagated (and I think that
should be true for add too).
Here's an attempt to prove that in Alive2:
https://alive2.llvm.org/ce/z/xqh7Pa
(can add nsw or nuw to src and tgt, and it should still pass)
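As a hedged illustration of the idea (hypothetical IR, not the exact regression test): only the high bits of the subtraction are demanded and the subtrahend has its low bits known zero, so the low bits of operand 0 (set by the 'or' below) cannot affect the result and can be dropped.
```
define i8 @src(i8 %x, i8 %y) {
  %lhs = or i8 %x, 7        ; low 3 bits of operand 0
  %rhs = and i8 %y, -8      ; subtrahend: low 3 bits known zero
  %sub = sub i8 %lhs, %rhs
  %r = and i8 %sub, -8      ; only the high 5 bits are demanded
  ret i8 %r
}

define i8 @tgt(i8 %x, i8 %y) {
  %rhs = and i8 %y, -8
  %sub = sub i8 %x, %rhs
  %r = and i8 %sub, -8
  ret i8 %r
}
```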
Differential Revision: https://reviews.llvm.org/D136788
[fixed test to work with reverse iteration]
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant`. There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
what makes the specialisation trigger. However, if the set of the
possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
It also changes the criteria for picking up an actual argument to
specialise. The argument must be:
* an LLVM IR constant, or
* have a `constant` lattice value, or
* have a `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
Change-Id: Iea273423176082ec51339aa66a5fe9fea83557ee
The struct OffsetAndSize is a simple tuple of two int64_t. Treating it as a
derived class of std::pair has no special benefit, but it makes the code
verbose since we need get/set functions that avoid using "first" and "second" in
client code. Eliminating the std::pair makes this more readable.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136745
When a profile is stale, profile mismatches can happen and the mismatched samples are discarded, so we'd like to compute mismatch metrics to quantify how stale the profile is, which will suggest that the user refresh the profile if the numbers are high.
Two sets of metrics are introduced here:
- (Num_of_mismatched_funchash / Total_profiled_funchash), (Samples_of_mismatched_func_hash / Samples_of_profiled_function): this leverages the checksums attribute of FunctionSamples, which is a feature of pseudo probes. When the source code CFG changes, the function checksums will be different and the sample loader will later discard the whole function's samples; these metrics show the percentage of samples discarded for this reason.
- (Num_of_mismatched_callsite / Total_profiled_callsite), (Samples_of_mismatched_callsite / Samples_of_profiled_callsite): this shows how many callsite locations mismatch, as callsite location mismatches affect inlining, which is highly correlated with performance. It goes through all the callsite locations in the IR and the profile, uses the call target name to match them, and reports the number of samples in the profile that don't match an IR callsite.
This is implemented in a new class (SampleProfileMatcher) and guarded by a switch ("--report-profile-staleness"); we plan to extend it with a fuzzy profile matching feature in the future.
Reviewed By: hoy, wenlei, davidxl
Differential Revision: https://reviews.llvm.org/D136627
If we run LTO optimization we might end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow-up
will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather than declared, will we emit a custom state machine. SPMD-ization
can happen eagerly though. Tests are adjusted via a weak definition. An LTO
test was added to verify this works as expected.
Differential Revision: https://reviews.llvm.org/D136740
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Then it was reverted again because it broke tests on macOS; they should be
fixed now.
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting`) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
the call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has a purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant`. There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
what makes the specialisation trigger. However, if the set of the
possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
It also changes the criteria for picking up an actual argument to
specialise. The argument must be:
* an LLVM IR constant, or
* have a `constant` lattice value, or
* have a `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with a
different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or to leave a huge remainder.
In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.
In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.
As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.
As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
Differential Revision: https://reviews.llvm.org/D136695
Small functions with size under a given threshold are not
considered for specialisation on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135862
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove DataFlowSanitizerLegacyPass.
Differential Revision: https://reviews.llvm.org/D124594
This reverts commit bd7949bcd8.
Revert this patch since reviewers have different opinions regarding
the approach in post-commit review.
Will open RFC for further discussion.
Differential Revision: https://reviews.llvm.org/D132408
This reverts commit 04877284b4.
Looks like this is still breaking the test
Profile-x86_64 :: instrprof-darwin-dead-strip.c
(see comment on https://reviews.llvm.org/D135340).
This addresses a bug where vector versions of ctlz are creating false positive reports.
Depends on D136369
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D136523
When -asan-max-inline-poisoning-size=0, all shadow memory access should be
outlined (through asan calls). This was not occurring when partial poisoning
was required on the right side of a variable's redzone. This diff contains
the changes necessary to implement and utilize __asan_set_shadow_01() through
__asan_set_shadow_07(). The change is necessary for the full abstraction of
the asan implementation and will enable experimentation with alternate strategies.
Differential Revision: https://reviews.llvm.org/D136197
This is a sibling transform to the fold just above it. That was changed
to allow the corresponding commuted patterns with:
3073074562e1bd759ea58628e6df70
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Improve O(N^2) to O(N) in some cases, and reduce the number of allocations by
reserving memory.
Also, improve the analysis of loads as reduction values to avoid analysing
non-vectorizable cases.
This teaches the SCCP Solver how to constant fold more intrinsics. Constant
folding appears to be just as good as D115737 but much, much lower in code
change impact as suggested by nikic.
The constrained floating-point intrinsics all take at least one metadata
argument and were the motivation for the change.
Differential Revision: https://reviews.llvm.org/D136466
(sign|resign) + (auth|resign) can be folded by omitting the middle
sign+auth component if the key and discriminator match.
Differential Revision: https://reviews.llvm.org/D132383
Without a freeze, this transform can leak poison to the output:
https://alive2.llvm.org/ce/z/GJuF9i
This makes the transform as uniform as possible, and it can help
reduce patterns like issue #58313 (although that particular
example probably still needs another transform).
Differential Revision: https://reviews.llvm.org/D136527
This doesn't touch objc-arc-contract because that's in the codegen pipeline.
However, this does move its corresponding initialize function into initializeCodegen().
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D135041
When the common value is part of either select condition,
this is safe to reduce. Otherwise, it is not poison-safe
(with the select form of the pattern):
https://alive2.llvm.org/ce/z/FxQTzB
This is another patch motivated by issue #58313.
1) Use a static array of pointer to retain the dummy vars.
2) Associate liveness of the array with that of the runtime hook variable
__llvm_profile_runtime.
3) Perform the runtime initialization through the runtime hook variable.
4) Preserve the runtime hook variable using the -u linker flag.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D136192
This is obviously correct for real logic instructions,
and it also works for the poison-safe variants that use
selects:
https://alive2.llvm.org/ce/z/wyHiwX
This is motivated by the lack of 'xor' folding seen in issue #58313.
This more general fold should help reduce some of those patterns,
but I'm not sure if this specific case does anything for that
particular example.
This allows the regular bitwise logic opcodes in addition to the
poison-safe select variants:
https://alive2.llvm.org/ce/z/8xB9gy
Handling commuted variants safely is likely trickier, so that's
left to another patch.
Additional SCEV verification highlighted a case where the cached loop
dispositions were incorrect after simplifying a condition in IndVars
and moving the user in LoopDeletion. Fix it by invalidating the ICmp and
all its users.
Fixes #58515.
This implements IR and bitcode support for the memory attribute,
as specified in https://reviews.llvm.org/D135597.
The new attribute is not used for anything yet (and as such, the
old memory attributes are unaffected).
Differential Revision: https://reviews.llvm.org/D135592
Currently, compiling a program with the `-pg` flag will result in an
undefined symbol error for `.mcount`. This revision fixes the call to
use `__mcount`, which requires a pointer argument to a pointer-sized
object (unique per inserted call) on AIX.
This is only a partial fix. This patch should fix the `-pg` flag's
behaviour on AIX to work with code you are compiling, but it will not
link against standard libraries with `mcount` instrumentation calls. The
next step is to add profiled libraries to the linker search paths in the
Clang driver for the AIX toolchain when linking with `-pg`.
Differential Revision: https://reviews.llvm.org/D135384
Canonicalize GEP of GEP by swapping the GEP with some suffix constant indices to the back (and the GEP with all constant indices to the back of that); this allows more constant-index GEP merging to happen. Exceptions are: if swapping violates use-def relations, or if it anti-optimizes LICM.
For constant-indexed GEP of GEP, if they cannot be merged directly, they will be cast to i8* and merged.
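A rough sketch of the reordering (hypothetical IR, using opaque pointers and no inbounds for simplicity):
```
define ptr @src(ptr %p, i64 %i) {
  %g1 = getelementptr i32, ptr %p, i64 4
  %g2 = getelementptr i32, ptr %g1, i64 %i
  ret ptr %g2
}

; the constant-index GEP is swapped to the back, where it can merge
; with any further constant-index GEPs
define ptr @tgt(ptr %p, i64 %i) {
  %g1 = getelementptr i32, ptr %p, i64 %i
  %g2 = getelementptr i32, ptr %g1, i64 4
  ret ptr %g2
}
```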
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D125845
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.
Suggested by @Ayal in D133760.
Reapplying after the fix for volatile modelling in D135863.
-----
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
Bug introduced in e239198cdb.
The assert() makes the assumption that the resulting shuffle mask
will always select elements from both vectors; this is untrue in the
case of two shuffles being folded if the former shuffle has a mask with
undef elements in it. In such a case folding the shuffles might result
in a mask which only selects from one of the vectors because the other
elements (in the mask) are undef.
Differential Revision: https://reviews.llvm.org/D136256
Per LangRef, volatile operations are allowed to access the location
of their pointer argument, plus inaccessible memory:
> Any volatile operation can have side effects, and any volatile
> operation can read and/or modify state which is not accessible
> via a regular load or store in this module.
> [...]
> The allowed side-effects for volatile accesses are limited. If
> a non-volatile store to a given address would be legal, a volatile
> operation may modify the memory at that address. A volatile
> operation may not modify any other memory accessible by the
> module being compiled. A volatile operation may not call any
> code in the current module.
FunctionAttrs currently does not model this and ends up marking
functions with volatile accesses on arguments as argmemonly,
even though they should be inaccessiblemem_or_argmemonly.
Differential Revision: https://reviews.llvm.org/D135863
This reverts commit 8ef3fd8d59.
I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).
@Ayal suggested a better named helper than using `!getDef()` to check if
a value is invariant across all parts.
The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133666
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variable names FMRB/MRB to ME, to match the
new type name.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
cost model for extractelement instructions.
cpu2017
511.povray_r 0.57
520.omnetpp_r -0.98
521.wrf_r -0.01
525.x264_r 3.59 <+
526.blender_r -0.12
531.deepsjeng_r -0.07
538.imagick_r -1.42
Geometric mean: 0.21
Differential Revision: https://reviews.llvm.org/D115757
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
cost model for extractelement instructions.
cpu2017
511.povray_r 0.57
520.omnetpp_r -0.98
521.wrf_r -0.01
525.x264_r 3.59 <+
526.blender_r -0.12
531.deepsjeng_r -0.07
538.imagick_r -1.42
Geometric mean: 0.21
Differential Revision: https://reviews.llvm.org/D115757
Fixes an SROA crash.
Fallout from opaque pointers since with typed pointers we'd bail out at the bitcast.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136119
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.
Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the start expression S was removed from BlockDispositions in the loop
but not the current values. This was also exposed by the new test case.
Fixes #58439.
When SimplifyLibCalls fails to optimize printf and sprintf it adds
NoUndef/NonNull/Dereferenceable attributes. This patch adds the same attributes
if SimplifyLibCalls optimizes printf/sprintf into the integer-only
iprintf/siprintf.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136140
When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
an exit phi.
Fixes #58340.
If the arithmetic for indices of inbounds GEPs overflows, the result is
poison. This means it is also OK for the coefficients to overflow. GEP
decomposition is limited to cases where the index size is <= 64 bit,
which can be represented by int64_t used for the coefficients in the
constraint system.
This reverts commit 233659c7ae.
I see some sanitizer buildbot failures. Not sure if this change is
causing them, but let's see if a revert returns the bots to green...
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).
By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
Relative to the previous attempt, this is rebased over the
InstSimplify fix in ac74e7a780,
which addresses the miscompile reported in PR58401.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
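A hypothetical sketch of the generalized behaviour (the incoming value from %entry simplifies to %x, which is not a constant; the other incoming value does not simplify, so the 'or' is moved into its predecessor):
```
define i8 @before(i1 %c, i8 %x, i8 %y) {
entry:
  br i1 %c, label %if, label %join
if:
  br label %join
join:
  %phi = phi i8 [ 0, %entry ], [ %y, %if ]
  %r = or i8 %phi, %x
  ret i8 %r
}

define i8 @after(i1 %c, i8 %x, i8 %y) {
entry:
  br i1 %c, label %if, label %join
if:
  %or = or i8 %y, %x
  br label %join
join:
  %phi = phi i8 [ %x, %entry ], [ %or, %if ]
  ret i8 %phi
}
```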
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
LoopFlatten has been in the code base off by default for years, but this
enables it to run by default. Downstream this has been running for
years, so it has been exposed to quite some code. Then around the time
we switched to the NPM, several fixes went in related to updating the
MemorySSA state and we moved it to a loop pass manager, which both
helped prevent rerunning certain analysis passes, and thus helped a
bit with compile-times.
About compile-times, adding a pass isn't free, but this should see only
very minor increases. The pass is relatively simple and there shouldn't
be anything algorithmically expensive because all it does is look at
inner/outer loops and check assumptions on loop increments and
indices. If we see increases, I expect this to mainly come from
invalidation of analysis info, and perhaps subsequent passes to trigger
and do more. Despite its simplicity/restrictions, it triggers in most
code-bases, which makes it worth enabling by default.
Differential Revision: https://reviews.llvm.org/D109958
Another alternative to fix the thread identification problem in
coroutines.
We plan to fix this problem by unifying memory effecting attributes. See
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
But it may be a long-term project. And it is a pity that coroutines
haven't been able to resume on different threads for years. So this is a
temporary fix. It may cause unnecessary performance regressions for
coroutines, but correctness is more important. This fix is planned to be
reverted once we are actually able to unify the memory effect attributes.
Reviewed By: jdoerfert, rjmccall
Differential Revision: https://reviews.llvm.org/D135550
Instead of duplicating the existing decomposition code for GEP indices,
just reuse it by calling the existing decompose function on
the index expression and multiplying the result's coefficients by the scale of
the index.
This both reduces code duplication and generalizes the pattern we can
handle.
If both operands of an `add nsw` are known positive, it can be treated
the same as `add nuw` and added to the unsigned system.
https://alive2.llvm.org/ce/z/6gprff
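A sketch of the kind of fact this enables (hypothetical IR; whether this exact form folds depends on how the pass collects the dominating conditions):
```
define i1 @f(i8 %a, i8 %b) {
entry:
  %a.nonneg = icmp sge i8 %a, 0
  %b.nonneg = icmp sge i8 %b, 0
  %both = and i1 %a.nonneg, %b.nonneg
  br i1 %both, label %then, label %else

then:                       ; %a >= 0 and %b >= 0 here
  %s = add nsw i8 %a, %b    ; cannot wrap unsigned either
  %cmp = icmp uge i8 %s, %a ; provably true in the unsigned system
  ret i1 %cmp

else:
  ret i1 false
}
```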
If the divisor is a power-of-2 or negative-power-of-2 and the dividend
is known to have at least as many trailing zeros as the divisor, the division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)
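For example (a hypothetical sketch; the dividend below has at least 3 trailing zeros while the divisor only needs 2, so the division has no remainder):
```
define i8 @src(i8 %x) {
  %num = shl i8 %x, 3      ; low 3 bits are known zero
  %div = sdiv i8 %num, 4
  ret i8 %div
}

define i8 @tgt(i8 %x) {
  %num = shl i8 %x, 3
  %div = sdiv exact i8 %num, 4
  ret i8 %div
}
```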
This isn't the most direct optimization (we could create ashr in these
examples instead of relying on existing folds for exact divides), but
it's possible that there's a more general constraint than just a pow2
divisor, so this might be extended in the future.
This should solve issue #58348.
Differential Revision: https://reviews.llvm.org/D135970
This should be functionally equivalent - both calls are thin
wrappers around computeKnownBits(). We'll probably want to use
known-bits directly in follow-up patches because that could
determine "exact" for example (see issue #58348).
Support decomposition for `mul/shl nuw` with constant operand for unsigned
queries. Those expressions should not wrap in the unsigned sense and can
be added directly to the unsigned system.
makeLoopInvariant may recursively move its operands to make them
invariant, before moving the passed in instruction. Those recursively
moved instructions are currently missed when invalidating block and loop
dispositions.
To address this, move the invalidation code to Loop::makeLoopInvariant.
Fixes #58314.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135909
`ProvenanceAnalysis::related()` was assuming that the order of parameters for `relatedCheck()` was not affecting
the result but this was not the case when both parameters were `PHINode`s.
Due to this assumption `ProvenanceAnalysis::related()` was ordering the parameters based on pointer value which resulted in
non-deterministic behavior.
To address this, change `relatedPHI()` so that it gives the same result independent of the parameter order.
rdar://100325456
Differential Revision: https://reviews.llvm.org/D135376
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.
For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135977
avoiding an assertion.
A BB with a nonzero count, whose successor blocks all have 0 counts, could
cause an assertion. Don't create any branch weights in this case.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D134203
(A | B) & ~(A & B) --> A ^ B
https://alive2.llvm.org/ce/z/qpFMns
We already have the equivalent fold for real
logic instructions, but this pattern may occur
with selects too.
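A rough sketch of the select (poison-safe) form of the pattern, using the usual logical and/or select encodings (hypothetical value names):
```
define i1 @src(i1 %a, i1 %b) {
  %or  = select i1 %a, i1 true, i1 %b    ; A | B (logical)
  %and = select i1 %a, i1 %b, i1 false   ; A & B (logical)
  %not = xor i1 %and, true
  %r   = select i1 %or, i1 %not, i1 false
  ret i1 %r
}

define i1 @tgt(i1 %a, i1 %b) {
  %r = xor i1 %a, %b
  ret i1 %r
}
```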
This is part of solving issue #58313.
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Move logic to check and replace conditions to a helper function. This
isolates the code, allows using early returns, reduces the
indentation and simplifies eliminateConstraints.
When generating masked gather nodes, the SLP vectorizer accounts for the cost
of the GEPs for loads as part of the scalar-to-vector transformation cost
estimation. But it does not do so for vectorized loads/stores, while it
may remove some of the GEPs completely. Because of this, in
some cases a masked gather operation can appear much more profitable
than regular vectorization (masked-gather cost + vector GEP - scalar
loads + GEPs compared to vectorized loads - scalar loads).
Added analysis of the removed scalar GEPs for vectorized load/store nodes for better cost estimation.
Differential Revision: https://reviews.llvm.org/D135282
This reverts commit b05f5b90a1.
There are thread sanitizer buildbot failures in simple_stack.c.
I think that's because this ended up affecting the handling of
volatile accesses to allocas. Reverting for now.
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
Limit pointer decomposition to pointers with index sizes of at most 64
bits. int64_t is used for coefficients, so as long as the index size <=
64 bits we should be able to represent all pointer offsets.
Pointer decomposition is limited to inbounds GEPs, so if an index
computation would overflow, the result is poison; hence it doesn't matter
that the coefficient overflows.
This allows replacing MulOverflow with regular multiplications.
We have to account for accesses to argument memory via captures.
I don't think there's any way to make this produce incorrect
results right now (because as soon as "other" is set, we lose the
ability to infer argmemonly), but this avoids incorrect results
once we have a more precise representation.
The code for inferring memory attributes on arguments claims that
inalloca/preallocated arguments are always clobbered:
d71ad41080/llvm/lib/Transforms/IPO/FunctionAttrs.cpp (L640-L642)
However, we would still infer memory attributes for the whole
function without taking this into account, so we could still end
up inferring readnone for the function. This adds an argument
clobber if there are any inalloca/preallocated arguments.
Differential Revision: https://reviews.llvm.org/D135783
Add a check (can be disabled via a flag) that the pipeline we generate is actually parsable.
Can be disabled because we don't expect to handle every pass in -print-pipeline-passes.
Fixes #58280.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135703
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/CLKzqT
This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.
This extends 008a89037a and is another transform
derived from issue #58137.
Need to set the insert point for extractelement to point to the first
instruction in the node to avoid a possible crash during the external-uses
combining process. Without it we may end up with an incorrect
transformation.
Differential Revision: https://reviews.llvm.org/D135591
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/E5eaxU
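A sketch of one shape of the fold (hypothetical IR; with nuw on both shifts neither shift drops bits, so the quotient is unchanged; see the Alive2 proof above for the precise constraints):
```
define i8 @src(i8 %x, i8 %y, i8 %z) {
  %xs = shl nuw i8 %x, %z
  %ys = shl nuw i8 %y, %z
  %d = udiv i8 %xs, %ys
  ret i8 %d
}

define i8 @tgt(i8 %x, i8 %y, i8 %z) {
  %d = udiv i8 %x, %y
  ret i8 %d
}
```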
This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
The freeze instruction in some cases makes codegen worse, so we need to be very
careful when emitting it. Instead, improve the analysis in the isUndefVector
function to generate a mask of unused elements and use it in the analysis.
Differential Revision: https://reviews.llvm.org/D135382
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133762
They have been scattered over the code. For better structuring, perform
them in one place. A potential compile-time (CT) drop is possible because we collect exit
blocks twice, but it's a small price to pay for much better code structure.
These are semantically two different stages, but they were entwined in the
old implementation. Now cost computation does not do legality checks,
and they are all done beforehand.
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
((Op1 * X) / Y) / Op1 --> X / Y
https://alive2.llvm.org/ce/z/JYxWjA
InstSimplify handles the more basic mul+div pattern with
shared operand, but we don't seem to have any reassociation
folds to handle cases where the common op is further away.
This is a generalization of 9cff4711ac and another
transform derived from issue #58137.
When Index is variable but still trivially known to be equal, we can use the Value
from before the insertion, possibly eliminating the vector.
Reverts a functional change from:
Author: Philip Reames <listmail@philipreames.com>
Date: Wed Dec 8 12:21:10 2021 -0800
[instcombine] A couple style tweaks to visitExtractElementInst [nfc]
Thanks to Michele Scandale for identifying the bug
Differential Revision: https://reviews.llvm.org/D135625
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask
This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.
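A minimal sketch in IR (hypothetical types and index):
```
define <4 x float> @src(<4 x float> %dest, <4 x float> %vec) {
  %e = extractelement <4 x float> %vec, i32 2
  %neg = fneg float %e
  %r = insertelement <4 x float> %dest, float %neg, i32 2
  ret <4 x float> %r
}

define <4 x float> @tgt(<4 x float> %dest, <4 x float> %vec) {
  %negvec = fneg <4 x float> %vec
  %r = shufflevector <4 x float> %dest, <4 x float> %negvec, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
  ret <4 x float> %r
}
```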
This shows up in the motivating example from issue #58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.
Differential Revision: https://reviews.llvm.org/D135278
This reverts commit 4fbe33593c. It causes linking errors, with details provided internally. (Hopefully the author/reviewers will be able to upstream the internal repro).
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.
Fixes https://github.com/llvm/llvm-project/issues/50537.
Differential Revision: https://reviews.llvm.org/D130466
When determining the initial value of the object, use the constant
folding API to load a given type at a given offset in the global
initializer. This makes it work for cases where the load doesn't
directly correspond to an aggregate member.
Differential Revision: https://reviews.llvm.org/D135435
The current decomposition for GEPs did not correctly handle cases where
GEPs access different source types. Adjust the constraints by including
the indexed type-size as coefficients.
Further generalization to allow GEPs with more than one index is a
needed general follow-up improvement.
If the location ptr to be killed is in no loop and the Function does not
have irreducible loops, then we can regard it as loop invariant.
Differential Revision: https://reviews.llvm.org/D135369
Move common logic shared by callers of getConstraint that use the result
to query the constraint system to a new helper getConstraintForSolving.
This includes common legality checks (i.e. not an equality constraint,
no new variables) and the logic to query the unsigned system if possible
for signed predicates.
See the updated linkonce_resolution_comdat.ll. For a local linkage GV in a
non-prevailing COMDAT, it remains defined while its leader has been made
available_externally. This violates the COMDAT rule that its members must be
retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
The 1st attempt failed to update the test checks as expected.
Original commit message:
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)
https://alive2.llvm.org/ce/z/kB6VF7
It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)
https://alive2.llvm.org/ce/z/kB6VF7
It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
The logic added in 3771310eed was placed sub-optimally. Applying the
transform in ::getConstraint meant that it would also impact conditions
that are added to the system by the signed <-> unsigned transfer logic.
This meant we failed to add some signed facts to the signed system. To
make sure we still add as many useful facts to the signed/unsigned
systems, move the logic to the point where we query the system.
Clear all dispositions if there are any dead blocks (which will get
removed later) and also clear dispositions for removed instructions.
Clearing all dispositions in case there are dead blocks happens first,
which should avoid traversing SCEV use-lists for invalidating
dispositions for individual values.
Fixes #58179.
This patch was added way back in the beginning of the work which became the statepoint infrastructure. The idea was that safepoints could be inserted late in the optimization pipeline. This is true if the only concern is garbage collection, but this approach turned out to be incompatible with the requirement to also support deoptimization at safepoints.
In theory, this pass would still be quite useful for an AOT-compiled language which wants to support garbage collection, but we have no known users, and haven't for over 5 years. Time to remove unused code. If someone wants to use this, restoring it would not be hard. The immediate motivation for removal is that this is one of the last passes remaining which hasn't been ported to the new pass manager, and the (straightforward) work to do so is not justified for unused code.
Differential Revision: https://reviews.llvm.org/D135371
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.
We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134614
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.
It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.
The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.
Differential Revision: https://reviews.llvm.org/D134803
As LoopPredication performs non-equivalent transforms removing some
checks from loops, other passes may not be able to perform transforms
they'd be able to do if the checks were left in loops.
This patch makes LoopPredication insert assumes of the replaced
conditions either after a guard call or in the true block of
widenable condition branch.
Differential Revision: https://reviews.llvm.org/D135354
Relative to the previous attempt, this adjusts simplification to
use the correct context instruction: We need to use the terminator
of the incoming block, not the original instruction.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Added analysis for invariant extractelement instructions and improved
detection of the CSE blocks for generated extractelement instructions.
Differential Revision: https://reviews.llvm.org/D135279
The limitation in LibCallSimplifier::optimizeStringLength to only
optimize when the string is an i8 array was changed already in
commit 50ec0b5dce back in 2017.
We still only simplify when 's' points at an array of 'CharSize', so
the comment is still valid in the sense that we do not support
arbitrary array types.
Differential Revision: https://reviews.llvm.org/D135261
Make sure conditions with constant operands come before conditions
without constant operands. This increases the effectiveness of the
current signed <-> unsigned fact transfer logic.
If a call base use will not capture a pointer we can approximate the
effects. This is important especially for readnone/only uses. Even
may-write uses are not too bad with reachability in place. Capturing
is the problem as we lose track of update sides.
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
This was already handled correctly below, but not checked for the
original store pointer operand. Encountered when converting tests
to opaque pointers, where the intermediate bitcast goes away.
In the case of non-opaque pointers, when combining consecutive loads,
we need to bitcast the pointer source to the combined type size; otherwise
asserts are triggered.
Differential Revision: https://reviews.llvm.org/D135249
The infinite loop seen on buildbots should be fixed by
11897708c0 (assuming there are not
multiple infinite combine loops...)
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Rather than inserting a ptrtoint + inttoptr pair, directly replace
the inttoptr with the new phi node. This ensures that no other
transform can undo it before the pair gets folded away.
This avoids the infinite loop when combined with D134954.
This is NFCI in the sense that it shouldn't make a difference, but
could due to different worklist order.
SimpleLoopUnswitch may remove blocks from loops. Clear block and loop
dispositions in that case, to clean up invalid entries in the cache.
Fixes #58158.
Fixes #58159.
Loop versioning changes the control flow, which may impact SCEVs cached
for other loops in LoopAccessInfoManager. Clear the manager after
making changes.
Fixes #57825.
Depends on D134609.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134611
isOuterMostDepPositive()
The function isOuterMostDepPositive() is checked after negative dependence
vectors are normalized to be non-negative, so there will not be any negative
dependency ('>' as the outermost non-equal sign) after normalization. And
therefore the check in isOuterMostDepPositive() is irrelevant and redundant.
Reviewed By: congzhe
Differential Revision: https://reviews.llvm.org/D132982
In the canonical form of the shuffle, the poison/undef operand is the
second operand; the patch tries to emit the canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting a freeze instruction for shuffles with undef indices if the second shuffle operand is undef, not poison. It is an initial step towards D93818, where undef mask elements are treated as returning a poison value.
Differential Revision: https://reviews.llvm.org/D134377
Reapply with a fix for the case where an operand simplified back
to the original phi: We need to map this case to the new phi node.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
This currently does not make much of a difference (only one test is
affected), but it is helpful e.g. for the out-of-tree CHERI target where
Builder.CreateMemCpy() can add attributes other than parameter alignment.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135075
The helpers in BuildLibCalls normally expect that the Value
arguments already have the correct type (matching the lib call
signature). An exception has been emitFPutC, which cast the Char
argument to 'int' using CreateIntCast. This patch moves the cast to
the caller instead of doing it inside emitFPutC.
I think it makes sense to make the BuildLibCall APIs a bit
more consistent this way, despite the need to handle the int cast
in two different places now.
Differential Revision: https://reviews.llvm.org/D135066
Stop assuming that an 'int' is 32 bits in helpers that emit libcalls
to lib functions that had 'int' in the signature. For most targets
this is NFC. For a target with a 16-bit 'int' type this could help
detect attempts to emit a libcall with an incorrect signature.
Similarly we now derive the type mapping to 'size_t' by asking TLI
about the size of 'size_t'. This should be NFC (at least for in-tree
targets) since getSizeTSize(), in TLI, is deriving the size in the
same way as DataLayout::getIntPtrType().
Differential Revision: https://reviews.llvm.org/D135065
Lots of BuildLibCalls helpers are using Builder::getInt32Ty to get
a type matching an 'int', and DataLayout::getIntPtrType to get a
type matching 'size_t'. The former is not true for all targets, since
an 'int' isn't always 32 bits. And the latter is a bit weird as well,
since the definition of DataLayout::getIntPtrType doesn't clearly map
it to 'size_t'.
This patch is not aiming at solving any such problems. It is merely
highlighting when a libcall is expecting to use 'int' and 'size_t'
by naming the types as IntTy and SizeTTy when preparing the type
signatures for the emitted libcalls.
Differential Revision: https://reviews.llvm.org/D135064
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.
Depends on D134608.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134609
If nonnull is already set, we currently skip setting both nonnull
and dereferenceable. Make these independent, to avoid regressions
when additional nonnull attributes are inferred earlier.
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Recent improvements to the code structure mean we don't need to reset
the condition's predicate in the IR and later restore it. Remove the
restorer logic.
llvm/lib/Transforms/Utils/CodeLayout.cpp uses std::abs() with a double argument,
which is provided by the cmath header, which is not explicitly included into CodeLayout.cpp.
The implicit include in llvm/include/llvm/Support/MathExtras.h was removed in
commit 16544cbe64
Insert an explicit include of cmath into CodeLayout.cpp in order to fix the build on macOS.
Committed on behalf of alsemenov (Aleksei Semenov)
Reviewed By: thieta
Differential Revision: https://reviews.llvm.org/D135072
Added a helper in TargetLibraryInfo to get size of "size_t" in bits,
given a Module reference. The new getSizeTSize helper is using the
same strategy as for example isValidProtoForLibFunc has been using
in the past, assuming that the size can be derived by asking
DataLayout about the size/type of a pointer to int.
FortifiedLibCallSimplifier::optimizeStrpCpyChk was changed to use
the new getSizeTSize helper instead of assuming that sizeof(size_t)
is equal to sizeof(int*) by itself (that is the assumption used in
TargetLibraryInfoImpl::getSizeTSize so the result will be the same).
Having a common helper for this ensures that we use the same strategy
when deriving the size of "size_t" in different parts of the code.
One bonus with this refactoring (basing it on Module instead of just
DataLayout) is that it makes it easier to override this for a specific
target triple, in case the assumption of using getPointerSizeInBits
doesn't hold.
Differential Revision: https://reviews.llvm.org/D110585
This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).
InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.
I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR
Differential Revision: https://reviews.llvm.org/D133667
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
within a loop pipeline.
To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.
Fixes #50940.
Fixes #51669.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134606
Update both memprof and callsite metadata to reflect inlined functions.
For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite's.
For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original).
Depends on D128142.
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D128143
This reverts commit 0d7f3464ce and
commit f9403ca41e. The latter was
"Profile matching and IR annotation for memprof profiles." and was left
from a bad rebase from a commit already pushed upstream.
Update both memprof and callsite metadata to reflect inlined functions.
For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite's.
For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original), via utilities in MemoryProfileInfo.
Depends on D128142.
Differential Revision: https://reviews.llvm.org/D128143
See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*
* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceding patch adding the basic
metadata and verification support.
The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.
The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.
Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
allocation type. This is the same attribute that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].
Depends on D128141 and D128854.
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d
Differential Revision: https://reviews.llvm.org/D128142
I'm not sure how to test this because we seem to constant-fold
all examples already. We changed this code to use the common
isNonNegative() helper, so it should not be necessary to avoid
a constant. This makes the code uniform for all transforms.
Collect more statistics for scalar promotion. In particular,
keep track of how many promotion candidates there were, and
whether it is a load or a load/store promotion.
Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI."
Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds"
Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.
breakLoopBackedge may remove blocks and loops. Also clear block &
loop disposition to avoid the cache containing invalid blocks and loops.
The coverage for the change is provided when using an ASAN build of opt
to run the LoopDeletion unit tests; without the fix, pointers to invalid
objects would be used.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134663
Move LCSSA fixup from ::expandCodeForImpl to ::expand(). This has
the advantage that we directly preserve LCSSA nodes here instead of
relying on doing so in rememberInstruction. It also ensures that we
don't add the non-LCSSA-safe value to InsertedExpressions.
Alternative to D132704.
Fixes #57000.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134739
For a no-op store formed by a LoadI/StoreI pair, an invariant that
should be kept is that the memory state of the related MemoryLoc
before LoadI is the same as before StoreI.
For this example:
```
define void @pr49927(i32* %q, i32* %p) {
%v = load i32, i32* %p, align 4   ; LoadI
store i32 %v, i32* %q, align 4    ; intervening store; the defining access for StoreI, but it writes %v
store i32 %v, i32* %p, align 4    ; StoreI: a no-op, it stores the value loaded from %p back to %p
ret void
}
```
Here the memory definition reaching the store's destination is different
from the definition reaching the load's destination, which makes it look
as if the invariant mentioned above is broken. But the definition of the
store's destination writes the value produced by LoadI, so the invariant
is actually still kept and we can safely ignore it.
Fixes https://github.com/llvm/llvm-project/issues/49271
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D132657
Debugify in OriginalDebugInfo mode (verify-each-debuginfo-preserve), when used
in parallel builds of large projects, can produce an incorrect report. More
precisely, simultaneous writes to the JSON report file could form incorrect JSON
objects describing the found Debug Info bugs.
This patch uses a lock/unlock mechanism to protect the JSON report file and also
makes the script llvm/utils/llvm-original-di-preservation.py resilient to corrupted
lines in the report file, ensuring the HTML report can still be created.
Differential Revision: https://reviews.llvm.org/D115616
The previous version of the patch would incorrectly convert an
existing argmemonly attribute into an inaccessiblemem_or_argmemonly
attribute.
-----
This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.
Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.
Differential Revision: https://reviews.llvm.org/D134527
Factor out the logic to create induction resume values for a specific
induction. This will be used in D92132 to support widened IVs during
epilogue vectorization.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D134211
Simplify the code by using CastInst::CreateBitOrPointerCast directly. By
not going through the builder, the temporary instruction also won't get
registered in InsertedValues & co, which means less work overall and
simplifies the clean-up.
https://reviews.llvm.org/D134254 introduced an issue on the Fuchsia
target, which does not unconditionally emit the runtime hook.
It used containsProfilingIntrinsics(M) after intrinsics are lowered.
So, this patch fixes the issue by capturing the result of that
function invocation before intrinsics are lowered.
Differential Revision: https://reviews.llvm.org/D134841
Interestingly, MathExtras.h doesn't use any <cmath> declarations, so move
the include out of that header and include it where needed.
No functional change intended, but there is no longer a transitive include
from MathExtras.h to cmath.
- Before this patch, loop metadata (if it exists) will override the metadata of each predecessor; if the predecessor block already has loop metadata, the original loop metadata won't be preserved and could cause missed loop transformations (see 'test2' in llvm/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll).
To illustrate how inner-loop metadata might be dropped before this patch:
CFG Before
entry
|
v
---> while.cond -------------> while.end
| |
| v
| while.body
| |
| v
| for.body <---- (md1)
| | |______|
| v
| while.cond.exit (md2)
| |
|_______|
CFG After
entry
|
v
---> while.cond.rewrite -------------> while.end
| |
| v
| while.body
| |
| v
| for.body <---- (md2)
|_______| |______|
Basically, when 'while.cond.exit' is folded into 'while.cond', 'md2' overrides 'md1' and 'md1' is dropped from the CFG.
Differential Revision: https://reviews.llvm.org/D134152
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged into a wider load and the only use of this pattern is with a wider load. In this case, for non-atomic/non-volatile loads, reduce the pattern to a combined load, which improves the cost of inlining, unrolling, vectorization, etc.
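As a rough illustration of the kind of rewrite this enables (types, offsets and the little-endian assumption are hypothetical, not taken from the tests):
```
; before: two adjacent i8 loads merged through zext/shl/or
%p1 = getelementptr i8, ptr %p, i64 1
%l1 = load i8, ptr %p        ; low byte
%l2 = load i8, ptr %p1       ; high byte
%z1 = zext i8 %l1 to i16
%z2 = zext i8 %l2 to i16
%hi = shl i16 %z2, 8
%or = or i16 %hi, %z1

; after (assuming a little-endian target): a single wider load
%val = load i16, ptr %p
```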
Fix the error reported on reverse load merge.
Differential Revision: https://reviews.llvm.org/D127392
We don't combine generic shuffles together in IR, but select
shuffles are a special-case because a select shuffle of a
select shuffle is just another select shuffle; codegen is
expected to efficiently lower those (select shuffles are also
the canonical form of a vector select with constant condition).
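A small sketch of why combining is safe here (lane values chosen purely for illustration):
```
; %s1 selects lanes: [a0, b1, b2, a3]
%s1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; %s2 selects lanes from %s1 and %b: [a0, b1, b2, b3]
%s2 = shufflevector <4 x i32> %s1, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; ...which is itself just a select shuffle of the original operands:
%s  = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
```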
A User like the PHINode may be visited multiple times for the same pointer along
different def-use edges. The uninitialized state of OffsetInfo at the first
visit needs to be distinct from the Unknown value that may be assigned after
processing the PHINode. Without that, a PHINode with all inputs Unknown is never
followed to its uses. This results in incorrect optimization because some
interfering accesses are missed.
Differential Revision: https://reviews.llvm.org/D134704
After deleting a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.
This fixes a verification failure surfaced by D134531.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D134613
Fixes #57572
Generally the LICM pass is responsible for moving code that calculates an
invariant address out of the loop, as it only needs to be calculated once.
But in the rare case where that does not happen, we would fail to vectorize
the loop.
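A minimal sketch of the situation (the IR is illustrative, not taken from the issue):
```
loop:
  %iv   = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  ; invariant address computed inside the loop; LICM would normally move
  ; this out of the loop, but if it does not, the vectorizer used to give up
  %addr = getelementptr inbounds i32, ptr %base, i64 %off
  %gep  = getelementptr inbounds i32, ptr %src, i64 %iv
  %v    = load i32, ptr %gep
  store i32 %v, ptr %addr
  %iv.next = add i64 %iv, 1
  %ec   = icmp eq i64 %iv.next, %n
  br i1 %ec, label %exit, label %loop
```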
Differential Revision: https://reviews.llvm.org/D133687
This is a purely NFC restructuring in advance of a change which actually exposes zero strides. It is mostly motivated by the fact that I find this interface confusing each time I look at it.
Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated.
The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use.
Differential Revision: https://reviews.llvm.org/D134460
Instead of accumulating all extraction costs separately and then adjusting for repeated subvector extractions, this patch collects all the extractions and then converts to calls to getScalarizationOverhead to improve the accuracy of the costs.
I'm not entirely satisfied with the getExtractWithExtendCost handling yet - this still just adds all the getExtractWithExtendCost costs together - it really needs to be replaced with a "getScalarizationOverheadWithExtend", but that will require further refactoring first.
This replaces my initial attempt in D124769.
Differential Revision: https://reviews.llvm.org/D134605
This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.
Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.
Differential Revision: https://reviews.llvm.org/D134527
After unrolling a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.
This fixes a verification failure surfaced by D134531.
I am planning on reviewing/updating the existing uses of
forgetLoopDispositions to check if they should be replaced by
forgetBlockAndLoopDispositions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134612
Second patch in the series to remove the legacy PM and the
associated -enable-new-pm=0 flag. This one targets a pass that
has not been ported to the new PM - PruneEH.
Discussion about this can be found in D44415.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134686
This is a test to verify that we do not crash with the
problem noted in issue #57986. The root problem should
be fixed with a prior change to InstSimplify.
Using these queries with a context instruction and without a cache
seems to be about 2x slower than with it, so this theoretically
improves compile time.
During the structurization process, we may place non-predecessor blocks
between the predecessors of a block in the structurized CFG. Take
the typical while-break case as an example:
```
/---A(v=...)
| / \
^ B C
| \ /|
\---L |
\ /
E (r = phi (v:C)...)
```
After structurization, the CFG would look like:
```
/---A
| |\
| | C
| |/
| F1
^ |\
| | B
| |/
| F2
| |\
| | L
\ |/
\--F3
|
E
```
We can see that block B is placed between the predecessors (C/L) of E.
During phi reconstruction, to achieve the same semantics as before, we
are reconstructing the PHIs as:
F1: v1 = phi (v:C), (undef:A)
F3: r = phi (v1:F2), ...
But this is also saying that `v1` would be live through B, which is not
quite necessary. The idea in the change is to say the incoming value
from B is Undef for the PHI in E. With this change, the reconstructed
PHI would be:
F1: v1 = phi (v:C), (undef:A)
F2: v2 = phi (v1:F1), (undef:B)
F3: r = phi (v2:F2), ...
Reviewed by: sameerds
Differential Revision: https://reviews.llvm.org/D132450
The instruction simplification will try to simplify the affected phis.
In some cases, this might extend the liveness of values. For example:
BB0:
| \
| BB1
| /
BB2:phi (BB0, v), (BB1, undef)
The phi in BB2 will be simplified to v, as v dominates BB2, but this
increases the number of active values in BB1. By setting CanUseUndef
to false, we will not simplify the phi in this way, which helps
register pressure. This is mandatory for the later change to help
reduce VGPR pressure for AMDGPU.
Reviewed by: foad, sameerds
Differential Revision: https://reviews.llvm.org/D132449
This reverts commit 794b7ea960, and
thus restores commit a212d8da94, and
follow on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.
Use a hash function (BLAKE3) instead of hash_combine/hash_code which are
not guaranteed to be stable across executions.
Additionally, it adds a "REQUIRES: x86_64-linux" to the tests that have
raw profile inputs to avoid failures on big endian bots.
Reviewers: snehasish, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128142
The dependent code has been changed quite a lot since 151c144, which
b73d2c8 effectively reverts. Now we run into a case where lowering
no longer expects/supports the pre-151c144 behavior.
Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.
Fixes #57912.
When store vectorization is infeasible, it's helpful to have a debug-log indication of why. A case I've hit a couple of times now is accidentally using -march instead of -mtriple and getting the default TTI results. This causes max-vf to become 1, and thus hits the added logging line.
We allow the target to report different costs depending on properties of the operands; given this, we have to make sure we pass the right set of operands and account for the fact that different scalar instructions can have operands with different properties.
As a motivating example, consider a set of multiplies which each multiply by a constant (but not all the same constant). Most of the constants are powers of two (but not all).
If the target doesn't have support for non-uniform constant immediates, this will likely require constant materialization and a non-uniform multiply. However, depending on the balance of target costs for constant scalar multiplies vs a single vector multiply, this might or might not be a profitable vectorization.
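For instance (constants chosen arbitrarily), a group of scalar multiplies like:
```
%m0 = mul i32 %a0, 4
%m1 = mul i32 %a1, 8
%m2 = mul i32 %a2, 6    ; not a power of two
%m3 = mul i32 %a3, 16
; vectorizes to a single multiply by the non-uniform constant
; <i32 4, i32 8, i32 6, i32 16>, which may need to be materialized
```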
This ends up basically being a rewrite of the existing code. Normally, I'd scope the change more narrowly, but I kept noticing things which seemed highly suspicious, and none of the existing code appears to have any test coverage at all. I think this is a case where simply throwing out the existing code and starting from scratch is reasonable.
This is a follow on to Alexey's D126885, but also handles the arithmetic instruction case since the existing code appears to have the same problem.
Differential Revision: https://reviews.llvm.org/D132566
LoopDeletion may hoist instructions out of a loop using
makeLoopInvariant without invalidating the SCEV for the moved
instruction.
Moving the instruction to a different block may change its
cached block disposition, so invalidate the cached info.
Fixes #57837.
The mul-by-constant cost models handle power-of-2 constants, but not negated power-of-2 constants, despite the backends handling both.
This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.
Fixes #50778
Differential Revision: https://reviews.llvm.org/D111968
MemoryLocation::getOrNone() already has the necessary logic to
handle different instruction types. Use it, rather than repeating
a subset of the logic. This adds support for previously unhandled
instructions like atomicrmw.
After 20d798bd47, SCEV looks through PHIs with a single incoming
value. This means adding a new incoming value may change the SCEV for a
phi. Add missing invalidation when an existing PHI is reused during
LoopVersioning. New incoming values will be added later from the
versioned loop.
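A rough sketch of the scenario (names are illustrative, not from the test case):
```
; Initially the reused phi has a single incoming value, so SCEV can look
; through it and cache the SCEV of %iv.next for it.
%lcssa = phi i64 [ %iv.next, %loop.latch ]
; LoopVersioning later adds a second incoming value from the versioned
; loop, e.g.:
;   %lcssa = phi i64 [ %iv.next, %loop.latch ], [ %iv.next.v, %loop.latch.v ]
; so the previously cached SCEV for %lcssa must be invalidated.
```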
Similar issues have been fixed by also adding missing invalidation.
Fixes #57825.
Note that the test case unfortunately requires running loop-vectorize
followed by loop-load-elimination, which does the actual versioning. I
don't think it is possible to reproduce the failure without that
combination.
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged into a wider load and the only use of this pattern is with a wider load. In this case, for non-atomic/non-volatile loads, reduce the pattern to a combined load, which improves the cost of inlining, unrolling, vectorization, etc.
Differential Revision: https://reviews.llvm.org/D127392
This reverts commit a212d8da94, and follow
on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.
After re-reading the documentation for hash_combine, I don't think this
is the appropriate hash function to use for computing the hash to use as
a stack id in the metadata, since it is not guaranteed to produce stable
values across executions. I have not hit this problem, but plan to
switch to using an MD5 hash. I am hitting an issue with one of the bots
(https://lab.llvm.org/buildbot/#/builders/171/builds/20732)
where the values produced are only the lower 32 bits of the expected
hash values, however, which I assume is related to the implementation of
hash_combine and hash_code.
I believe I fixed all of the other bot failures with the follow on fixes,
which I'll merge into the new version before reapplying.
Profile matching and IR annotation for memprof profiles.
See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*
* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceding patch adding the basic
metadata and verification support.
The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.
The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.
Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
allocation type. This is the same attribute that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].
Depends on D128141 and D128854.
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d
Differential Revision: https://reviews.llvm.org/D128142
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)
This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)
Differential Revision: https://reviews.llvm.org/D133580
For the case where the constant is a power of two rather than zero,
the fold is incorrect, because it fails to check that the bit set
in the LHS matches the bit in the RHS.
Rather than fixing this, remove the power of two handling entirely,
as a different fold will already canonicalize such comparisons to
use a zero constant.
Fixes https://github.com/llvm/llvm-project/issues/57899.
Perform the simplifyWithOpReplaced() fold even for non-bool
selects. This subsumes a number of recently added folds for
zext/sext of the condition.
We still need to manually handle variations with both sext/zext
and not, because simplifyWithOpReplaced() only performs one
level of replacements.
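One example of the kind of non-bool select this can now simplify (illustrative, not taken from the tests):
```
%c = icmp eq i32 %x, 0
%t = sub i32 %y, %x
%r = select i1 %c, i32 %t, i32 %y
; In the true arm %x can be replaced by 0, so %t simplifies to %y and the
; whole select folds to %y.
```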
We can handle vectors inside simplifyWithOpReplaced(), as long as
cross-lane operations are excluded. The equality can hold (or not
hold) for each vector lane independently, so we shouldn't use the
replacement value from other lanes.
I believe the only operations relevant here are shufflevector (where
all previous bugs were seen) and calls (which might use shuffle-like
intrinsics and would require more careful classification).
Differential Revision: https://reviews.llvm.org/D134348
This is a bugfix patch that resolves the following two bugs in loop interchange:
1. PR57148, which is an assertion error due to loss of LCSSA form after interchange,
as exercised by test1() in pr57148.ll.
2. Use before def for the outermost loop induction variables after interchange,
as exercised by test2() in pr57148.ll.
The fix in this patch is that:
1. In cases where the LCSSA form is not maintained after interchange, we update the IR
to the LCSSA form again.
2. We split the phi nodes in the inner loop header into a separate basic block to avoid
the situation where use of the outer indvar appears before its def after interchange.
Previously we already did this for innermost loops; now we do it for non-innermost
loops (e.g., middle loops) as well.
Reviewed By: bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D132055
This patch is to resolve the bug reported and discussed in
https://reviews.llvm.org/D124926#3718761 and https://reviews.llvm.org/D124926#3719876.
The problem is that loop interchange is a loopnest pass under the new pass manager,
but the loop nest may not be constructed correctly by the loop pass manager after
running loop interchange and before running the next pass, which might cause problems
when it continues running the next pass.
The reason that the loop nest is constructed incorrectly is that the outermost
loop might have changed after interchange, and what was the original outermost
loop is not the current outermost loop anymore. Constructing the loop nest based
on the original outermost loop would generate an invalid loop nest.
The fix in this patch is that, in the loop pass manager before running each loopnest
pass, we reconstruct the loop nest based on the current outermost loop, if LPMUpdater
notifies the loop pass manager that the previous loop nest has been invalidated by passes
like loop interchange.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D132199
Instrumentation just ORs the shadow of the inputs.
I assume some result shadow bits could be reset if we went into the specifics of particular checks,
but as-is this is still an improvement over the existing default strict instruction handler, where
every set bit of the input shadow is reported as an error.
Reviewed By: kda
Differential Revision: https://reviews.llvm.org/D134123
`(A * -2**C) + B --> B - (A << C)`
https://alive2.llvm.org/ce/z/A6BWkf
This inverts what Negator was doing before:
D134310 / 0f32a5dea0
Analysis and codegen are generally better without multiply,
so we should favor this form even if we trade add for sub
(because those are generally equivalent cost operations).
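For example (bit width chosen arbitrarily):
```
; before: A * -2**C + B
%m  = mul i32 %a, -8
%r  = add i32 %m, %b
; after: B - (A << C)
%s  = shl i32 %a, 3
%r2 = sub i32 %b, %s
```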
This stops Negator from transforming:
`C1 - shl X, C2 --> mul X, (1<<C2) + C1`
...in the general case. There does not seem to be any analysis
benefit to using mul in IR, and there's definitely downside in
codegen (particularly when the multiply has to be expanded).
If `C1` is 0, then there's a stronger argument that the single
mul is a better canonicalization than negate-of-shl, but we may
want to remove that too.
This was noted as a potential conflict for D133667.
Differential Revision: https://reviews.llvm.org/D134310
Commit de3445e0ef (https://reviews.llvm.org/D132096) made
changes to isVectorPromotionViable basically doing
// Create Vector with size of V, and each element of type Ty
...
uint64_t ElementSize = DL.getTypeStoreSizeInBits(Ty).getFixedSize();
uint64_t VectorSize = DL.getTypeSizeInBits(V).getFixedSize();
...
VectorType *VTy = VectorType::get(Ty, VectorSize / ElementSize, false);
Not quite sure why it uses the TypeStoreSize for the ElementSize,
but the new vector would only match in size with the old vector in
situations when the TypeStoreSize equals the TypeSize for Ty.
Therefore this patch adds a typeSizeEqualsStoreSize check as yet
another condition for allowing the new type as a promotion
candidate.
Without this fix the new @test15 test would fail with an assert
like this:
opt: ../lib/Transforms/Scalar/SROA.cpp:1966:
auto isVectorPromotionViable(llvm::sroa::Partition &,
const llvm::DataLayout &)
::(anonymous class)::operator()(llvm::VectorType *,
llvm::VectorType *) const:
Assertion `DL.getTypeSizeInBits(RHSTy).getFixedSize() ==
DL.getTypeSizeInBits(LHSTy).getFixedSize() &&
"Cannot have vector types of different sizes!"' failed.
...
#8 isVectorPromotionViable(...)::$_10::operator()...
#9 llvm::SROAPass::rewritePartition(...)
#10 llvm::SROAPass::splitAlloca(...)
#11 llvm::SROAPass::runOnAlloca(...)
#12 llvm::SROAPass::runImpl(...)
#13 llvm::SROAPass::run(...)
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D134032
The type information of the store values can diverge when checking for valid
mask store candidates to eliminate via DSE. This patch checks for equivalence
with respect to size and element count.
Reviewed By: fhahn, rui.zhang
Differential Revision: https://reviews.llvm.org/D132700
These patterns were previously only implemented for i1 type but can be extended for any integer type by also handling zext and sext operands.
Differential Revision: https://reviews.llvm.org/D134142
If one of the operands in a matrix multiplication is negated, we can optimise the expression by moving the negation to whichever of the operands or the result has the fewest elements.
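A sketch of the idea for a 2x2 by 2x1 multiply through the matrix intrinsic (the exact intrinsic mangling and the shapes are illustrative assumptions, not taken from the patch):
```
; before: negate the 2x2 operand (4 elements)
%na = fneg <4 x double> %a
%r1 = call <2 x double> @llvm.matrix.multiply.v2f64.v4f64.v2f64(<4 x double> %na, <2 x double> %b, i32 2, i32 2, i32 1)

; after: negate the 2x1 operand (2 elements) instead, since (-A)*B == A*(-B)
%nb = fneg <2 x double> %b
%r2 = call <2 x double> @llvm.matrix.multiply.v2f64.v4f64.v2f64(<4 x double> %a, <2 x double> %nb, i32 2, i32 2, i32 1)
```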
Reviewed By: spatel, fhahn
Differential Revision: https://reviews.llvm.org/D133300
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.
This patch reorders the parameters so that callers do not have to
specify as many default parameters.
Differential Revision: https://reviews.llvm.org/D134125
The bug reported in [0] has been fixed.
The issue was that we had not checked whether the global variables that
represent cttz tables were constant.
A new negative test covering this is added in
negative-lower-table-based-cttz.ll.
[0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b
When the IV is only used by the terminating condition (say IV-A) and the loop
has a predictable back-edge count and we have another IV (say IV-B) that is an
affine add recurrence, we will be able to calculate the terminating value of
IV-B in the loop pre-header. This patch attempts to use IV-B in the new
terminating condition and to remove IV-A. It is safe to do so since IV-A is
only used as the terminating condition.
This transformation is suitable to be appended after LSR as it may optimize the
loop into the situation mentioned above. The transformation can reduce the
number of IVs in the loop by one.
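A hand-wavy sketch of the transformation (the IR is illustrative only):
```
; before: %i (IV-A) is only used by the exit test; %p (IV-B) does the real work
loop:
  %i = phi i64 [ 0, %preheader ], [ %i.next, %loop ]
  %p = phi ptr [ %base, %preheader ], [ %p.next, %loop ]
  store i32 0, ptr %p
  %p.next = getelementptr i8, ptr %p, i64 4
  %i.next = add i64 %i, 1
  %ec = icmp eq i64 %i.next, %n
  br i1 %ec, label %exit, label %loop

; after: the terminating value of %p is computed once in the preheader
; and %i is removed
preheader:
  %bytes = shl i64 %n, 2
  %end = getelementptr i8, ptr %base, i64 %bytes
  br label %loop
loop:
  %p = phi ptr [ %base, %preheader ], [ %p.next, %loop ]
  store i32 0, ptr %p
  %p.next = getelementptr i8, ptr %p, i64 4
  %ec = icmp eq ptr %p.next, %end
  br i1 %ec, label %exit, label %loop
```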
A CLI option `lsr-term-fold` is added and disabled by default.
Reviewed By: mcberg2021, craig.topper
Differential Revision: https://reviews.llvm.org/D132443
The compiler does not reorder the gather nodes with reused scalars, it just
does it for the operands of the user nodes. This currently does not affect
the compiler but breaks the internal logic of the SLP graph. In the future, it
is supposed to actually use all nodes instead of just the list of operands,
and this will affect the vectorization result.
Also, added an early check to avoid complex logic in the cost estimation
analysis, which should improve compile time a bit.
MSan has a default handler for unknown instructions which
previously applied to these as well. However, depending on the
mask, not all pointers or the passthru part will be used. This
allows other passes to insert undef into some arguments.
As a result, the default strict instruction handler can produce false reports.
Reviewed By: kda, kstoimenov
Differential Revision: https://reviews.llvm.org/D133678
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.
At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.
This can lead to widened phis with incorrect start values being created
in the epilogue vector body.
This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization.
Fixes #57712.
My understanding is that NoImplicitFloat, despite its name, is
supposed to disable all vectors, not just float vectors.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134084
This is required because if there is a pure loop-invariant instruction, Loop Rotation
may decide to not clone it and just hoist it instead. If SCEV has previously cached
that it was loop-variant (not being smart enough to prove invariance), we may end
up with inconsistent cache state (which may later trigger false-negative assertion
failures checking that something was invariant).
This is a conservative fix that unconditionally drops the dispositions. We could
drop them only if the hoisting has actually happened, but it would take some time
to understand whether that is safe with all the other things this function does.
Differential Revision: https://reviews.llvm.org/D134167
Reviewed By: fhahn
This bug was found by recent improvement in SCEV verifier. The code in LoopFuse
directly reassigns blocks to be a part of a different loop, which should automatically
invalidate all related cached loop dispositions.
Differential Revision: https://reviews.llvm.org/D134173
Reviewed By: nikic
SimplifyCFG folds
bool foo() {
if (cond1) return false;
if (cond2) return false;
return true;
}
as
bool foo() {
if (cond1 | cond2) return false
return true;
}
'cond2' is called a 'bonus inst' in branch folding because it introduces overhead:
the original CFG could exit early, but the folded CFG always executes
it. SimplifyCFG calculates the cost of the 'bonus insts' of folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.
When SimplifyCFG calculates the cost of 'bonus insts', it only considers 'bonus' insts
in the current BB. This causes issues for unrolled loops
which share destinations, e.g.
bool foo(int *a) {
for (int i = 0; i < 32; i++)
if (a[i] > 0) return false;
return true;
}
After unrolling, it becomes
bool foo(int *a) {
if(a[0]>0) return false
if(a[1]>0) return false;
//...
if(a[31]>0) return false;
return true;
}
SimplifyCFG will merge each BB with its predecessor BB,
ending up with 32 'bonus insts' that are always executed, which
is much slower than the original CFG.
The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.
This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.
Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D132408
This was originally part of D133788. There are no visible
regressions. All of the diffs show a large unsigned constant
becoming a small negative constant. This should be better
for analysis (and slightly less compile-time) and codegen.