llvm-project

Commit Graph

Author	SHA1	Message	Date
Hongtao Yu	9641b9be9d	[Inliner] Preserve !prof metadata when converting call to invoke. When a callee function is inlined via an invoke instruction, every function call inside the callee, if not an invoke, will be converted to an invoke after cloned to the caller body. I found that during the conversion the !prof metadata was dropped. This in turned caused a cloned indirect call not properly promoted in subsequent passes. The particular scenario I was investigating was with AutoFDO and thinLTO. In prelink, no ICP was triggered (neither by the sample loader nor PGO ICP), no indirect call was promoted. This is because 1) the particular indirect call did not have inlined samples; and 2) PGO ICP was intentionally disabled. After inlining, the prof metadata was dropped. Then in postlink, PGO ICP jumped in but didn't do anything. Thus the opportunity was missed. I'm making a simple fix to preserve !prof metadata when converting call to invoke. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D125249	2022-05-09 15:08:09 -07:00
Augie Fackler	1deea714b3	BuildLibCalls: simplify switch statement slightly Per feedback on D123086 after submit. Also added a test for vec_malloc et al attribute inference to show it's doing the right thing. The new tests exposed a defect, corrected by adding vec_free to the list of free functions in MemoryBuiltins.cpp, which had been overlooked all the way back in D94710, over a year ago. Differential Revision: https://reviews.llvm.org/D124859	2022-05-03 13:17:33 -04:00
Jonas Paulsson	304378fd09	Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls." (was `0f8c626`). This reverts commit `14d9390`. The patch previously failed to recognize cases where user had defined a function alias with an identical name as that of the library function. Module::getFunction() would then return nullptr which is what the sanitizer discovered. In this updated version a new function isLibFuncEmittable() has as well been introduced which is now used instead of TLI->has() anytime a library function is to be emitted . It additionally also makes sure there is e.g. no function alias with the same name in the module. Reviewed By: Eli Friedman Differential Revision: https://reviews.llvm.org/D123198	2022-05-02 19:37:00 +02:00
Augie Fackler	c7ae423e39	BuildLibCalls: add alloc-family attribute to many allocator functions Differential Revision: https://reviews.llvm.org/D123086	2022-05-02 11:12:55 -04:00
Augie Fackler	e940456531	BuildLibCalls: infer allocptr attribute for free and realloc() family functions Differential Revision: https://reviews.llvm.org/D123084	2022-05-02 09:43:21 -04:00
Nikita Popov	aae5f8115a	[Local] Consider atomic loads from constant global as dead Per the guidance in https://llvm.org/docs/Atomics.html#atomics-and-ir-optimization, an atomic load from a constant global can be dropped, as there can be no stores to synchronize with. Any write to the constant global would be UB. IPSCCP will already drop such loads, but the main helper in Local doesn't recognize this currently. This is motivated by D118387. Differential Revision: https://reviews.llvm.org/D124241	2022-05-02 10:52:58 +02:00
Florian Hahn	a80081763c	[SimplifyCFG] Avoid shifting by a too large exponent. TI->getBitWidth can be > 64 and in those cases the shift will be UB due to the exponent being too large. To fix this, cap the shift at 63. I think this should work out fine, because TableSize is itself a 64 bit type and the maximum table size must fit in the type. Also, if we would underestimate the size here, at most we get an extra ZExt. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124608	2022-04-29 15:19:06 +01:00
Nikita Popov	884e9a877b	[SimplifyCFG] Replace condition value when threading Replace the condition value with the known constant value on the threaded edge. This happens implicitly with phi threading because we replace with the incoming value, but not for non-phi threading.	2022-04-29 09:50:27 +02:00
Nikita Popov	4e545bdb35	[SimplifyCFG] Thread branches on same condition in more cases (PR54980) SimplifyCFG implements basic jump threading, if a branch is performed on a phi node with constant operands. However, InstCombine canonicalizes such phis to the condition value of a previous branch, if possible. SimplifyCFG does support this as well, but only in the very limited case where the same condition is used in a direct predecessor -- notably, this does not include the common diamond pattern (i.e. two consecutive if/elses on the same condition). This patch extends the code to look back a limited number of blocks to find a branch on the same value, rather than only looking at the direct predecessor. Fixes https://github.com/llvm/llvm-project/issues/54980. Differential Revision: https://reviews.llvm.org/D124159	2022-04-29 09:44:05 +02:00
Arthur Eubanks	4e65291837	[OpaquePtr][GlobalOpt] Don't attempt to evaluate global constructors with arguments Previously all entries in global_ctors had to have the void()* type and we'd skip evaluating bitcasted functions. With opaque pointers we may see the function directly. Fixes #55147. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D124553	2022-04-27 19:00:44 -07:00
Martin Sebor	efa0f12c0b	[InstCombine] Fold strnlen calls in equality to zero. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123818	2022-04-27 12:03:24 -06:00
Alexandros Lamprineas	a910337b5d	[FuncSpec] Conditional jump or move depends on uninitialised value(s). I found this bug when performing a two-stage build of clang with Function Specialization enabled and tuned aggressively. The crash appears only on release builds. Fixes https://github.com/llvm/llvm-project/issues/55000. Before accessing the contents of the ArgInfo iterator inside SCCPInstVisitor::markArgInFuncSpecialization, we should be checking that the iterator is valid. Differential Revision: https://reviews.llvm.org/D124114	2022-04-27 07:28:25 +01:00
Martin Sebor	ffed0cfcdb	[SimplifyLibCalls] avoid slicing 64-bit integers in an ILP32 build (PR #54739 ) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123472	2022-04-26 17:20:56 -06:00
Martin Sebor	449adafabe	[InstCombine] Fold strnlen of constant strings. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123817	2022-04-26 16:15:28 -06:00
Martin Sebor	ce8f42d4af	[InstCombine] Fold memrchr calls with a constant character. Reviewed By: nikic Differential Revision: //reviews.llvm.org/D123629	2022-04-26 14:02:50 -06:00
Martin Sebor	10c99ce67d	[InstCombine] Fold memrchr calls with constant size, bail on excessive. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123626 Differential Revision: https://reviews.llvm.org/D123628	2022-04-26 14:02:50 -06:00
Martin Sebor	25febbd155	[InstCombine] Fold strnlen with a bound of zero and one. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123816	2022-04-26 14:02:50 -06:00
Martin Sebor	2807c420cd	[InstCombine] add a strnlen handler stub. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123815	2022-04-26 14:02:49 -06:00
Augie Fackler	a907d36cfe	Attributes: add a new `allocptr` attribute This continues the push away from hard-coded knowledge about functions towards attributes. We'll use this to annotate free(), realloc() and cousins and obviate the hard-coded list of free functions. Differential Revision: https://reviews.llvm.org/D123083	2022-04-26 13:57:11 -04:00
Igor Kudrin	39ce68886b	[LoopPeel][NFCI] Simplify the code to calculate peel count for PGO This reorganizes the code as a preparation for D123865: * Use more descriptive names for variables * Simplify a condition by use an already calculated value for `MaxPeelCount` * Remove a duplicate log entry * Report basic values for loop costs Differential Revision: https://reviews.llvm.org/D124388	2022-04-26 18:44:24 +04:00
Igor Kudrin	c71890e158	[LoopPeel][NFC] Exit early if there is no room for peeling Differential Revision: https://reviews.llvm.org/D123864	2022-04-26 18:43:56 +04:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Paul Kirth	4683a2effa	[llvm][misexpect] Avoid division by 0 when using sample profiling MisExpect diagnostics should not prevent compilation from succeeding, and the assertion is insufficient to prevent division by zero in release builds. This patch addresses that by replacing the assert with an early return. Additionally, it disables MisExpect diagnostics when using sample profiling, since this is the only known case where this error has manifested. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124302	2022-04-22 22:48:00 +00:00
Nikita Popov	993b166deb	Reapply [SimplifyCFG] Handle branch on same condition in pred more directly Reapplying without changes, after a fix to a dependent patch. ----- Rather than creating a PHI node and then using the PHI threading code, directly handle this case in FoldCondBranchOnValueKnownInPredecessor(). This change is supposed to be NFC-ish, but may cause changes due to different transform order.	2022-04-22 10:27:38 +02:00
Nikita Popov	df18e37541	Reapply [SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension (NFCI) Reapply with SmallMapVector instead of SmallDenseMap, which should address the non-determinism issue. ----- This general threading transform can be performed whenever we know a constant value for the condition in a predecessor, which would currently just be the case of a phi node with constant arguments.	2022-04-22 09:42:11 +02:00
Fangrui Song	35e350d5ba	Revert "[SimplifyCFG] Handle branch on same condition in pred more directly" and "[SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension" This reverts commit `3df86e799e`. This reverts commit `8988254667`. `[SimplifyCFG] Handle branch on same condition in pred more directly` caused non-determinism when compiling opt with a bootstrapped clang. I have to revert the dependent commit as well.	2022-04-21 12:58:58 -07:00
Nikola Tesic	c5600aef88	[Debugify] Limit number of processed functions for original mode Debugify in OriginalDebugInfo mode, does (DebugInfo) collect-before-pass & check-after-pass for each instruction, which is pretty expensive. When used to analyze DebugInfo losses in large projects (like LLVM), this raises the build time unacceptably. This patch introduces a limit for the number of processed functions per compile unit. By default, the limit is set to UINT_MAX (practically unlimited), and by using the introduced option -debugify-func-limit the limit could be set to any positive integer number. Differential revision: https://reviews.llvm.org/D115714	2022-04-21 13:58:17 +02:00
Nikita Popov	3df86e799e	[SimplifyCFG] Handle branch on same condition in pred more directly Rather than creating a PHI node and then using the PHI threading code, directly handle this case in FoldCondBranchOnValueKnownInPredecessor(). This change is supposed to be NFC-ish, but may cause changes due to different transform order.	2022-04-21 11:22:02 +02:00
Nikita Popov	8988254667	[SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension This general threading transform can be performed whenever we know a constant value for the condition in a predecessor, which would currently just be the case of a phi node with constant arguments.	2022-04-21 10:49:49 +02:00
Nikita Popov	d727505e40	[SimplifyCFG] Remove one-use limitation in FoldCondBranchOnPHI() BlockIsSimpleEnoughToThreadThrough() already checks that the phi (and all other instructions) are not used outside the block, so this one-use check is not necessary for legality. I also don't see any reason why it would be necessary for profitability (in fact, those extra uses will be replaced with constants, which should be generally profitable).	2022-04-20 15:56:20 +02:00
Fangrui Song	14d9390721	Revert D123198 "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls." test/Transforms/InstCombine/pr39177.ll failed in a -DLLVM_USE_SANITIZER=Undefined build. ``` lib/Transforms/Utils/BuildLibCalls.cpp:1217:17: runtime error: reference binding to null pointer of type 'llvm::Function' ``` `Function &F = *M->getFunction(Name);` This reverts commit `0f8c626723`.	2022-04-19 22:26:10 -07:00
Paul Kirth	bac6cd5bf8	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-04-19 21:23:48 +00:00
Jonas Paulsson	0f8c626723	[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls. A new set of overloaded functions named getOrInsertLibFunc() are now supposed to be used instead of getOrInsertFunction() when building a libcall from within an LLVM optimizer(). The idea is that this new function also makes sure that any mandatory argument attributes are added to the function prototype (after calling getOrInsertFunction()). inferLibFuncAttributes() is renamed to inferNonMandatoryLibFuncAttrs() as it only adds attributes that are not necessary for correctness but merely helping with later optimizations. Generally, the front end is responsible for building a correct function prototype with the needed argument attributes. If the middle end however is the one creating the call, e.g. when replacing one libcall with another, it then must take this responsibility. This continues the work of properly handling argument extension if required by the target ABI when building a lib call. getOrInsertLibFunc() now does this for all libcalls currently built by any LLVM optimizer. It is expected that when in the future a new optimization builds a new libcall with an integer argument it is to be added to getOrInsertLibFunc() with the proper handling. Note that not all targets have it in their ABI to sign/zero extend integer arguments to the full register width, but this will be done selectively as determined by getExtAttrForI32Param(). Review: Eli Friedman, Nikita Popov, Dávid Bolvanský Differential Revision: https://reviews.llvm.org/D123198	2022-04-19 21:22:07 +02:00
Joseph Huber	984a0dc386	[OpenMP] Use new offloading binary when embedding offloading images The previous patch introduced the offloading binary format so we can store some metada along with the binary image. This patch introduces using this inside the linker wrapper and Clang instead of the previous method that embedded the metadata in the section name. Differential Revision: https://reviews.llvm.org/D122683	2022-04-15 20:35:26 -04:00
chenglin.bi	00871e2f4f	[SimplifyCFG] Try to fold switch with single result value and power-of-2 cases to mask+select When switch with 2^n cases go to one result, check if the 2^n cases can be covered by n bit masks. If yes we can use "and condition, ~mask" to simplify the switch case 0 2 4 6 -> and condition, -7 https://alive2.llvm.org/ce/z/jjH_0N case 0 2 8 10 -> and condition, -11 https://alive2.llvm.org/ce/z/K7E-2V case 2 4 8 12 -> and (sub condition, 2), -11 https://alive2.llvm.org/ce/z/CrxbYg Fix one case of https://github.com/llvm/llvm-project/issues/39957 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D122485	2022-04-15 00:10:00 +08:00
Ruiling Song	1e01f95057	LowerSwitch: Avoid inserting NewDefault block The NewDefault was used to simplify the updating of PHI nodes, but it causes some inefficiency for target that will run structurizer later. For example, for a simple two-case switch, the extra NewDefault is causing unstructured CFG like: O / \ O O / \ / \ C1 ND C2 \ \| / \ \| / D The change is to avoid the ND(NewDefault) block, that is we will get a structured CFG for above example like: O / \ / \ O O / \ / \ C1 \ / C2 \-> D <-/ The IR change introduced by this patch should be trivial to other targets, so I am doing this unconditionally. Fall-through among the cases will also cause unstructured CFG, but it need more work and will be addressed in a separate change. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D123607	2022-04-14 13:30:56 +08:00
Sanjay Patel	0ef46dc0f9	[SimplifyCFG] improve readability in switch-to-select; NFC	2022-04-13 17:14:45 -04:00
serge-sans-paille	262eba01b3	Revert "[ValueTracking] Make getStringLenth aware of strdup" This reverts commit `e810d55809`. The commit was not taken into account the fact that strduped string could be modified. Checking if such modification happens would make the function very costly, without a test case in mind it's not worth the effort.	2022-04-13 19:17:28 +02:00
Nikita Popov	8c74169990	[SimplifyLibCalls] Don't mark memchr() memory as fully dereferenceable C11 specifies memchr() as follows: > The memchr function locates the first occurrence of c (converted > to an unsigned char) in the initial n characters (each interpreted > as unsigned char) of the object pointed to by s. The implementation > shall behave as if it reads the characters sequentially and stops > as soon as a matching character is found. In particular, it is well-defined to specify a memchr size larger than the underlying object, as long as the character is found before the end of the object. Differential Revision: https://reviews.llvm.org/D123665	2022-04-13 16:46:18 +02:00
Sanjay Patel	cd0d0d633b	[SimplifyCFG] make a debug option for case max when converting switch to select This should be "NFC" as written, but it will make D122485 smaller and give us more flexibility to experiment with optimization level vs. compile-time. Differential Revision: https://reviews.llvm.org/D123625	2022-04-13 06:55:13 -04:00
Sanjay Patel	d9211be13d	[SimplifyCFG] cleanup code for converting switch to select (NFC) This renames functions for more general usage (and current capitalization style) before a proposed logic change in D122485. Differential Revision: https://reviews.llvm.org/D123614	2022-04-12 12:17:54 -04:00
serge-sans-paille	e810d55809	[ValueTracking] Make getStringLenth aware of strdup During strlen compile-time evaluation, make it possible to track size of strduped strings. Differential Revision: https://reviews.llvm.org/D123497	2022-04-12 14:47:29 +02:00
Nikita Popov	9af8cc8d17	[SimplifyLibCalls] Remove unnecessary inbounds check Even if the GEP is not inbounds, the GEP will have provenance of the global, and accessing past the extent of the global would be undefined behavior.	2022-04-11 16:51:09 +02:00
Matt Arsenault	9fdd25848a	Transforms: Fix code duplication between LowerAtomic and AtomicExpand	2022-04-08 19:06:36 -04:00
Evgeniy Brevnov	da41214d65	Add support for atomic memory copy lowering Currently, the utility supports lowering of non atomic memory transfer routines only. This patch adds support for atomic version of memcopy. This may be useful for targets not supporting atomic memcopy. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D118443	2022-04-08 10:41:31 +07:00
Augie Fackler	b916414096	BuildLibCalls: also set allocsize() attributes This is part of being able to get rid of two more columns in MemoryBuiltins.cpp's large table. We'll have two more changes before we can finish the job. Differential Revision: https://reviews.llvm.org/D119582	2022-04-07 12:38:44 -04:00
Benjamin Kramer	ff485d727f	Transforms: Remove unused include Utils can't depend on Scalar transforms.	2022-04-07 10:40:28 +02:00
Matt Arsenault	39f1568633	Transforms: Split LowerAtomics into separate Utils and pass This will allow code sharing from AtomicExpandPass. Not entirely sure why these exist as separate passes though.	2022-04-06 20:54:45 -04:00
Nikita Popov	1dc1d5a0d2	[SimplifyLibCalls] Use KnownBits helper APIs (NFC) Use helper APIs for isNonNegative() and getMaxValue() instead of flipping the zero value and having a long comment explaining why that is necessary.	2022-04-06 16:01:24 +02:00
Martin Storsjö	46776f7556	Fix warnings about variables that are set but only used in debug mode Add void casts to mark the variables used, next to the places where they are used in assert or `LLVM_DEBUG()` expressions. Differential Revision: https://reviews.llvm.org/D123117	2022-04-06 10:01:46 +03:00
Evgeniy Brevnov	acfc785c0e	Preserve aliasing info during memory intrinsics lowering By specification, source and destination of llvm.memcpy.* must either be equal or non-overlapping. This semantics is hard or impossible to figure out once lowered. This patch explicitly marks loads from source and stores to destination as not aliasing if source and destination is known to be not equal. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D118441	2022-04-06 11:33:54 +07:00
Jonas Paulsson	dbb6a75fbb	[LibCalls] Respect TLI.getExtAttrForI32Param() in inferLibFuncAttributes(). getExtAttrForI32Param() is the method to be used for determining the type of extension attribute (if any) that is to be added for a signed/unsigned argument. Previously, the SExt attribute was always added to the i32 ldexp* argument as it was expected to be ignored by targets not needing it. This patch now changes this so that it is only added for the targets that need it in the first place. Putchar() argument is now also extended as required by the target (SystemZ in the test), to fix the issue below. Many more libcalls will be handled similarly in a following patch. Fixes https://github.com/llvm/llvm-project/issues/54532. Differential Revision: https://reviews.llvm.org/D123030 Review: Eli Friedman	2022-04-05 10:29:42 +02:00
Martin Sebor	5ccfd5f6d4	[SimplifyLibCalls] Optimize memchr() with known char+str and unknown length If both the character and string are known, but the length potentially isn't, we can optimize the memchr() call to a select of either the known position of the character or null. Split off from https://reviews.llvm.org/D122836.	2022-04-04 11:01:33 +02:00
Martin Sebor	5197d2791f	[SimplifyLibCalls] Move handling of constant char earlier (NFC) Handle the simple constant char case before the bitmask optimization. This will allow extending the code to handle a non-constant size argument in a followup change. Split out from https://reviews.llvm.org/D122836.	2022-04-04 11:01:33 +02:00
Martin Sebor	d18991debf	[SimplifyLibCalls] Fold memchr() with size 1 If the memchr() size is 1, then we can convert the call into a single-byte comparison. This works even if both the string and the character are unknown. Split off from https://reviews.llvm.org/D122836.	2022-04-04 10:41:20 +02:00
Serge Pavlov	c625b6051c	Remove duplicate code from wouldInstructionBeTriviallyDead There is a similar check few lines above in this function.	2022-04-02 16:04:39 +07:00
Jorge Gorbe Moya	fc7573f29c	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `46774df307`.	2022-03-31 14:54:41 -07:00
Paul Kirth	46774df307	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-03-31 17:38:21 +00:00
Serge Pavlov	47b3b76825	Implement inlining of strictfp functions According to the current design, if a floating point operation is represented by a constrained intrinsic somewhere in a function, all floating point operations in the function must be represented by constrained intrinsics. It imposes additional requirements to inlining mechanism. If non-strictfp function is inlined into strictfp function, all ordinary FP operations must be replaced with their constrained counterparts. Inlining strictfp function into non-strictfp is not implemented as it would require replacement of all FP operations in the host function, which now is undesirable due to expected performance loss. Differential Revision: https://reviews.llvm.org/D69798	2022-03-31 19:15:52 +07:00
serge-sans-paille	01be9be2f2	Cleanup includes: final pass Cleanup a few extra files, this closes the work on libLLVM dependencies on my side. Impact on libLLVM preprocessed output: -35876 lines Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122576	2022-03-29 09:00:21 +02:00
Paul Kirth	90cb325abd	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `2add3fbd97`.	2022-03-29 06:20:30 +00:00
Paul Kirth	2add3fbd97	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-03-28 23:30:04 +00:00
Alexandros Lamprineas	8045bf9d0d	[FuncSpec] Support function specialization across multiple arguments. The current implementation of Function Specialization does not allow specializing more than one arguments per function call, which is a limitation I am lifting with this patch. My main challenge was to choose the most suitable ADT for storing the specializations. We need an associative container for binding all the actual arguments of a specialization to the function call. We also need a consistent iteration order across executions. Lastly we want to be able to sort the entries by Gain and reject the least profitable ones. MapVector fits the bill but not quite; erasing elements is expensive and using stable_sort messes up the indices to the underlying vector. I am therefore using the underlying vector directly after calculating the Gain. Differential Revision: https://reviews.llvm.org/D119880	2022-03-28 12:01:53 +01:00
Roman Lebedev	f6b60b3b79	[SimplifyCFG] `FoldBranchToCommonDest()`: allow branch-on-select This whole check is bogus, it's some kind of a profitability check. For now, simply extend it to not only allow branch-on-binary-ops, but also on poison-safe logic ops. Refs. https://github.com/llvm/llvm-project/issues/53861 Refs. https://github.com/llvm/llvm-project/issues/54553	2022-03-25 16:12:17 +03:00
Simon Pilgrim	1a943923b8	[Utils] stripDebugifyMetadata - use cast<> instead of dyn_cast_or_null<> to avoid dereference of nullptr The pointer is dereferenced immediately, so assert the cast is correct instead of returning nullptr	2022-03-25 10:25:04 +00:00
Julian Lettner	64902d335c	Reland "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-23 18:36:55 -07:00
Zequan Wu	581dc3c729	Revert "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" This reverts commit `22570bac69`.	2022-03-23 16:11:54 -07:00
Djordje Todorovic	91ea247039	[Debugify] Use DebugifyLevel in Debugify original mode Before this patch the DebugifyLevel option was used for the synthetic mode, so after this, it will be used in the original mode as well. Differential Revision: https://reviews.llvm.org/D115623	2022-03-22 14:04:56 +01:00
Djordje Todorovic	73777b4c35	[Debugify] Optimize debugify original mode Before we start addressing the issue with having a lot of false positives when using debugify in the original mode, we have made a few patches that should speed up the execution of the testing utility Passes. For example, when testing a large project (let's say LLVM project itself), we can face a lot of potential DI issues. Usually, we use -verify-each-debuginfo-preserve (that is very similar to -debugify-each) -- it collects DI metadata before each Pass, and after the Pass it checks if the Pass preserved the DI metadata. However, we can speed up this process, since we don't need to collect DI metadata before each Pass -- we could use the DI metadata that are collected after the previous Pass from the pipeline as an input for the next Pass. This patch speeds up the utility for ~2x. Differential Revision: https://reviews.llvm.org/D115622	2022-03-22 12:14:00 +01:00
Paul Kirth	964398ccb1	Revert "Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics""" This reverts commit `6cf560d69a`.	2022-03-18 00:21:33 +00:00
Paul Kirth	6cf560d69a	Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics"" I mistakenly reverted my commit, so I'm relanding it. This reverts commit `10866a1df4`.	2022-03-18 00:04:22 +00:00
Paul Kirth	10866a1df4	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `e7749d4713`.	2022-03-17 23:54:26 +00:00
Paul Kirth	e7749d4713	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Differential Revision: https://reviews.llvm.org/D115907	2022-03-17 23:46:23 +00:00
Julian Lettner	22570bac69	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-17 10:47:13 -07:00
Nikita Popov	20531b3a6b	[RelLookupTableConverter] Avoid querying TTI for declarations This code queries TTI on a single function, which is considered to be representative. This is a bit odd, but probably fine in practice. However, I think we should at least avoid querying declarations, which e.g. will generally lack target attributes, and for which we don't seem to ever query TTI in other places.	2022-03-16 10:39:28 +01:00
Simon Pilgrim	7262eacd41	Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" Mane of the build bots are complaining: Unknown command line argument '-lower-global-dtors'	2022-03-15 13:01:35 +00:00
Julian Lettner	9c542a5a4e	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121327	2022-03-14 17:51:18 -07:00
Teresa Johnson	fee0bde4c6	[WPD] Extend checking mode to support fallback to indirect call Extend -wholeprogramdevirt-check to support both the existing trapping mode on an incorrect devirtualization, as well as a new mode to fallback to an indirect call on a mismatch. The new mode is The new mode is useful in cases where we want to enable devirtualization but cannot fully guarantee whole program visibility (e.g in the case where LTO has been disabled for a small set of objects that could potentially override virtual methods without having a symbol reference to anything in the base class including the vtable). Remove !prof and !callees metadata (which are used by indirect call promotion) from both the new direct call and the fallback indirect call (so that we don't perform another round of promotion on the latter). Also remove it from the direct call in the non-fallback cases, which was an oversight, although it didn't seem to cause any issues. Add tests for the metadata removal covering the various cases. Differential Revision: https://reviews.llvm.org/D121419	2022-03-14 10:16:28 -07:00
Nikita Popov	067c035012	[GlobalOpt] Handle undef global_ctors gracefully If there are no ctors, then this can have an arbirary zero-sized value. The current code checks for null, but it could also be undef or poison. Replacing the specific null check with a check for non-ConstantArray.	2022-03-10 16:02:12 +01:00
Benoit Jacob	851332a1f2	Fix linking error, undefined class static constants. Reviewed By: spupyrev Differential Revision: https://reviews.llvm.org/D121293	2022-03-09 10:01:38 -08:00
Vitaly Buka	ce29a0429b	Revert "Attempt to fix linking issue on the bot" The issue was fixed with `48c74bb2e2` This reverts commit `ac423a8c8a`.	2022-03-08 16:16:01 -08:00
Florian Mayer	e86bd32b71	[NFC] [HWASan] [MTE] Use function_ref over template.	2022-03-08 15:49:55 -08:00
Vitaly Buka	ac423a8c8a	Attempt to fix linking issue on the bot	2022-03-08 15:33:10 -08:00
Fangrui Song	48c74bb2e2	[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508 MinBaseDistance may be odr-used by std::max, leading to an undefined symbol linker error: ``` ld.lld: error: undefined symbol: (anonymous namespace)::MinCostMaxFlow::MinBaseDistance >>> referenced by SampleProfileInference.cpp:744 (/home/ray/llvm-project/llvm/lib/Transforms/Utils/SampleProfileInference.cpp:744) >>> lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/SampleProfileInference.cpp.o:((anonymous namespace)::FlowAdjuster::jumpDistance(llvm::FlowJump*) const) ``` Since llvm-project is still using C++ 14, workaround it with a cast.	2022-03-08 14:34:53 -08:00
spupyrev	81aedab7dd	introducing some profi flags Differential Revision: https://reviews.llvm.org/D120508	2022-03-08 12:35:15 -08:00
William S. Moses	87ec6f41bb	[OpenMPIRBuilder] Allocate temporary at the correct block in a nested parallel The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp parallel regions (writing with MLIR for ease) ``` omp.parallel { %a = ... omp.parallel { use(%a) } } ``` As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this allocation to the inner parallel. For example, we would want to get something like the following: ``` omp.parallel { %a = ... %tmp = alloc store %tmp[] = %a kmpc_fork(outlined, %tmp) } ``` However, in practice, this is not what currently occurs in the context of nested parallel regions. Specifically to the OpenMPIRBuilder, the entirety of the function (at the LLVM level) is currently inlined with blocks marking the corresponding start and end of each region. ``` entry: ... parallel1: %a = ... ... parallel2: use(%a) ... endparallel2: ... endparallel1: ... ``` When the allocation is inserted, it presently inserted into the parent of the entire function (e.g. entry) rather than the parent allocation scope to the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1. This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example. This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121061	2022-03-06 18:34:25 -05:00
Augie Fackler	b32735d599	BuildLibCalls: add allocalign attributes for memalign and aligned_alloc This gets us close to being able to remove a column from the table in MemoryBuiltins.cpp. Differential Revision: https://reviews.llvm.org/D117923	2022-03-04 15:57:53 -05:00
Augie Fackler	d664c4b73c	Attributes: add a new allocalign attribute This will let us start moving away from hard-coded attributes in MemoryBuiltins.cpp and put the knowledge about various attribute functions in the compilers that emit those calls where it probably belongs. Differential Revision: https://reviews.llvm.org/D117921	2022-03-04 15:57:53 -05:00
Alexandros Lamprineas	910eb988eb	[FuncSpec][NFC] Refactor internal structures. `ArgInfo` is reduced to only contain a pair of {formal,actual} values. The specialized function `Fn` and the `Partial` flag are redundant in this structure. The `Gain` is moved to a new struct `SpecializationInfo`. The value mappings created by cloneCandidateFunction() are being used by rewriteCallSites() for matching the formal arguments of recursive functions. The list of specializations is passed by reference to calculateGains() instead of being returned by value. The `IsPartial` flag is removed from isArgumentInteresting() and getPossibleConstants() as it's no longer used anywhere in the code. Differential Revision: https://reviews.llvm.org/D120753	2022-03-03 13:08:13 +00:00
spupyrev	f2ade65fb2	[CSSPGO] Even flow distribution Differential Revision: https://reviews.llvm.org/D118640	2022-03-02 13:12:05 -08:00
Stephen Long	2f6c14816a	[LoopPeel] Add EXPENSIVE_CHECKS ifdef guard around domtree verify call The verify call was taking 50% of the compile time in our internal LLVM fork when trying to unroll many loops. Differential Revision: https://reviews.llvm.org/D113028	2022-03-02 09:56:20 -08:00
spupyrev	bcdc047731	speeding up ext-tsp for huge instances Differential Revision: https://reviews.llvm.org/D120780	2022-03-02 07:17:48 -08:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Tong Zhang	17ce89fa80	[SanitizerBounds] Add support for NoSanitizeBounds function Currently adding attribute no_sanitize("bounds") isn't disabling -fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang frontend handles fsanitize=array-bounds which can already be disabled by no_sanitize("bounds"). However, instrumentation added by the BoundsChecking pass in the middle-end cannot be disabled by the attribute. The fix is very similar to D102772 that added the ability to selectively disable sanitizer pass on certain functions. In this patch, if no_sanitize("bounds") is provided, an additional function attribute (NoSanitizeBounds) is attached to IR to let the BoundsChecking pass know we want to disable local-bounds checking. In order to support this feature, the IR is extended (similar to D102772) to make Clang able to preserve the information and let BoundsChecking pass know bounds checking is disabled for certain function. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D119816	2022-03-01 18:47:02 +01:00
Alexandros Lamprineas	b803aee67b	[FuncSpec][NFC] Improve debug messages. Adds diagnostic messages when debugging the pass. Differential Revision: https://reviews.llvm.org/D119875	2022-03-01 11:55:08 +00:00
Alexandros Lamprineas	7b74123a3d	[FuncSpec][NFC] Variable renaming. Just preparing the ground for follow up patches to make the reviews easier. Differential Revision: https://reviews.llvm.org/D119874	2022-03-01 11:38:57 +00:00
Nikita Popov	16a2d5f885	[SCEVExpander] Use early returns in FindValueInExprValueMap() (NFC)	2022-02-25 10:09:16 +01:00
Nikita Popov	2d0fc3e46f	[SCEV] Return ArrayRef from getSCEVValues() (NFC) Return a read-only view on this set. For the one internal use, directly access ExprValueMap.	2022-02-25 09:32:22 +01:00
Nikita Popov	d9715a7266	[SCEV] Don't try to reuse expressions with offset SCEVs ExprValueMap currently tracks not only which IR Values correspond to a given SCEV expression, but additionally stores that it may be expanded in the form X+Offset. In theory, this allows reusing existing IR Values in more cases. In practice, this doesn't seem to be particularly useful (the test changes are rather underwhelming) and adds a good bit of complexity. Per https://github.com/llvm/llvm-project/issues/53905, we have an invalidation issue with these offseted expressions. Differential Revision: https://reviews.llvm.org/D120311	2022-02-25 09:16:48 +01:00
Joseph Huber	7aef8b3754	[OpenMP] Make section variable external to prevent collisions Summary: We use a section to embed offloading code into the host for later linking. This is normally unique to the translation unit as it is thrown away during linking. However, if the user performs a relocatable link the sections will be merged and we won't be able to access the files stored inside. This patch changes the section variables to have external linkage and a name defined by the section name, so if two sections are combined during linking we get an error.	2022-02-24 10:57:09 -05:00
Matthias Braun	6a383369f9	PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong result in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096	2022-02-23 16:27:37 -08:00
Bill Wendling	a5bbc6ef99	[NFC] Remove unnecessary "#include"s from header files	2022-02-23 01:20:48 -08:00
Nikita Popov	f8d7210032	[GlobalStatus] Keep Visited set in isSafeToDestroyConstant() Constants cannot be cyclic, but they can be tree-like. Keep a visited set to ensure we do not degenerate to exponential run-time. This fixes the problem reported in https://reviews.llvm.org/D117223#3335482, though I haven't been able to construct a concise test case for the issue. This requires a combination of dead constants and the kind of constant expression tree that textual IR cannot represent (because the textual representation, unlike the in-memory representation, is also exponential in size).	2022-02-22 10:02:37 +01:00
Arthur Eubanks	053c2a0020	[SimplifyCFG][OpaquePtr] Check store type when merging conditional store	2022-02-20 11:29:54 -08:00
Arthur Eubanks	129af4daa7	[SCEVExpander][OpaquePtr] Check GEP source type when finding identical GEP Fixes an opaque pointers miscompile. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120004	2022-02-17 08:48:11 -08:00
Nikita Popov	36fdfaba19	[RelLookupTableConverter] Ensure that GV, GEP and load types match This code could be generalized to be type-independent, but for now just ensure that the same type constraints are enforced with opaque pointers as with typed pointers.	2022-02-17 12:05:05 +01:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Florian Mayer	c195addb60	[NFC] [MTE] [HWASan] Remove unnecessary member of AllocaInfo Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119981	2022-02-16 15:19:30 -08:00
Nikita Popov	c9032f1a69	[LowerMemIntrinsics] Explicitly use i8 type in memmove lowering By convention, memcpy/memmove intrinsics are always used with i8 pointers (though this is not enforced), so in practice this code was always using an i8 type. Make that explicit. Of course, i8 is not a very profitable choice, and this code could be more performant by picking an appropriate larger type. But that would require additional test coverage and correctness review, and certainly shouldn't be a decision based on the pointer element type.	2022-02-16 16:31:55 +01:00
Max Kazantsev	bfc1217119	[NFC] Introduce option to switch off compatible invokes merge Does not affect default behavior (transform is on).	2022-02-15 21:51:03 +07:00
Florian Mayer	8de457eafc	[HWASAN] use common alignAndPadAlloca Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119614	2022-02-14 15:28:32 -08:00
Florian Mayer	205308de6b	[NFC] [MTE] Move alignAndPadAlloca to MemoryTaggingSupport. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119610	2022-02-14 14:54:04 -08:00
Florian Mayer	6759cdd829	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 16:01:46 -08:00
Florian Mayer	bf2f72fa10	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 16:01:02 -08:00
Florian Mayer	26dbc47468	Revert "[hwasan] keep debug intrinsicts in AllocaInfo." This reverts commit `19fdf85f58`.	2022-02-11 14:41:24 -08:00
Florian Mayer	b1bd64aeee	Revert "[NFC] [MTE] Use helpers for stack tagging." This reverts commit `8f0e5b4e26`.	2022-02-11 14:41:24 -08:00
Florian Mayer	8f0e5b4e26	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 10:59:09 -08:00
Florian Mayer	19fdf85f58	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 10:56:53 -08:00
Florian Mayer	e7356fb3e2	[nfc] [hwasan] factor out logic to collect info about stack this is the first step in unifying some of the logic between hwasan and mte stack tagging. this only moves around code, changes to converge different implementations of the same logic follow later. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118947	2022-02-11 10:54:12 -08:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Philip Reames	5ba115031d	[PSE] Remove assumption that top level predicate is union from public interface [NFC] Note that this doesn't actually cause the top level predicate to become a non-union just yet. The above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.	2022-02-10 16:14:52 -08:00
Philip Reames	d39f4ac494	[SCEV] Unwind SCEVUnionPredicate from getPredicatedBackedgeTakenCount [NFC] For those curious, the whole reason for tracking the predicate set seperately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.	2022-02-09 12:55:40 -08:00
Roman Lebedev	c8ba2b67a0	[SimplifyCFG] 'merge compatible invokes': fully support indirect invokes As long as all the invokes in the set are indirect, we can merge them, but don't merge direct invokes into the set, even though it would be legal to do.	2022-02-08 21:29:38 +03:00
Roman Lebedev	414b47645d	[SimplifyCFG] 'merge compatible invokes': don't create trivial PHI's with all-identical incoming values	2022-02-08 21:29:38 +03:00
Philip Reames	c302f1e677	[SCEV] Generalize SCEVEqualsPredicate to any compare [NFC] PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent a A <pred> B. This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.	2022-02-08 08:18:09 -08:00
Nikita Popov	074561a4a2	[Mem2Reg] Check that load type matches alloca type Alloca promotion can only deal with cases where the load/store types match the alloca type (it explicitly does not support bitcasted load/stores). With opaque pointers this is no longer enforced through the pointer type, so add an explicit check.	2022-02-08 17:16:15 +01:00
Roman Lebedev	42ca7cc889	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ uses If the original invokes had uses, the uses must have been in PHI's, but that immediately results in the incoming values being incompatible. But we'll replace uses of the original invokes with the use of the merged invoke, so as long as the incoming values become compatible after that, we can merge.	2022-02-08 17:49:38 +03:00
Roman Lebedev	9986d60224	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ PHIs but no uses As long as the incoming values for all the invokes in the set are identical, we can merge the invokes.	2022-02-08 17:49:38 +03:00
Roman Lebedev	8411560fd0	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ no uses, no PHI's Even if the invokes have normal destination, iff it's the same block, we can merge them. For now, require that there are no PHI nodes, and the returned values of invokes aren't used.	2022-02-08 17:49:38 +03:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Kazu Hirata	a1a8d10a17	[Transforms] Use default member initialization in LibCallSimplifier (NFC)	2022-02-06 16:36:27 -08:00
Kazu Hirata	3fce5bb7b0	[Transforms] Use default member initialization in LoopVersioning (NFC)	2022-02-06 16:36:25 -08:00
Kazu Hirata	2d650ee03e	[Transforms] Use default member initialization in SCEVFindUnsafe (NFC)	2022-02-05 21:39:27 -08:00
Kazu Hirata	e24384b506	[Transforms] Use default member initialization in SimplifyIndvar (NFC)	2022-02-05 16:29:22 -08:00
Bill Wendling	c6f0940d99	[NFC] Remove unnecessary #includes An attempt to reduce the number of files that are recompiled due to a change. Differential Revision: https://reviews.llvm.org/D119055	2022-02-04 21:22:41 -08:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Roman Lebedev	18ff1ec3c3	Reland [SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. nnotated functions may still raise an exception, i.a., nounwind is not implied. ``` So if we `invoke` a `noreturn` function, and the normal destination of an invoke is not an `unreachable`, point it at the new `unreachable` block. The change/fix from the original commit is that we now actually create the new block, and don't just repurpose the original block, because said normal destination block could have other users. This reverts commit `db1176ce66`, relanding commit `598833c987`.	2022-02-05 02:58:19 +03:00
Roman Lebedev	db1176ce66	Revert "[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable`" The normal destination may have other uses. This reverts commit `598833c987`.	2022-02-05 02:30:20 +03:00
Roman Lebedev	598833c987	[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. nnotated functions may still raise an exception, i.a., nounwind is not implied. ```	2022-02-05 02:15:07 +03:00
Roman Lebedev	55cd727c9a	[SimplifyCFG] 'merge compatible invokes': allow PHI nodes in landing pads ... iff the incoming values for the invokes-to-be-merged are compatible (identical).	2022-02-04 20:26:44 +03:00
Roman Lebedev	0d384e9228	[NFC][SimplifyCFG] Extract `IncomingValuesAreCompatible()` out of `SafeToMergeTerminators()`	2022-02-04 20:26:44 +03:00
Roman Lebedev	36df803dfd	[SimplifyCFG] Merge compatible `invoke`s of a `landingpad` While nowadays SimplifyCFG knows how to hoist code from then-else blocks, sink code from unconditional predecessors, and even promote the latter by tail-merging `ret`/`resume` function terminators, that isn't everything. While i (& others) have been trying to deal with merging/sinking `unreachable`, apparently perhaps the more impactful remaining problem is merging the `throw` calls. If we start at the `landingpad`, all the predecessors are unwind edges of `invoke`s, and in some cases some of the `invoke`s are mergeable. ``` /// This is a weird mix of hoisting and sinking. Visually, it goes from: /// [...] [...] /// \| \| /// [invoke0] [invoke1] /// / \ / \ /// [cont0] [landingpad] [cont1] /// to: /// [...] [...] /// \ / /// [invoke] /// / \ /// [cont] [landingpad] ``` This simplifies the IR/CFG, at the cost of debug info and extra PHI nodes. Note that we don't require for all the `invokes` of the `landingpad` to be mergeable, they can form more than a single set, we gracefully handle that. For now, i completely disallowed normal destination, PHI nodes and indirect invokes but that can be supported. Out of all the CTMark projects, only 7zip is C++, so there isn't much impact: https://llvm-compile-time-tracker.com/compare.php?from=ba8eb31bd9542828f6424e15a3014f80f14522c8&to=722fc871c84f14157d45c2159bc9c8c7e2825785&stat=size-total ... but there it currently causes size-total decrease. Differential Revision: https://reviews.llvm.org/D117805	2022-02-04 17:04:21 +03:00
Simon Pilgrim	6b4ebdd46f	ModuleUtils - VFABI::setVectorVariantNames - use ArrayRef<> instead of const SmallVector to pass argument	2022-02-03 12:11:48 +00:00
Roman Lebedev	ee4ba9f3a1	Revert "[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`." Unfortunately, it seems we really do need to take the long route; start from the "merge" block, find (all the) "dispatch" blocks, and deal with each "dispatch" block separately, instead of simply starting from each "dispatch" block like it would logically make sense, otherwise we run into a number of other missing folds around `switch` formation, missing sinking/hoisting and phase ordering. This reverts commit `85628ce75b`. This reverts commit `c5fff90953`. This reverts commit `34a98e1046`. This reverts commit `1e353f0922`.	2022-02-03 12:32:50 +03:00
Florian Mayer	fa75a62cb5	[NFC] pull retvec logic to MemoryTaggingSupport. we will also need this for aarch64 stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118852	2022-02-02 16:05:52 -08:00
Fangrui Song	85628ce75b	[SimplifyCFG] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2022-02-02 15:11:22 -08:00
Florian Mayer	f7a6c341cb	[mte] support more complicated lifetimes (e.g. for exceptions). Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118848	2022-02-02 14:39:22 -08:00
Florian Mayer	712b31e2d4	[NFC] factor isStandardLifetime out of HWASan this is so we can use it for aarch64 stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118836	2022-02-02 13:23:55 -08:00
Roman Lebedev	c5fff90953	[NFC][SimplifyCFG] Merge `FoldTwoEntryPHINode()` into it's only callee	2022-02-02 17:53:56 +03:00
Roman Lebedev	34a98e1046	[NFC][SimplifyCFG] `FoldTwoEntryPHINode()`: s/BB/MergeBB/	2022-02-02 17:53:56 +03:00
Roman Lebedev	1e353f0922	[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`. The current `FoldTwoEntryPHINode()` is not quite designed correctly. It starts from the merge point, and then tries to detect the 'divergence' point. Because of that, it is limited to the simple two-predecessor case, where the PHI completely goes away. but that is rather pessimistic, and it doesn't make much sense from the costmodel side of things. For example if there is some other unrelated predecessor of the merge point, we could split the merge point so that the then/else blocks first branch to an empty block and then to the merge point, and then we'd be able to speculate the then/else code. But if we'd instead simply start at the divergence point, and look for the merge point, then we'll just natively support this case. There's also the fact that `SpeculativelyExecuteBB()` already does just that, but only if there is a single block to speculate, and with a much more restrictive cost model. But that also means we have code duplication. Now, sadly, while this is as much NFCI as possible, there is just no way to cleanly migrate to the proper implementation. The results are going to be different somewhat because of various phase ordering effects and SimplifyCFG block iteration strategy.	2022-02-02 17:53:56 +03:00
serge-sans-paille	e188aae406	Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden ehader dependencies, something the LLVM codebase doesn't do that well :-/ I've tried to summarize the biggest change below: - llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l before: 6400831 after: 6189948 200k lines less to process is no that bad ;-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D118652	2022-02-02 06:54:20 +01:00
Anna Thomas	bc48a26655	[LoopPeel] Use reference instead of pointer for DT argument Cleanup code in peelLoop API. We already have usage of DT without guarding against a null DT, so this change constant folds the remaining null DT checks. Also make the argument a reference so that it is clear the argument is a nonnull DT. Extracted from D118472.	2022-02-01 17:00:08 -05:00
Nikita Popov	236fbf571d	[GlobalStatus] Skip non-pointer dead constant users Constant expressions with a non-pointer result type used an early exit that bypassed the later dead constant user check, and resulted in different optimization outcomes depending on whether dead users were present or not. This fixes the issue reported in https://reviews.llvm.org/D117223#3287039.	2022-02-01 15:51:32 +01:00
Fangrui Song	85dfe19b36	[ModuleUtils] Move EmbedBufferInModule to LLVMTransformsUtils D116542 adds EmbedBufferInModule which introduces a layer violation (https://llvm.org/docs/CodingStandards.html#library-layering). See `2d5f857a1e` for detail. EmbedBufferInModule does not use BitcodeWriter functionality and should be moved LLVMTransformsUtils. While here, change the function case to the prevailing convention. It seems that EmbedBufferInModule just follows the steps of EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR update operations which ideally should be refactored to another library. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118666	2022-01-31 16:33:57 -08:00
Jay Foad	8faad29634	Revert "[Local] invertCondition: try modifying an existing ICmpInst" This reverts commit `a6b54ddaba`. Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.	2022-01-31 14:55:36 +00:00
Jay Foad	a6b54ddaba	[Local] invertCondition: try modifying an existing ICmpInst This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and it since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern. Differential Revision: https://reviews.llvm.org/D118478	2022-01-31 10:44:17 +00:00
Nikita Popov	4810051a82	[Inline][Cloning] Reliably remove unreachable blocks during cloning (PR53206) The pruning cloner already tries to remove unreachable blocks. The original cloning process will simplify instructions and constant terminators, and only clone blocks that are reachable at that point. However, phi nodes can only be simplified after everything has been cloned. For that reason, additional blocks may become unreachable after phi simplification. The code does try to handle this as well, but only removes blocks that don't have predecessors. It misses unreachable cycles. This can cause issues if SEH exception handling code is part of an unreachable cycle, as the inliner is not prepared to deal with that. This patch instead performs an explicit scan for reachable blocks, and drops everything else. Fixes https://github.com/llvm/llvm-project/issues/53206. Differential Revision: https://reviews.llvm.org/D118449	2022-01-31 09:31:34 +01:00
Nikita Popov	7d176844d0	[CodeExtractor] Fix warning in assert (NFC)	2022-01-28 16:33:34 +01:00
Nikita Popov	cf0357a545	[BasicBlockUtils] Fix typo in API name (NFC) detatch -> detach. As this requires touching all uses, also lower-case it in accordance with the style guide.	2022-01-28 16:32:13 +01:00
Nikita Popov	91e5096d82	[InlineFunction] Use phis() iterator (NFC)	2022-01-28 10:36:28 +01:00
Florian Hahn	bb5c1b0691	[LoopVersioning] Use IRBuilder for OR simplification.	2022-01-27 09:55:51 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Nikita Popov	de8867a0b6	[AMDGPUEmitPrintf] Don't require specific pointer element type Rather than checking for i8, simply add a bitcast to i8, so the appendString() code sees the expected type.	2022-01-26 16:16:32 +01:00
Nikita Popov	903c3d2863	[SCEVExpander] Always use i8 GEP for reused value offset We could keep the non-i8 GEP code for non-opaque pointers, but there's two reasons I'm dropping it: First, this actually appears to be dead code, at least it isn't hit in any of our tests. I expect that this is because we usually expand trip counts, and those are never pointers (anymore). Second, the non-i8 GEP was actually incorrect in multiple ways, because it used SCEV type sizes, which don't match DL type sizes (for pointers) and certainly don't match type alloc sizes (which is what GEPs actually use). As such, I'm simplifying the code to always use the i8 GEP code path if it does get hit.	2022-01-26 15:38:58 +01:00
Nikita Popov	bec4e865de	[SCEVExpander] Remove pointer element type access in assertion Assert directly on i8 rather than the element type of i8*.	2022-01-26 10:35:57 +01:00
Giorgis Georgakoudis	95b981ca2a	[CodeExtractor] Enable partial aggregate arguments Summary: Enable CodeExtractor to construct output functions that partially aggregate inputs/outputs in their argument list. A use case is the OMPIRBuilder to create outlined functions for parallel regions that aggregate in a struct the payload variables for the region while passing as scalars thread and bound identifiers. Differential Revision: https://reviews.llvm.org/D96854	2022-01-25 20:50:34 -05:00
Nikita Popov	98db33349b	[SLC] Fix pointer diff type in sprintf() optimization We should always be calculating a byte-wise difference here. Previously this calculated the pointer difference while taking the pointer element type into account, which is incorrect.	2022-01-25 15:22:56 +01:00
Nikita Popov	6a008de82a	[Evaluator] Simplify handling of bitcasted calls When fetching the function, strip casts. When casting the result, use the call result type. Don't actually inspect the bitcast.	2022-01-25 14:19:04 +01:00
Nikita Popov	30d4a7e295	[IRBuilder] Require explicit element type in CreatePtrDiff() For opaque pointer compatibility, we cannot derive the element type from the pointer type.	2022-01-25 12:43:57 +01:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Heejin Ahn	eb675e972d	[WebAssembly] Support Wasm EH + Wasm SjLj D108960 added support for SjLj using Wasm EH instructions, which we call Wasm SjLj going forward. (We call the old SjLj Emscripten SjLj) But it did not support using Wasm EH and Wasm SjLj together. So far users of Wasm EH had to use Wasm EH with Emscripten SjLj, which had a certain limitation and it suffered from bigger code size increases as well. This enables using Wasm EH and Wasm SjLj together. 1. This redirects `catchswitch` and `cleanupret` that unwind to caller to `catch.dispatch.longjmp` BB, which is a `catchswitch` BB that handles longjmps. 2. D108960 converted all longjmpable `call`s to `invokes` that unwind to `catch.dispatch.longjmp`. This CL checks if the `call` is embedded within another `catchpad`, and if so, makes it unwind to its nearest parent's unwind destination, rather than `catch.dispatch.longjmp`. This is necessary to preserve the scoping structure. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D117610	2022-01-19 20:13:54 -08:00
Craig Topper	02d9a4d56d	[LoopPeel] Pass TripCount to computePeelCount by value instead of by reference. NFC The TripCount is not modified by the function so it doesn't need to be passed by reference. Verified by passing it as const reference before changing to value. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D117735	2022-01-19 17:54:45 -08:00
Craig Topper	1507786c22	[LoopPeeling] Fix stale comments. NFC These comments were not updated when PeelingPreferences split from UnrollingPreferences.	2022-01-19 17:00:12 -08:00
Nikita Popov	5ba73c924d	[BuildLibCalls] Mark calloc as inaccessiblememonly Now that DSE handles inaccessiblememonly calloc, mark it as such, as we do with other memory allocation functions.	2022-01-19 12:55:09 +01:00
spupyrev	13d1364a34	A better profi rebalancer This is an extension of profi post-processing step that rebalances counts in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically, the new version finds many more "unknown" subgraphs and marks more "unknown" basic blocks as hot (which prevents unwanted optimization passes). I see up to 0.5% perf on some (large) binaries, e.g., clang-10 and gcc-8. The algorithm is still linear and yields no build time overhead.	2022-01-18 12:14:24 -08:00
Jan Svoboda	5f4ae56457	[llvm] Remove uses of `std::vector<bool>` LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement. This patch does just that for llvm. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D117121	2022-01-18 18:20:45 +01:00
pvellien	4e1c207726	[SimplifyCFG] Fix assertion failure when reusing table switch comparison After D116332, some icmps no longer fold with the target-independent constant folder. The SimplifyCFG code assumed that the comparison would always fold, which is not guaranteed. Explicitly check that the result is either true or false. Differential Revision: https://reviews.llvm.org/D117184	2022-01-18 09:30:54 +01:00
Stephen Tozer	32417b3203	[DebugInfo] ValueMapper impl for DIArgList respects IgnoreMissingLocals This patch fixes an issue in which SSA value reference within a DIArgList would be unnecessarily dropped by llvm-link, even when invoking on a single file (which should be a no-op). The reason for the difference is that the ValueMapper does not refer to the RF_IgnoreMissingLocals flag for LocalAsMetadata contained within a DIArgList; this flag is used for direct LocalAsMetadata uses to preserve SSA references even when the ValueMapper does not have an explicit mapping for the referenced SSA value, which appears to always be the case when using llvm-link in this manner. Differential Revision: https://reviews.llvm.org/D114355	2022-01-17 17:17:32 +00:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should exact the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Nikita Popov	d1675e4944	[AttrBuilder] Remove empty() / td_empty() methods The empty() method is a footgun: It only checks whether there are non-string attributes, which is not at all obvious from its name, and of dubious usefulness. td_empty() is entirely unused. Drop these methods in favor of hasAttributes(), which checks whether there are any attributes, regardless of whether these are string or enum attributes.	2022-01-15 17:57:18 +01:00
Florian Hahn	e00158ed5c	[LoopUtils] Use InstSimplifyFolder in addRuntimeChecks. Use the InstSimplifyFolder introduced earlier to perform initial simplification during runtime check construction.	2022-01-15 15:21:16 +00:00
Roman Lebedev	82c8aca934	[SimplifyCFG] Be more aggressive when sinking into block followed by unreachable I strongly believe we need some variant of this. The main problem is e.g. that the glibc's assert has 4 parameters, but the profitability check is only okay with one extra phi node, so D116692 doesn't even trigger on most of the expected cases. While that restriction probably makes sense in normal code, if we are about to run off of a cliff (into an `unreachable`), this successor block is unlikely so the cost to setup these PHI nodes should not be on the hotpath, and shouldn't matter performance-wise. Likewise, we don't sink if there are unconditional predecessors UNLESS we'd sink at least one non-speculatable instruction, which is a performance workaround, but if we are about to run into `unreachable`, it shouldn't matter. Note that we only allow the case where there are at most unconditiona branches on the way to the unreachable block. Differential Revision: https://reviews.llvm.org/D117045	2022-01-13 23:30:31 +03:00
Florian Hahn	e3275cfa94	[BuildLibCalls] Add nounwind,willreturn to memset_pattern{4,8,16}. Similar to memset, memset_pattern{4,8,16} all will return and do not unwind. Use fallthrough to include all attributes also set for memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114904	2022-01-12 10:32:53 +00:00
Nikita Popov	47a47733f0	[GlobalStatus] Remove unused HasNonInstructionUser member (NFC) This hasn't been used in a long time.	2022-01-12 09:40:54 +01:00
Nikita Popov	94d6263391	[GlobalStatus] Look through non-constexpr casts analyzeGlobal() looks through non-constexpr cast instructions when looking for users. However, this particular place only strips the casts again if they are constexprs. We should be looking through all casts here.	2022-01-11 16:02:35 +01:00
Florian Hahn	2d67a86b7c	[SCEVExpander] Use IntToPtr for temporary instruction. Use PtrToInt instead Add when creating temporary instructions. The add might get folded away with more sophisticated folding.	2022-01-11 09:40:21 +00:00
Philip Reames	5265ac72c6	[MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC] Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.	2022-01-10 15:43:39 -08:00
Roman Lebedev	82fb4f4b22	[SCEV] Sequential/in-order `UMin` expression As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692, SCEV is forbidden from reasoning about 'backedge taken count' if the branch condition is a poison-safe logical operation, which is conservatively correct, but is severely limiting. Instead, we should have a way to express those poison blocking properties in SCEV expressions. The proposed semantics is: ``` Sequential/in-order min/max SCEV expressions are non-commutative variants of commutative min/max SCEV expressions. If none of their operands are poison, then they are functionally equivalent, otherwise, if the operand that represents the saturation point* of given expression, comes before the first poison operand, then the whole expression is not poison, but is said saturation point. ``` * saturation point - the maximal/minimal possible integer value for the given type The lowering is straight-forward: ``` compare each operand to the saturation point, perform sequential in-order logical-or (poison-safe!) ordered reduction over those checks, and if reduction returned true then return saturation point else return the naive min/max reduction over the operands ``` https://alive2.llvm.org/ce/z/Q7jxvH (2 ops) https://alive2.llvm.org/ce/z/QCRrhk (3 ops) Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97 That allows us to handle the patterns in question. Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D116766	2022-01-10 20:51:26 +03:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using and std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Florian Hahn	aecad5828e	[SCEVExpander] Only create trunc when needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to create TruncTripCount if it is actually used. Sink the TruncTripCount creating into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 11:31:27 +00:00
Florian Hahn	ad1b8772cf	[SCEVExpander] Only create multiplication if needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to compute \|Step\| * Trip count if the result of the multiplication is actually used. Sink the multiplication into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 08:49:25 +00:00
Florian Hahn	1ce01b7dfe	[SCEVExpander] Simplify cleanup, skip sorting by dominance. There is no need to sort inserted instructions by dominance, as the deletion loop still requires RAUW with undef before deleting. Removing instructions in reverse insertion order should still insure that the number of uselist updates is kept to a minimum.	2022-01-09 18:38:41 +00:00
Florian Hahn	7f1bf68d7d	[SCEVExpander] Only check overflow if it is needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to check for overflows if the result of the multiplication is actually used. Sink the Or for the overflow check into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-09 12:55:41 +00:00
Florian Hahn	9345ab3a45	[SCEVExpander] Skip creating <u 0 check, which is always false. Unsigned compares of the form <u 0 are always false. Do not create such a redundant check in generateOverflowCheck. The patch introduces a new lambda to create the check, so we can exit early conveniently and skip creating some instructions feeding the check. I am planning to sink a few additional instructions as follow-ups, but I would prefer to do this separately, to keep the changes and diff smaller. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116811	2022-01-08 10:31:04 +00:00
Florian Hahn	f395a4f8d5	[SCEVExpand] Only create required predicate checks. Currently generateOverflowCheck always creates code for Step being negative and positive, followed by a select at the end depending on Step's sign. This patch updates the code to only create either the checks for step being positive or negative, if the sign is known. Follow-up to D116696. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116747	2022-01-07 14:49:02 +00:00
Nikita Popov	348bc76e35	[LibCalls] Infer same attrs for reallocf() as realloc() reallocf() is the same as realloc() but frees the input pointer on failure as well. We can infer the same attributes. Also combine some cases that infer the same attributes and are logically related.	2022-01-07 09:51:15 +01:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Nikita Popov	c8189da201	[ModuleUtils] Remove dead arg from filterDeadComdatFunctions() (NFC) The module argument is no longer used.	2022-01-07 09:12:16 +01:00
Philip Reames	916b35e783	[unroll] Strengthen verification of analysis updates under expensive asserts I am suspecting a bug around updates of loop info for unreachable exits, but don't have a test case. Running this locally on make check didn't reveal anything, we'll see if the expensive checks bots find it.	2022-01-06 08:51:50 -08:00
Florian Hahn	86d113a8b8	[SCEVExpand] Do not create redundant 'or false' for pred expansion. This patch updates SCEVExpander::expandUnionPredicate to not create redundant 'or false, x' instructions. While those are trivially foldable, they can be easily avoided and hinder code that checks the size/cost of the generated checks before further folds. I am planning on look into a few other similar improvements to code generated by SCEVExpander. I remember a while ago @lebedev.ri working on doing some trivial folds like that in IRBuilder itself, but there where concerns that such changes may subtly break existing code. Reviewed By: reames, lebedev.ri Differential Revision: https://reviews.llvm.org/D116696	2022-01-06 11:52:19 +00:00
Nikita Popov	32808cfb24	[IR] Track users of comdats Track all GlobalObjects that reference a given comdat, which allows determining whether a function in a comdat is dead without scanning the whole module. In particular, this makes filterDeadComdatFunctions() have complexity O(#DeadFunctions) rather than O(#SymbolsInModule), which addresses half of the compile-time issue exposed by D115545. Differential Revision: https://reviews.llvm.org/D115864	2022-01-06 09:13:58 +01:00
Sanjay Patel	e2165e0968	[InstCombine] remove trunc user restriction for match of bswap This does not appear to cause any problems, and it fixes #50910 Extra tests with a trunc user were added with: `3a239379` ...but they don't match either way, so there's an opportunity to improve the matching further.	2022-01-05 13:04:11 -05:00
Philip Reames	c16fd6a376	Rename doesNotReadMemory to onlyWritesMemory globally [NFC] The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consist with onlyReadsMemory which we use for the corresponding readonly case as well.	2022-01-05 08:52:55 -08:00
Nikita Popov	6e474d3308	[GlobalOpt][Evaluator] Fix off by one error in bounds check (PR53002) We should bail out if the index is >= the size, not > the size. Fixes https://github.com/llvm/llvm-project/issues/53002.	2022-01-05 14:06:02 +01:00
Benjamin Kramer	5f0a349738	Revert "Revert "[InferAttrs] Add writeonly to all the math functions"" This reverts commit `29b6e967f3`. The bug it found in PartiallyInlineLibCalls was fixed in `c8ffc73350`.	2022-01-05 12:16:35 +01:00
Sjoerd Meijer	e550dfa4a6	Silence a few unused variable warnings. NFC.	2022-01-05 09:15:07 +00:00
Martin Storsjö	29b6e967f3	Revert "[InferAttrs] Add writeonly to all the math functions" This reverts commit `ea75be3d9d` and `1eb5b6e850`. That commit caused crashes with compilation e.g. like this (not fixed by the follow-up commit): $ cat sqrt.c float a; b() { sqrt(a); } $ clang -target x86_64-linux-gnu -c -O2 sqrt.c Attributes 'readnone and writeonly' are incompatible! %sqrtf = tail call float @sqrtf(float %0) #1 in function b fatal error: error in backend: Broken function found, compilation aborted!	2022-01-05 11:12:19 +02:00
Nikita Popov	787f86e68c	[GlobalOpt][Evaluator] Don't create bitcast for same type (PR52994) isBitOrNoopPointerCastable() returns true if the types are the same, but it's not actually possible to create a bitcast for all such types. The assumption seems to be that the user will omit creating the cast in that case, as it is unnecessary. Fixes https://github.com/llvm/llvm-project/issues/52994.	2022-01-05 09:17:07 +01:00
Fangrui Song	1eb5b6e850	[InferAttrs] If readonly is already set, set readnone instead of writeonly D116426 may lead to an assertion failure `Attributes 'readonly and writeonly' are incompatible!` if the builtin function already has `readonly`.	2022-01-04 18:59:35 -08:00
Benjamin Kramer	ea75be3d9d	[InferAttrs] Add writeonly to all the math functions All of these functions would be `readnone`, but can't be on platforms where they can set `errno`. A `writeonly` function with no pointer arguments can only write (but never read) global state. Writeonly theoretically allows these calls to be CSE'd (a writeonly call with the same arguments will always result in the same global stores) or hoisted out of loops, but that's not implemented currently. There are a few functions in this list that could be `readnone` instead of `writeonly`, if someone is interested. Differential Revision: https://reviews.llvm.org/D116426	2022-01-04 16:58:05 +01:00
Nikita Popov	bbeaf2aac6	[GlobalOpt][Evaluator] Rewrite global ctor evaluation (fixes PR51879) Global ctor evaluation currently models memory as a map from Constant* to Constant. For this to be correct, it is required that there is only a single Constant referencing a given memory location. The Evaluator tries to ensure this by imposing certain limitations that could result in ambiguities (by limiting types, casts and GEP formats), but ultimately still fails, as can be seen in PR51879. The approach is fundamentally fragile and will get more so with opaque pointers. My original thought was to instead store memory for each global as an offset => value representation. However, we also need to make sure that we can actually rematerialize the modified global initializer into a Constant in the end, which may not be possible if we allow arbitrary writes. What this patch does instead is to represent globals as a MutableValue, which is either a Constant* or a MutableAggregate. The mutable aggregate exists to allow efficient mutation of individual aggregate elements, as mutating an element on a Constant would require interning a new constant. When a write to the Constant is made, it is converted into a MutableAggregate* as needed. I believe this should make the evaluator more robust, compatible with opaque pointers, and a bit simpler as well. Fixes https://github.com/llvm/llvm-project/issues/51221. Differential Revision: https://reviews.llvm.org/D115530	2022-01-04 09:30:54 +01:00
Philip Reames	7203140748	Revert "[unroll] Prune all but first copy of invariant exit" This reverts commit `9bd22595ba`. Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stablize. e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933	2022-01-03 11:57:35 -08:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Craig Topper	14849fe554	[SimplifyCFG] Make use of ComputeMinSignedBits and KnownBits::getBitWidth. NFC	2022-01-03 10:08:14 -08:00
Philip Reames	9bd22595ba	[unroll] Prune all but first copy of invariant exit If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead. The change itself is pretty straight forward, but let me add two points of context: * I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying. * I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure. Differential Revision: https://reviews.llvm.org/D116496	2022-01-03 09:55:19 -08:00
Nikita Popov	730414b341	[CodeExtractor] Remove unnecessary explicit attribute handling (NFC) The nounwind and uwtable attributes will get handled as part of the loop below as well, there is no need to special-case them here.	2022-01-03 14:23:25 +01:00
Nikita Popov	587495ffa1	[CodeExtractor] Separate function from param/ret attributes (NFC) This list is confusing because it conflates functions attributes (which are either extractable or not) and other attribute kinds, which are simply irrelevant for this code.	2022-01-03 14:07:12 +01:00
Benjamin Kramer	4683ce2cd8	[InferAttrs] Give strnlen the same attributes as strlen This moves the only string function out of the big list of math funcs. And let's us CSE strnlen calls.	2021-12-30 20:43:43 +01:00
Kazu Hirata	e7774f499b	Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-12-26 14:26:44 -08:00
Djordje Todorovic	93615b88f5	[Debugify] Use WeakWH map collected before Pass when checking loc drop This fixes a typo/bug when checking for pointer reuse when testing DI location preservation in the Debugify original mode (when checking -g generated Debug Info). Differential Revision: https://reviews.llvm.org/D115621	2021-12-21 15:54:09 +01:00
Sami Tolvanen	5dc8aaac39	[llvm][IR] Add no_cfi constant With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478	2021-12-20 12:55:32 -08:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of \!any_of (NFC)	2021-12-17 13:48:57 -08:00
Kazu Hirata	2b7be47b22	[llvm] Strip redundant lambda (NFC)	2021-12-17 10:51:40 -08:00
Philip Reames	f632c49478	Extract a helper function for computing estimate trip count of an exiting branch Plan to use this in following change to support estimated trip counts derived from multiple loop exits.	2021-12-16 17:29:32 -08:00
eopXD	6734be290b	Revert "[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified" This reverts commit `fbf6c8ac15`.	2021-12-16 02:06:11 -08:00
Yueh-Ting Chen	fbf6c8ac15	[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified After this function call, the LLVM IR would look like the following: ``` if (true) /* NonVersionedLoop / else / VersionedLoop */ ``` Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D104631	2021-12-16 00:41:29 -08:00
Arthur Eubanks	d5583366ba	[FunctionComparator] Use getAlign() instead of getAlignment() getAlignment() is deprecated.	2021-12-15 14:40:56 -08:00
Kazu Hirata	7787a8f1b7	[llvm] Use llvm::reverse (NFC)	2021-12-13 21:54:51 -08:00
Nick Desaulniers	95ba0e4563	[SimplifyLibCalls] propagate tail flags on CallInsts I noticed we weren't propagating tail flags on calls when FortifiedLibCallSimplifier.optimizeCall() was replacing calls to runtime checked calls to the non-checked routines (when safe to do so). Make sure to check this before replacing the original calls! Also, avoid any libcall transforms when notail/musttail is present. PR46734 Fixes: https://github.com/llvm/llvm-project/issues/46079 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107872	2021-12-13 11:18:30 -08:00
Gulfem Savrun Yeniceri	e5a8af7a90	[Passes] Fix relative lookup table converter pass This patch fixes the relative table converter pass for the lookup table accesses that are resulted in an instruction sequence, where gep is not immediately followed by a load, such as gep being hoisted outside the loop or another instruction is inserted in between them. The fix inserts the call to load.relative.instrinsic in the original place of load instead of gep. Issue is reported by FreeBSD via https://bugs.freebsd.org/259921. Differential Revision: https://reviews.llvm.org/D115571	2021-12-12 04:40:17 +00:00
Philip Reames	0d13f94c1d	[reductions] Delete another piece of dead flag handling [NFC] The code claimed to handle nsw/nuw, but those aren't passed via builder state and the explicit IR construction just above never sets them. The only case this bit of code is actually relevant for is FMF flags. However, dropPoisonGeneratingFlags currently doesn't know about FMF at all, so this was a noop. It's also unneeded, as the caller explicitly configures the flags on the builder before this call, and the flags on the individual ops should be controled by the intrinsic flags anyways. If any of the flags aren't safe to propagate, the caller needs to make that change.	2021-12-09 10:56:55 -08:00
Philip Reames	b24db85c0b	[recurrence] Delete dead flag/fmf handling [NFC] The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments. The sole exception is a caller of a method which has the argument, but no implementation. I don't know what the intent was here, but it certaintly doesn't actually do anything today.	2021-12-09 10:43:53 -08:00
Philip Reames	2d31b02517	Compute estimated trip counts for multiple exit loops This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst. Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable. The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change. This was noticed when analyzing regressions on D113939. I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately. Differential Revision: https://reviews.llvm.org/D115362	2021-12-09 09:53:49 -08:00
Dmitry Makogon	0b533c1833	[MetaRenamer] Add command line options to disable renaming name with specified prefixes This patch adds 4 options for specifying functions, aliases, globals and structs name prefixes hat don't need to be renamed by MetaRenamer pass. This is useful if one has some downstream logic that depends directly on an entity name. MetaRenamer can break this logic, but with the patch you can tell it not to rename certain names. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D115323	2021-12-09 18:45:06 +07:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit `c68f71eb37`. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Anna Thomas	72750f0012	[TrivialDeadness] Introduce API separating two different usages The earlier usage of wouldInstructionBeTriviallyDead is based on the assumption that the use_count of that instruction being checked will be zero. This patch separates the API into two different ones: 1. The strictly conservative one where the instruction is trivially dead iff the uses are dead. 2. The slightly relaxed form, where an instruction is dead along paths where it is not used. The second form can be used in identifying instructions that are valid to sink down to uses (D109917). Reviewed-By: reames Differential Revision: https://reviews.llvm.org/D114647	2021-12-03 10:09:52 -05:00
spupyrev	93a2c2919f	profi - a flow-based profile inference algorithm: Part III (out of 3) This is a continuation of D109860 and D109903. An important challenge for profile inference is caused by the fact that the sample profile is collected on a fully optimized binary, while the block and edge frequencies are consumed on an early stage of the compilation that operates with a non-optimized IR. As a result, some of the basic blocks may not have associated sample counts, and it is up to the algorithm to deduce missing frequencies. The problem is illustrated in the figure where three basic blocks are not present in the optimized binary and hence, receive no samples during profiling. We found that it is beneficial to treat all such blocks equally. Otherwise the compiler may decide that some blocks are “cold” and apply undesirable optimizations (e.g., hot-cold splitting) regressing the performance. Therefore, we want to distribute the counts evenly along the blocks with missing samples. This is achieved by a post-processing step that identifies "dangling" subgraphs consisting of basic blocks with no sampled counts; once the subgraphs are found, we rebalance the flow so as every branch probability is 50:50 within the subgraphs. Our experiments indicate up to 1% performance win using the optimization on some binaries and a significant improvement in the quality of profile counts (when compared to ground-truth instrumentation-based counts) {F19093045} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109980	2021-12-02 12:01:30 -08:00
spupyrev	98dd2f9ed3	profi - a flow-based profile inference algorithm: Part II (out of 3) This is a continuation of D109860. Traditional flow-based algorithms cannot guarantee that the resulting edge frequencies correspond to a connected flow in the control-flow graph. For example, for an instance in the attached figure, a flow-based (or any other) inference algorithm may produce an output in which the hot loop is disconnected from the entry block (refer to the rightmost graph in the figure). Furthermore, creating a connected minimum-cost maximum flow is a computationally NP-hard problem. Hence, we apply a post-processing adjustments to the computed flow by connecting all isolated flow components ("islands"). This feature helps to keep all blocks with sample counts connected and results in significant performance wins for some binaries. {F19077343} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109903	2021-12-02 11:04:21 -08:00
Kazu Hirata	22d82949b0	[llvm] Fix "unused variable" warnings	2021-12-02 09:20:17 -08:00
Djordje Todorovic	2cdc6f2ca6	Reland "[LICM] Hoist LOAD without sinking the STORE" When doing load/store promotion within LICM, if we cannot prove that it is safe to sink the store we won't hoist the load, even though we can prove the load could be dereferenced and moved outside the loop. This patch implements the load promotion by moving it in the loop preheader by inserting proper PHI in the loop. The store is kept as is in the loop. By doing this, we avoid doing the load from a memory location in each iteration. Please consider this small example: loop { var = ptr; if (var) break; ptr= var + 1; } After this patch, it will be: var0 = ptr; loop { var1 = phi (var0, var2); if (var1) break; var2 = var1 + 1; ptr = var2; } This addresses some problems from [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=51193 Differential revision: https://reviews.llvm.org/D113289	2021-12-02 03:53:50 -08:00
Florian Hahn	2de5f39e54	[BuildLibCalls] Add support for memset_pattern{4,8}. Add support for memset_pattern{4,8} similar to the existing memset_pattern16 handling. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114883	2021-12-02 11:04:25 +00:00
Nikita Popov	a0ff26e08c	[GlobalOpt] Fix assertion failure during instruction deletion This fixes the assertion failure reported in https://reviews.llvm.org/D114889#3166417, by making RecursivelyDeleteTriviallyDeadInstructionsPermissive() more permissive. As the function accepts a WeakTrackingVH, even if originally only Instructions were inserted, we may end up with different Value types after a RAUW operation. As such, we should not assume that the vector only contains instructions. Notably this matches the behavior of the RecursivelyDeleteTriviallyDeadInstructions() function variant which accepts a single value rather than vector.	2021-12-02 11:58:39 +01:00
Florian Hahn	0496edad49	[BuildLibCalls] Add additional attrs to memcpy_chk. `memcpy_chk` can be treated like `memcpy`, with the exception that it may not return (if it aborts the program). See D114793 for a similar patch for `memset_chk`. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D114863	2021-12-02 09:50:14 +00:00
Arthur Eubanks	512534bc16	[Cloning] Clone metadata on function declarations Previously we missed cloning metadata on function declarations because we don't call CloneFunctionInto() on declarations in CloneModule(). Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D113812	2021-12-01 15:40:05 -08:00
spupyrev	7cc2493daa	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. UPD Dec 1st 2021: - synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`; - added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-12-01 15:30:38 -08:00

... 3 4 5 6 7 ...

6465 Commits