llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	41ae78ea3a	Use has_value instead of hasValue (NFC)	2022-07-19 20:15:44 -07:00
Johannes Doerfert	f84712f0b8	[Attributor] Teach checkForAllUses to follow returns into callers If we can determine all call sites we can follow a use in a return instruction into the caller. AAPointerInfo utilizes this feature.	2022-07-19 18:17:40 -05:00
Johannes Doerfert	4f2ccdd0b1	[Attributor][NFC] Improve debug messages	2022-07-19 18:17:40 -05:00
Nick Desaulniers	1cf6b93df1	Revert "[Local] Allow creating callbr with duplicate successors" This reverts commit `08860f525a`. Crashes during PPC64LE linux kernel builds as reported by @nathanchance. https://reviews.llvm.org/D129997#3663632	2022-07-19 15:03:27 -07:00
Johannes Doerfert	bf789b1957	[Attributor] Replace AAValueSimplify with AAPotentialValues For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons: - We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`. - We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it. - Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication. This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now. `AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still. We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress. Fixes: https://github.com/llvm/llvm-project/issues/54981 Note: A previous version was flawed and consequently reverted in `6555558a80`.	2022-07-19 16:24:42 -05:00
Arthur Eubanks	13aa2c1c3b	[DSE] Revisit pointers that may no longer escape after removing another store In dependent-capture, previously we'd see that %tmp4 is captured due to the first store. We'd cache this info in CapturedBeforeReturn and InvisibleToCallerAfterRet. The first store is then removed, causing the cached values to be wrong. We also need to revisit everything because normally we work backwards when removing stores at the end of the function, but in this case removing an earlier store causes a later store to be removable. No compile time impact: https://llvm-compile-time-tracker.com/compare.php?from=56796ae1a8db4c85dada28676f8303a5a3609c63&to=21b7e5248ffc423cd36c9d4a020085e363451465&stat=instructions Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D123686	2022-07-19 09:30:34 -07:00
Sanjay Patel	3d6c10dcf3	[SimplifyLibCalls] avoid converting pow() to powi() with no FMF powi() is not a standard math library function; it is specified with non-strict semantics in the LangRef. We currently require 'afn' to do this transform when it needs a sqrt(), so I just extended that requirement to the whole-number exponent too. This bug was introduced with: `b17754bcaa` ...where we deferred expansion of pow() to later passes.	2022-07-19 12:26:53 -04:00
Arnold Schwaighofer	bc4870f09e	[coro async] Add missing llvm.coro.id.async intrinsic to declaresCoroCleanupIntrinsics rdar://97214593 Differential Revision: https://reviews.llvm.org/D130038	2022-07-19 07:25:04 -07:00
Andrew Turner	b850762b62	Add the FreeBSD AArch64 memory layout Use the FreeBSD AArch64 memory layout values when building for it. These are based on the x86_64 values, scaled to take into account the larger address space on AArch64. Reviewed by: vitalybuka Differential Revision: https://reviews.llvm.org/D125883	2022-07-19 09:58:07 -04:00
Andrew Turner	e13bd2644e	Add the FreeBSD AArch64 shadow offset to llvm AArch64 has a larger address space than 64-bit x86. Use the larger shadow offset on FreeBSD AArch64. Reviewed by: vitalybuka Differential Revision: https://reviews.llvm.org/D125873	2022-07-19 09:58:07 -04:00
William Schmidt	bccc9aa81c	Don't vectorize PHIs in catchswitch blocks We currently assert in vectorizeTree(TreeEntry*) when processing a PHI bundle in a block containing a catchswitch. We attempt to set the IRBuilder insertion point following the catchswitch, which is invalid. This is done so that ShuffleBuilder.finalize() knows where to insert a shuffle if one is needed. To avoid this occurring, watch out for catchswitch blocks during buildTree_rec() processing, and avoid adding PHIs in such blocks to the vectorizable tree. It is unlikely that constraining vectorization over an exception path will cause a noticeable performance loss, so this seems preferable to trying to anticipate when a shuffle will and will not be required.	2022-07-19 06:10:17 -07:00
Nikita Popov	08860f525a	[Local] Allow creating callbr with duplicate successors Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs. Differential Revision: https://reviews.llvm.org/D129997	2022-07-19 14:28:22 +02:00
Florian Hahn	a75760a269	[LV] Remove unnecessary cast in widenCallInstruction. (NFC)	2022-07-19 11:23:24 +01:00
Max Kazantsev	82309831c3	[LoopSimplifyCFG] Prevent use-def dominance breach by handling dead exits. PR56243 One of the transforms in LoopSimplifyCFG demands that the LCSSA form is truly maintained for all values, tokens included, otherwise it may end up creating a use that is not dominated by def (and Phi creation for tokens is impossible). Detect this situation and prevent transform for it early. Differential Revision: https://reviews.llvm.org/D129984 Reviewed By: efriedma	2022-07-19 15:54:12 +07:00
Ellis Hoag	3580daacf3	[InstrProf] Allow CSIRPGO function entry coverage The flag `-fcs-profile-generate` for enabling CSIRPGO moves the pass `pgo-instrumentation` after inlining. Function entry coverage works fine with this change, so remove the assert. I had originally left this assert in because I had not tested this at the time. Reviewed By: davidxl, MaskRay Differential Revision: https://reviews.llvm.org/D129407	2022-07-18 15:10:11 -07:00
Florian Hahn	30e53b8c03	[LV] Sink module variable and use State to set it in widenCall. (NFC) Limits the lifetime of the variable and makes it independent of CallInst.	2022-07-18 19:41:48 +01:00
Arnold Schwaighofer	28ebd13d63	[coro async] Fix code to run coro.async.end cleanup like the legacy pass did The code executed for the Switch ABI does not change. rdar://97074714 Differential Revision: https://reviews.llvm.org/D129865	2022-07-18 10:41:29 -07:00
Nicolai Hähnle	3443788087	Revert "Inliner: don't mark call sites as 'nounwind' if that would be redundant" This reverts commit `9905c37981`. Looks like there are Clang changes that are affected in trivial ways. Will look into it.	2022-07-18 17:43:35 +02:00
Nicolai Hähnle	9905c37981	Inliner: don't mark call sites as 'nounwind' if that would be redundant When F calls G calls H, G is nounwind, and G is inlined into F, then the inlined call-site to H should be effectively nounwind so as not to lose information during inlining. If H itself is nounwind (which often happens when H is an intrinsic), we no longer mark the callsite explicitly as nounwind. Previously, there were cases where the inlined call-site of H differs from a pre-existing call-site of H in F only in the explicitly added nounwind attribute, thus preventing common subexpression elimination. v2: - just check CI->doesNotThrow Differential Revision: https://reviews.llvm.org/D129860	2022-07-18 17:28:52 +02:00
Sanjay Patel	26fbb79c33	[InstCombine] reduce code for signbit folds; NFC	2022-07-18 11:04:58 -04:00
Nikita Popov	21e2f133a8	[LoopSimplifyCFG] Revert accidental change This change was included in an unrelated change `b57d61384c` and was of course not intended for commit...	2022-07-18 15:30:13 +02:00
Nikita Popov	b57d61384c	[ConstantRangeTest] Move nowrap binop tests to generic infrastructure (NFC) Move testing for add/sub with nowrap flags to TestBinaryOpExhaustive, rather than separate homegrown exhaustive testing functions.	2022-07-18 15:14:17 +02:00
Kristina Bessonova	44736c1d49	[CloneFunction][DebugInfo] Avoid cloning DILexicalBlocks of inlined subprograms If a DISubprogram was not cloned (e.g. we are cloning a function that has other functions inlined into it, and subprograms of the inlined functions are not supposed to be cloned), it doesn't make sense to clone its DILexicalBlocks either. Otherwise we'll get duplicated DILexicalBlocks that may confuse debug info emission in AsmPrinter. I believe it also makes no sense to clone any DILocalVariables, or maybe other local entities, if their parent subprogram was not cloned, because they will be dangling and will not participate in further emission. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D127102	2022-07-18 13:14:52 +02:00
Nikita Popov	8201e3ef5c	[BasicBlockUtils] Don't drop callbr with unique successor As callbr is now allowed to have duplicate destinations, we can have a callbr with a unique successor. Make sure it doesn't get dropped, as we still need to preserve the side-effect.	2022-07-18 12:26:29 +02:00
Nikita Popov	4fba35f973	[InstCombine] Clarify invoke/callbr handling in constexpr call fold (NFCI) We only need to check the block for the normal/default destination, not for other destinations. Using the value in those would be illegal anyway. The callbr case cannot actually happen here, because callbr is currently limited to inline asm. Retaining it to match the spirit of the original code.	2022-07-18 12:02:46 +02:00
Florian Hahn	105032f549	[LV] Use PHI recipe instead of PredRecipe for subsequent uses. At the moment, the VPPredInstPHIRecipe is not used in subsequent uses of the predicate recipe. This incorrectly models the def-use chains, as all later uses should use the phi recipe. Fix that by delaying recording of the recipe. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D129436	2022-07-18 09:35:34 +01:00
Nikita Popov	11079e8820	[IR] Don't treat callbr as indirect terminator Callbr is no longer an indirect terminator in the sense that is relevant here (that its successors cannot be updated). The primary effect of this change is that callbr no longer prevents formation of loop simplify form. I decided to drop the isIndirectTerminator() method entirely and replace it with isa<IndirectBrInst>() checks. I assume this method was added to abstract over indirectbr and callbr, but it never really caught on, and there is nothing left to abstract anymore at this point. Differential Revision: https://reviews.llvm.org/D129849	2022-07-18 09:32:08 +02:00
Fangrui Song	0e3447bf8a	[LegacyPM] Remove WholeProgramDevirt Unused after LTO removal from legacy optimization pipeline.	2022-07-17 23:14:53 -07:00
Fangrui Song	1f90cc589e	[LegacyPM] Remove FunctionImportLegacyPass Unused after ThinLTO was removed from legacy optimization pipeline.	2022-07-17 23:06:46 -07:00
Kazu Hirata	7094ab4ee7	[llvm] Modernize bool literals (NFC) Identified with modernize-use-bool-literals.	2022-07-17 18:08:51 -07:00
Kazu Hirata	3112987d5c	Remove unused forward declarations (NFC)	2022-07-17 15:37:48 -07:00
Kazu Hirata	8b3ed1fa98	Remove redundant return statements (NFC) Identified with readability-redundant-control-flow.	2022-07-17 15:37:46 -07:00
Fangrui Song	bbaa015e82	[LegacyPM] Remove LowerTypeTestsPass Unused after LTO removal from optimization pipeline.	2022-07-17 15:06:38 -07:00
Fangrui Song	a6942256ca	[LegacyPM] Remove NameAnonGlobalLegacyPass Unused after LTO removal from optimization pipeline.	2022-07-17 14:38:29 -07:00
Fangrui Song	d74b88c69d	[LegacyPM] Remove CanonicalizeAliasesLegacyPass Unused after LTO removal from optimization pipeline.	2022-07-17 14:30:22 -07:00
Fangrui Song	70519a1fba	[LegacyPM] Remove LTO passes from optimization pipeline Following recent changes removing non-core features of the legacy PM/optimization pipeline.	2022-07-17 14:24:36 -07:00
Fangrui Song	f502115561	[LegacyPM] Remove PGO options from PassManagerBuilder They have been dead since legacy PGO/SamplePGO passes were removed.	2022-07-17 14:03:23 -07:00
Fangrui Song	dd5e3f0e27	[LegacyPM] Remove SampleProfileLoaderLegacyPass Following recent changes removing non-core features of the legacy PM/optimization pipeline (e.g. PGO), remove SamplePGO.	2022-07-17 12:09:46 -07:00
Florian Hahn	cc0ee17951	[LV] Move VPPredInstPHIRecipe::execute to VPlanRecipes.cpp (NFC)	2022-07-17 11:34:23 +01:00
zhongyunde	3a6b766b1b	[IndVars] Directly use unsigned integer induction for FPToUI/FPToSI of float induction Depend on D129358 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129756	2022-07-17 10:48:35 +08:00
Florian Hahn	6813b41d57	[LV] Avoid creating new run-time VF expression for each runtime checks. At the moment, the cost of runtime checks for scalable vectors is overestimated due to creating separate vscale * VF expressions for each check. Instead re-use the first expression.	2022-07-16 17:24:07 +01:00
David Green	4b7913c357	[VectorCombine] Only consider shuffle uses with the same type. The backend getShuffleCosts do not currently handle shuffles that change size very well. Limit the shuffles we collect to the same type to make sure they do not cause issues as reported in D128732.	2022-07-16 13:23:39 +01:00
Fangrui Song	f9d6f37201	[LegacyPM] Remove ControlHeightReductionLegacyPass This pass tries to reduce the number of conditional branches in the hot path based on profile. It's mostly a no-op after legacy PGO passes are moved.	2022-07-16 01:35:56 -07:00
Fangrui Song	3a42c499c2	[LegacyPM] Remove createInstrProfilingLegacyPass Follow the steps of removing non-core instrumentation passes like PGO.	2022-07-16 01:26:40 -07:00
Fangrui Song	685775bbab	[LegacyPM] Remove CGProfileLegacyPass It's mostly a no-op after I removed legacy PGO passes in D123834.	2022-07-16 00:39:56 -07:00
Fangrui Song	df8f5be596	[LegacyPM] Remove ModuleSanitizerCoverageLegacyPass Follow the steps of various other legacy instrumentation passes removed for 15.0.0.	2022-07-15 19:01:20 -07:00
Rong Xu	5e0443292b	[PGO] Report number of counts being dropped when a hash-mismatch happens This patch reports number of counts being dropped when a hash-mismatch happens. This information will be helpful to the users -- if the dropped counts are large, the user should redo the instrumentation build and recollect the profile. Differential Revision: https://reviews.llvm.org/D129001	2022-07-15 14:53:59 -07:00
Rong Xu	19ac75364f	[PGO] Improve hash-mismatch warning message This patch improves FDO hash-mismatch handling: (1) filter out warnings for weak functions. A weak function's definition will be overridden by a strong definition by the linker, so the hash mismatch in profile use compilation is expected. Make the profile hash mismatch warning under the existing option (default true). (2) add an option to trace the hash of functions with the specific string. Note that an empty string parameter will trace all functions. Differential Revision: https://reviews.llvm.org/D129002	2022-07-15 13:44:55 -07:00
Philip Reames	6ab686eb86	[LSR] Allow already invariant operand for ICmpZero matching [try 2] Changes since initial commit: * Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in `d767b392`. * isLoopInvariant returns true for instructions outside the loop, but not necessarily above the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in `2aed3cdb`. Original commit message follows... The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 13:29:43 -07:00
Warren Ristow	c650793049	[Reassociate] Enable FP reassociation via 'reassoc' and 'nsz' Compiling with '-ffast-math' turns on all the FastMathFlags (FMF), as expected, and that enables FP reassociation. Only the two FMF flags 'reassoc' and 'nsz' are technically required to perform reassociation, but disabling other unrelated FMF bits is needlessly suppressing the optimization. This patch fixes that needless suppression, and makes appropriate adjustments to test-cases, fixing some outstanding TODOs in the process. Fixes: #56483 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129523	2022-07-15 11:44:35 -07:00
Philip Reames	6fe766beba	Revert "[LSR] Allow already invariant operand for ICmpZero matching" This reverts commit `9153515a7b`. Buildbot crash was reported in the commit thread, reverting while investigating.	2022-07-15 10:47:57 -07:00
Florian Hahn	aa00fb02c9	[LV] Use umax(VF * UF, MinProfTC) for scalable vectors. For scalable vectors, it is not sufficient to only check MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because this property may not hold for larger values of vscale. In those cases, compute umax(VF * UF, MinProfTC) instead. This should fix https://lab.llvm.org/buildbot/#/builders/197/builds/2262	2022-07-15 10:23:14 -07:00
Philip Reames	9153515a7b	[LSR] Allow already invariant operand for ICmpZero matching The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 09:51:00 -07:00
Nikita Popov	8a519b3c21	[InstCombine] Ensure constant folding in binop of select fold When folding a binop into a select, we need to ensure that one of the select arms actually does constant fold, otherwise we'll create two binop instructions and perform the reverse transform. Ensure this by performing an explicit constant folding attempt, and failing the transform if neither side simplifies. A simple alternative here would have been to limit the fold to ImmConstants, but given the current representation of scalable vector splats, this wouldn't be ideal.	2022-07-15 11:03:10 +02:00
Mel Chen	bd404fbcc8	[LV][NFC] Fix the condition for printing debug messages Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D128523	2022-07-15 01:47:33 -07:00
Nikita Popov	f75ccadcdd	[LSR] Create SCEVExpander earlier, use member isSafeToExpand() (NFC) This is a followup to D129630, which switches LSR to the member isSafeToExpand() variant, and removes the freestanding function. This is done by creating the SCEVExpander early (already during the analysis phase). Because the SCEVExpander is now available for the whole lifetime of LSRInstance, I've also made it into a member variable, rather than passing it around in even more places. Differential Revision: https://reviews.llvm.org/D129769	2022-07-15 09:41:23 +02:00
Craig Topper	0e718443c7	[SimplifyIndVar] Use enum class for ExtendKind. NFC I happened to notice two places where the enum was being passed directly to the bool IsSigned argument of createExtendInst. This was functionally ok since SignExtended in the enum has value of 1, but the code shouldn't rely on that. Using an enum class prevents the enum from being convertible to bool, but does make writing the enum values more verbose. Since we now have to write ExtendKind:: in front of them, I've shortened the names of ZeroExtended and SignExtended. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129733	2022-07-14 10:03:58 -07:00
Philip Reames	3bc09c7da5	[SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case Motivation here is to unblock LSR's ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all. SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values. vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes. (*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of every udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference. Differential Revision: https://reviews.llvm.org/D129710	2022-07-14 08:56:58 -07:00
Brendon Cahoon	58fec78231	Revert "[UnifyLoopExits] Reduce number of guard blocks" This reverts commit `e13248ab0e`. Need to revert because the transformation cannot occur for basic blocks that contain convergent instructions.	2022-07-14 10:33:52 -05:00
Warren Ristow	230c8c56f2	[Reassociate] Cleanup minor missed optimizations In analyzing issue #56483, it was noticed that running `opt` with `-reassociate` was missing some minor optimizations. For example, there were cases where running `opt` on IR with floating-point instructions that have the `fast` flags applied sometimes resulted in less efficient code than the input IR (things like dead instructions left behind, and missed reassociations). These were sometimes noted in the test-files with TODOs, to investigate further. This commit fixes some of these problems, removing some TODOs in the process. FTR, I refer to these as "minor" missed optimizations, because when running a full clang/llvm compilation, these inefficiencies are not happening, as other passes clean that residue up. Regardless, having cleaner IR produced by `opt` makes assessing the quality of fixes done in `opt` easier.	2022-07-14 08:21:04 -07:00
Brendon Cahoon	c945d88d2b	Revert "[StructurizeCFG] Improve basic block ordering" This reverts commit `f1b05a0a2b`. Need to revert due to issues identified with testing. The transformation is incorrect for blocks that contain convergent instructions.	2022-07-14 09:40:51 -05:00
Nikita Popov	9e6e631b38	[LoopPredication] Use isSafeToExpandAt() member function (NFC) As a followup to D129630, this switches a usage of the freestanding function in LoopPredication to use the member variant instead. This was the last use of the freestanding function, so drop it entirely.	2022-07-14 14:49:07 +02:00
Nikita Popov	dcf4b733ef	[SCEVExpander] Make CanonicalMode handling in isSafeToExpand() more robust (PR50506) isSafeToExpand() for addrecs depends on whether the SCEVExpander will be used in CanonicalMode. At least one caller currently gets this wrong, resulting in PR50506. Fix this by a) making the CanonicalMode argument on the freestanding functions required and b) adding member functions on SCEVExpander that automatically take the SCEVExpander mode into account. We can use the latter variant nearly everywhere, and thus make sure that there is no chance of CanonicalMode mismatch. Fixes https://github.com/llvm/llvm-project/issues/50506. Differential Revision: https://reviews.llvm.org/D129630	2022-07-14 14:41:51 +02:00
zhongyunde	fc6092fd4d	[IndVars] Eliminate redundant type cast between unsigned integer and float Extend to unsigned integers according to the comment on D129191. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129358	2022-07-14 19:41:07 +08:00
Nikita Popov	7a43b382ce	[IndVars] Make sure header phi simplification preserves LCSSA form When simplifying instructions, make sure that the replacement preserves LCSSA form. This fixes the issue reported at: https://reviews.llvm.org/D129293#3650851	2022-07-14 11:46:48 +02:00
Nikita Popov	ebc54e0cd4	[SCCP] Make check for unknown/undef in unary op handling more explicit (NFCI) Make the implementation more similar to other functions, by explicitly skipping an unknown/undef first, and always falling back to overdefined at the end. I don't think it makes a difference now, but could make one once the constant evaluation can fail. In that case we would directly mark the result as overdefined now, rather than keeping it unknown (and later making it overdefined because we think it's undef-based).	2022-07-14 10:56:11 +02:00
Nikita Popov	6db3edc858	[SCCP] Don't check for UndefValue before calling markConstant() The value lattice explicitly represents undef, and markConstant() internally checks for UndefValue and will create an undef rather than constant lattice element in that case. This is mostly a code simplification, it has little practical impact because we usually get undef results from undef operands, and those don't get processed. Only leave the check behind for the CmpInst case, because it currently goes through this incorrect code in the getCompare() implementation: `f98697642c/llvm/include/llvm/Analysis/ValueLattice.h (L456-L457)` Differential Revision: https://reviews.llvm.org/D128330	2022-07-14 10:05:56 +02:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
Florian Hahn	ee37ae91b6	[VPlan] Move VPBB verification to separate function (NFC).	2022-07-13 18:53:40 -07:00
Florian Hahn	6f7347b888	[LV] Use PredRecipe directly instead of getOrAddVPValue (NFC). There is no need to look up the VPValue for Instr, PredRecipe can be used directly.	2022-07-13 17:01:42 -07:00
Alexander Shaposhnikov	c916840539	[SimplifyCFG] Improve SwitchToLookupTable optimization Try to use the original value as an index (in the lookup table) in more cases (to avoid one subtraction and shorten the dependency chain) (https://github.com/llvm/llvm-project/issues/56189). Test plan: 1/ ninja check-all 2/ bootstrapped LLVM + Clang pass tests Differential revision: https://reviews.llvm.org/D128897	2022-07-13 23:21:45 +00:00
Leonard Chan	21f72c05c4	[hwasan] Add __hwasan_add_frame_record to the hwasan interface Hwasan includes instructions in the prologue that mix the PC and SP and store it into the stack ring buffer stored at __hwasan_tls. This is a thread_local global exposed from the hwasan runtime. However, if TLS-mechanisms or the hwasan runtime haven't been setup yet, it will be invalid to access __hwasan_tls. This is the case for Fuchsia where we instrument libc, so some functions that are instrumented but can run before hwasan initialization will incorrectly access this global. Additionally, libc cannot have any TLS variables, so we cannot weakly define __hwasan_tls until the runtime is loaded. A way we can work around this is by moving the instructions into a hwasan function that does the store into the ring buffer and creating a weak definition of that function locally in libc. This way __hwasan_tls will not actually be referenced. This is not our long-term solution, but this will allow us to roll out hwasan in the meantime. This patch includes: - A new llvm flag for choosing to emit a libcall rather than instructions in the prologue (off by default) - The libcall for storing into the ringbuffer (__hwasan_add_frame_record) Differential Revision: https://reviews.llvm.org/D128387	2022-07-13 15:15:15 -07:00
Leonard Chan	d843d5c8e6	Revert "[hwasan] Add __hwasan_record_frame_record to the hwasan interface" This reverts commit `4956620387`. This broke a sanitizer builder: https://lab.llvm.org/buildbot/#/builders/77/builds/19597	2022-07-13 15:06:07 -07:00
Florian Hahn	225e3ec622	[LV] Move VPBranchOnMaskRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-13 14:39:59 -07:00
leonardchan	4956620387	[hwasan] Add __hwasan_record_frame_record to the hwasan interface Hwasan includes instructions in the prologue that mix the PC and SP and store it into the stack ring buffer stored at __hwasan_tls. This is a thread_local global exposed from the hwasan runtime. However, if TLS-mechanisms or the hwasan runtime haven't been setup yet, it will be invalid to access __hwasan_tls. This is the case for Fuchsia where we instrument libc, so some functions that are instrumented but can run before hwasan initialization will incorrectly access this global. Additionally, libc cannot have any TLS variables, so we cannot weakly define __hwasan_tls until the runtime is loaded. A way we can work around this is by moving the instructions into a hwasan function that does the store into the ring buffer and creating a weak definition of that function locally in libc. This way __hwasan_tls will not actually be referenced. This is not our long-term solution, but this will allow us to roll out hwasan in the meantime. This patch includes: - A new llvm flag for choosing to emit a libcall rather than instructions in the prologue (off by default) - The libcall for storing into the ringbuffer (__hwasan_record_frame_record) Differential Revision: https://reviews.llvm.org/D128387	2022-07-14 05:07:11 +08:00
Martin Sebor	ab7ee3c991	[InstCombine] Enable strtol folding with nonnull endptr Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129593	2022-07-13 09:26:34 -06:00
Nikita Popov	07146a9e64	[SCCP] Fix typo in previous commit Ooops, I tested a build from the wrong checkout.	2022-07-13 16:22:40 +02:00
Nikita Popov	e298dfbc1b	[SCCP] Avoid ConstantExpr::get() call Use ConstantFoldUnaryOpOperand() API instead. This is in preparation for removing fneg constant expressions.	2022-07-13 16:20:34 +02:00
Max Kazantsev	62f4572e45	[IndVars][NFC] Make IVOperand parameter an instruction	2022-07-13 19:07:16 +07:00
Max Kazantsev	30e33b4b81	[SCEV][NFC] Make getStrengthenedNoWrapFlagsFromBinOp return optional	2022-07-13 18:54:25 +07:00
David Sherwood	307ace7f20	[LoopVectorize] Ensure the VPReductionRecipe is placed after all its inputs When vectorising ordered reductions we call a function LoopVectorizationPlanner::adjustRecipesForReductions to replace the existing VPWidenRecipe for the fadd instruction with a new VPReductionRecipe. We attempt to insert the new recipe in the same place, but this is wrong because createBlockInMask may have generated new recipes that VPReductionRecipe now depends upon. I have changed the insertion code to append the recipe to the VPBasicBlock instead. Added a new RUN with tail-folding enabled to the existing test: Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D129550	2022-07-13 09:29:25 +01:00
Nikita Popov	af49bed933	[IndVars] Simplify instructions after replacing header phi with preheader value After replacing a loop phi with the preheader value, it's usually possible to simplify some of the using instructions, so do that as part of replaceLoopPHINodesWithPreheaderValues(). Doing this as part of IndVars is valuable, because it may make GEPs in the loop have constant offsets and allow the following SROA run to succeed (as demonstrated in the PhaseOrdering test). Differential Revision: https://reviews.llvm.org/D129293	2022-07-13 10:27:04 +02:00
Nikita Popov	a5ee62a141	[IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits Currently we only call replaceLoopPHINodesWithPreheaderValues() if optimizeLoopExits() replaces the exit with an unconditional exit. However, it is very common that this already happens as part of eliminateIVComparison(), in which case we're leaving behind the dead header phi. Tweak the early bailout for already-constant exits to also call replaceLoopPHINodesWithPreheaderValues(). Differential Revision: https://reviews.llvm.org/D129214	2022-07-13 09:43:21 +02:00
Augie Fackler	9029bda041	[Attributor] Don't crash if getAnalysisResultForFunction() returns null LoopInfo I have no idea what's going on here. This code was moved around/introduced in change `cb26b01d57` and starts crashing with a NULL dereference once I apply https://reviews.llvm.org/D123090. I assume that I've unwittingly taught the attributor enough that it's able to do more clever things than in the past, and it's able to trip on this case. I make no claims about the correctness of this patch, but it passes tests and seems to fix all the crashes I've been seeing. Differential Revision: https://reviews.llvm.org/D129589	2022-07-12 16:44:06 -04:00
Yuanfang Chen	fcb7d76d65	[coroutine] add nomerge function attribute to `llvm.coro.save` It is illegal to merge two `llvm.coro.save` calls unless their `llvm.coro.suspend` users are also merged. Marks it "nomerge" for the moment. This reverts D129025. Alternative to D129025, which affects other token type users like WinEH. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D129530	2022-07-12 10:39:38 -07:00
Nick Desaulniers	2240d72f15	[X86] initial -mfunction-return=thunk-extern support Adds support for: * `-mfunction-return=<value>` command line flag, and * `__attribute__((function_return("<value>")))` function attribute Where the supported <value>s are: * keep (disable) * thunk-extern (enable) thunk-extern enables clang to change ret instructions into jmps to an external symbol named __x86_return_thunk, implemented as a new MachineFunctionPass named "x86-return-thunks", keyed off the new IR attribute fn_ret_thunk_extern. The symbol __x86_return_thunk is expected to be provided by the runtime the compiled code is linked against and is not defined by the compiler. Enabling this option alone doesn't provide mitigations without corresponding definitions of __x86_return_thunk! This new MachineFunctionPass is very similar to "x86-lvi-ret". The <value>s "thunk" and "thunk-inline" are currently unsupported. It's not clear yet that they are necessary: whether the thunk pattern they would emit is beneficial or used anywhere. Should the <value>s "thunk" and "thunk-inline" become necessary, x86-return-thunks could probably be merged into x86-retpoline-thunks which has pre-existing machinery for emitting thunks (which could be used to implement the <value> "thunk"). Has been found to build+boot with corresponding Linux kernel patches. This helps the Linux kernel mitigate RETBLEED. * CVE-2022-23816 * CVE-2022-28693 * CVE-2022-29901 See also: * "RETBLEED: Arbitrary Speculative Code Execution with Return Instructions." * AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion * TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0 2022-07-12 * Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 SystemZ may eventually want to support "thunk-extern" and "thunk"; both options are used by the Linux kernel's CONFIG_EXPOLINE. This functionality has been available in GCC since the 8.1 release, and was backported to the 7.3 release. Many thanks to folks that provided discreet review off list due to the embargoed nature of this hardware vulnerability. Many Bothans died to bring us this information. Link: https://www.youtube.com/watch?v=IF6HbCKQHK8 Link: https://github.com/llvm/llvm-project/issues/54404 Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1 Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60 Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html Reviewed By: aaron.ballman, craig.topper Differential Revision: https://reviews.llvm.org/D129572	2022-07-12 09:17:54 -07:00
David Sherwood	6b694d600a	[LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF When calculating the cost of Instruction::Br in getInstructionCost we query PredicatedBBsAfterVectorization to see if there is a scalar predicated block. However, this meant that the decisions being made for a given fixed-width VF were affecting the cost for a scalable VF. As a result we were returning InstructionCost::Invalid pointlessly for a scalable VF that should have a low cost. I encountered this for some loops when enabling tail-folding for scalable VFs. Test added here: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Differential Revision: https://reviews.llvm.org/D128272	2022-07-12 14:53:20 +01:00
Nikita Popov	3d475dfeb9	[Mem2Reg] Consistently preserve nonnull assume for uninit load When performing a !nonnull load from uninitialized memory, we should preserve the nonnull assume just like in all other cases. We already do this correctly in the generic mem2reg code, but don't handle this case when using the optimized single-block implementation. Make sure that the optimized implementation exhibits the same behavior as the generic implementation.	2022-07-12 12:53:08 +02:00
Kazu Hirata	ec9a0e36d9	[IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC) The last uses were removed on Apr 15, 2022 in commit `2e6ac54cf4`. Differential Revision: https://reviews.llvm.org/D129460	2022-07-11 20:15:24 -07:00
Florian Hahn	5d135041c5	[LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-11 16:01:07 -07:00
Justin Cady	3d438ceed1	[InstrProf] Mark __llvm_profile_runtime hidden to match libclang_rt.profile definition Mark the symbol hidden to match INSTR_PROF_PROFILE_RUNTIME_VAR in compiler-rt. Fixes second issue discussed at https://discourse.llvm.org/t/63090 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D128842	2022-07-11 11:29:20 -07:00
David Sherwood	03fee6712a	[LoopVectorize] Add option to use active lane mask for loop control flow Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop. This patch adds support for using the active lane mask for control flow by: 1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set. 2. Extract the first bit of this mask and use this bit for the conditional branch. I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region. Differential Revision: https://reviews.llvm.org/D125301	2022-07-11 13:46:55 +01:00
David Sherwood	02d6950d84	[LoopVectorize][NFC] Add optional Name parameter to VPInstruction This patch is a simple piece of refactoring that now permits users to create VPInstructions and specify the name of the value being generated. This is useful for creating more readable/meaningful names in IR. Differential Revision: https://reviews.llvm.org/D128982	2022-07-11 09:23:24 +01:00
Florian Hahn	6a4bc452f8	[LV] Move VPWidenGEPRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-10 17:10:17 -07:00
Florian Hahn	13ae213469	[LV] Move VPWidenRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-09 18:46:57 -07:00
Paul Osmialowski	b17754bcaa	[SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer value Since the backend's codegen is capable of expanding powi into fmul's, it is no longer necessary to do so in the ::optimizePow() function of SimplifyLibCalls.cpp. It is sufficient to always turn pow(x, n) into powi(x, n) for the cases where n is a constant integer value. Dropping the current expansion code allowed relaxation of the folding conditions and now this can also happen at optimization levels below Ofast. The added CodeGen/AArch64/powi.ll test case ensures that powi is actually expanded into fmul's, confirming that this refactor did not cause any performance degradation. Following an idea proposed by David Sherwood <david.sherwood@arm.com>. Differential Revision: https://reviews.llvm.org/D128591	2022-07-09 12:00:22 -04:00
Florian Hahn	0c27b38849	[VPlan] Move VPWidenSelectRecipe::execute to VPlanRecipes.cpp (NFC). Depends on D127968. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127970	2022-07-08 09:35:23 -07:00
Nikita Popov	d287051404	[InstCombine] Avoid ConstantExpr::get() in vector binop fold (NFCI) Use the ConstantFoldBinaryOpOperands() API instead. This case would bail out on a non-folded result anyway.	2022-07-08 17:20:14 +02:00
Nikita Popov	29c6bf45c3	[InstCombine] Avoid ConstantExpr::get() call Avoid calling ConstantExpr::get() for associative/commutative binops, call ConstantFoldBinaryOpOperands() instead. We only want to perform the reassociation if the constants actually fold.	2022-07-08 17:13:06 +02:00
Nikita Popov	fc18a88231	[InstCombine] Avoid creating float binop ConstantExprs Replace ConstantExpr:getFAdd etc with call to ConstantFoldBinaryOpOperands(). I'm using the constant folding API rather than IRBuilder here to ensure that this does actually constant fold. These transforms don't use m_ImmConstant(), so this would not otherwise be guaranteed (and apparently, they can't use m_ImmConstant because they want to handle scalable vector splats). There is an opportunity here to further migrate these to the ConstantFoldFPInstOperands() API, which would respect the denormal mode. I've held off on doing so here, because some of this code explicitly checks for denormal results, and I don't want to touch it in a mostly NFC change.	2022-07-08 16:36:04 +02:00

31124 Commits