llvm-project

Commit Graph

Author	SHA1	Message	Date
Francis Visoiu Mistrih	448a094d3e	[Matrix] Add assert to catch extracted vectors with poison elements Assert when the extracted vector is wider than the row/column. Differential Revision: https://reviews.llvm.org/D130173	2022-07-26 11:07:02 -07:00
Francis Visoiu Mistrih	2c6e8b4636	[Matrix] Refactor tiled loops in a struct. NFC The three loops have the same structure: index, header, latch.	2022-07-26 11:02:22 -07:00
Stefan Gränitz	1e30820483	[WinEH] Apply funclet operand bundles to nounwind intrinsics that lower to function calls in the course of IR transforms WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222 This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs. * Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand. * PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers. * LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites. Reviewed By: theraven Differential Revision: https://reviews.llvm.org/D128190	2022-07-26 17:52:43 +02:00
Arthur Eubanks	2eade1dba4	[WPD] Use new llvm.public.type.test intrinsic for potentially publicly visible classes Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`. Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`. To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D128955	2022-07-26 08:01:08 -07:00
Phoebe Wang	19c5638e4f	[ArgPromotion] Transfer metadata nontemporal to promoted loads Fixes #56703 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D130536	2022-07-26 16:30:08 +08:00
Kazu Hirata	3f3930a451	Remove redundant virtual specifiers (NFC) Identified with tidy-modernize-use-override.	2022-07-25 23:00:59 -07:00
zhongyunde	d485c1b73e	[LoopDataPrefetch] Fix crash when TTI doesn't set CacheLineSize Fix https://github.com/llvm/llvm-project/issues/56681 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130418	2022-07-26 13:08:42 +08:00
Joseph Huber	d61d72dae6	[OpenMP] Remove noinline attributes in the device runtime We previously used the `noinline` attributes to specify some definitions which should be kept alive in the runtime. These were then stripped immediately in the OpenMPOpt module pass. However, since the changes in D130298, we now explicitly state which functions will have external visibility in the bitcode library. Additionally the OpenMPOpt module pass should run before the inliner pass, so this shouldn't make a difference in whether or not the functions will be alive for the initial pass of OpenMPOpt. This should simplify the interface, and additionally save time spent on scanning function names for noinline. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D130368	2022-07-25 15:44:50 -04:00
Warren Ristow	3bbd380a5b	[Reassociate][NFC] Use an appropriate dyn_cast for BinaryOperator In D129523, it was noted that there are some questionable naked casts from Instruction to BinaryOperator, which could be addressed by doing a dyn_cast directly to BinaryOperator, avoiding the need for the later cast. This cleans up that casting. Reviewed By: nikic, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D130448	2022-07-25 10:24:43 -07:00
Kazu Hirata	95a932fb15	Remove redundant override specifiers (NFC) Identified with modernize-use-override.	2022-07-24 22:28:11 -07:00
Kazu Hirata	b5188591a0	[llvm] Remove redundant virtual specifiers (NFC) Identified with modernize-use-override.	2022-07-24 21:50:35 -07:00
Warren Ristow	3089b411a4	[Reassociate][NFC] Consistent checking for FastMathFlags suitability In D129523, it was noted that the approach to check whether a value can have FastMathFlags was done in different ways, and they should be made consistent. This patch makes minor changes to fix that. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D130408	2022-07-24 17:44:30 -07:00
Kazu Hirata	acf648b5e9	Use llvm::less_first and llvm::less_second (NFC)	2022-07-24 16:21:29 -07:00
Kazu Hirata	8ac2d06195	[IPO] Use range-based for loops (NFC)	2022-07-24 14:48:06 -07:00
Kazu Hirata	3736a498d4	[IPO] Use std::array for AccessKind2Accesses (NFC) Switching to std::array allows us to use fill. While I am at it, this patch also converts one for loop to a range-based one.	2022-07-23 15:47:53 -07:00
Fangrui Song	7225213c0a	[LegacyPM] Remove {,PostInline}EntryExitInstrumenterPass Following recent changes removing non-core features of the legacy PM/optimization pipeline.	2022-07-23 15:30:15 -07:00
Nuno Lopes	9df0b254d2	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-23 21:50:11 +01:00
Kazu Hirata	2d2e2e7ea9	[Vectorize] Remove isConsecutiveLoadOrStore (NFC) The last use was removed on Jan 4, 2022 in commit `95a93722db`.	2022-07-23 13:01:14 -07:00
Johannes Doerfert	6b7eae11f1	[Attributor][FIX] HasBeenWrittenTo logic should only be used for reads If we look at a write, we should not enact the "has been written to" logic introduced to avoid spurious write -> read dependences. Doing so led to elimination of stores we needed, which is obviously bad.	2022-07-22 23:57:57 -05:00
Alexander Shaposhnikov	2ebfda2417	[InstCombine] Improve folding of mul + icmp This diff adds folds for patterns like X * A < B where A, B are constants and "mul" has either "nsw" or "nuw". (to address https://github.com/llvm/llvm-project/issues/56563). Test plan: 1/ ninja check-llvm check-clang 2/ Bootstrapped LLVM/Clang pass tests Differential revision: https://reviews.llvm.org/D130039	2022-07-22 22:08:53 +00:00
Sanjay Patel	08091a99ae	Revert "[InstCombine] enhance fold for subtract-from-constant -> xor" This reverts commit `79bb915fb6`. This caused regressions because SCEV works better with sub.	2022-07-22 15:56:24 -04:00
Philip Reames	b5c7213647	[LV] Use early return to simplify code structure	2022-07-22 12:15:14 -07:00
Mircea Trofin	7b81a81d5f	[NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate The name `getEntrySamples` was misleading for 2 reasons. One, it's close in name to `Function::getEntryCount`, but the equivalent here is `getHeadSamples`; second, as opposed to the other get* APIs in `FunctionSamples`, it performs an estimate/heuristic rather than just retrieving raw data (or a non-heuristic derivative of that data, like `getMaxCountInside`). The new name should more clearly communicate its intent; and, being close (in name) to `getHeadSamples`, it should allow the reader to discover the relation between them. Also updated the doc comments for both `getHeadSamples[Estimate]` so a reader may better understand the relation between them. Differential Revision: https://reviews.llvm.org/D130281	2022-07-22 09:17:59 -07:00
Benjamin Kramer	5a445395e4	[LV] Remove unused variable. NFC.	2022-07-22 17:43:58 +02:00
Philip Reames	d7bf81fd51	[LV] Rework widening cost of uniform memory ops for clarity [nfc] Reorganize the code to make it clear what is and isn't handled, and why. Restructure bailout to remove (false and confusing) dependence on CM_Scalarize; just return invalid cost and propagate, that's what it is for.	2022-07-22 08:35:45 -07:00
Joseph Huber	3d0ab8638b	[Internalize] Support glob patterns for API lists The internalize pass supports an option to provide a list of symbols that should not be internalized. This is useful for retaining certain definitions that should be kept alive. However, this interface is somewhat difficult to use as it requires knowing every single symbol's name and specifying it. Many APIs provide common prefixes for the symbols exported by the library, so it would make sense to be able to match these using a simple glob pattern. This patch changes the handling from a simple string comparison to a glob pattern match. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D130319	2022-07-22 08:24:32 -04:00
Johannes Doerfert	a50b9f9f1f	[Attributor][FIX] Handle non-recursive but re-entrant functions properly If a function is non-recursive we only performed intra-procedural reasoning for reachability (via AA::isPotentiallyReachable). However, if it is re-entrant that doesn't mean we can't reach. Instead of this problematic logic in the reachability reasoning we utilize logic in AAPointerInfo. If a location is for sure written by a function it can be re-entrant or recursive we know only intra-procedural reasoning is sufficient.	2022-07-22 00:00:56 -05:00
Max Kazantsev	a40af8589e	[RS4GC] Handle special cases in unreachable code for memcpy/memmove The existing code doesn't expect dummy values (undef, poison, null-derived constants etc) as arguments of these intrinsics. However, they can be there in unreached code. Currently we fail trying to find base for them. Handle these cases separately. Return null as base for them to be consistent with the handling in the main algorithm in findBaseDefiningValue. Differential Revision: https://reviews.llvm.org/D129561 Reviewed By: apilipenko	2022-07-22 11:30:43 +07:00
Johannes Doerfert	62f7888d6d	[Attributor] Dominating must-write accesses allow unknown initial values If we have a dominating must-write access we do not need to know the initial value of some object to perform reasoning about the potential values. The dominating must-write has overwritten the initial value.	2022-07-21 23:08:43 -05:00
Johannes Doerfert	c72d93a08a	[Attributor][NFC] Remove unnecessary overwritten methods	2022-07-21 21:57:02 -05:00
Chenbing Zheng	1a0187c9e7	[InstCombine] remove useless ‘InstCombiner::’. nfc Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130220	2022-07-22 09:24:24 +08:00
Philip Reames	bd75350180	[LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the common name, these are different. * LV's notion means that only the first lane of each unrolled part is required. That is, lanes within a single unroll factor are considered uniform. This allows e.g. widenable memory ops to be considered uses of uniform computations. * LVL and LAI's notion refers to all lanes across all unrollings. IsUniformMem is in turn defined in terms of LAI's notion. Thus a UniformMemOp is a memory operation with a loop invariant address. This means the same address is accessed in every iteration. The tweaked piece of code was trying to match a uniform mem op (i.e. fully loop invariant address), but instead checked for LV's notion of uniformity. In theory, this meant with UF > 1, we could speculate a load which wasn't safe to execute. This ends up being mostly silent in current code as it is nearly impossible to create the case where this difference is visible. The closest I've come is the test case from 54cb87, but even then, the incorrect result is only visible in the vplan debug output; before this change we sink the unsafely speculated load back into the user's predicate blocks before emitting IR. Both before and after IR are correct so the differences aren't "interesting". The other test changes are uninteresting. They're cases where LV's uniform analysis is slightly weaker than SCEV isLoopInvariant.	2022-07-21 15:44:34 -07:00
Alexander Shaposhnikov	e9afdf838e	[GlobalOpt] Enable evaluation of atomic loads Relax the check to allow evaluation of atomic loads (but still skip volatile loads). Test plan: 1/ ninja check-llvm check-clang 2/ Bootstrapped LLVM/Clang pass tests Differential revision: https://reviews.llvm.org/D130211	2022-07-21 21:36:11 +00:00
Augie Fackler	bd6aa67e02	BuildLibCalls: move inference of freeing memory later This probably should have been part of D123089, but the effects of it don't show up until we start removing functions from the table in D130107. Oops. Differential Revision: https://reviews.llvm.org/D130184	2022-07-21 15:31:16 -04:00
Sanjay Patel	78c09f0f24	[PatternMatch][InstCombine] match a vector with constant expression element(s) as a constant expression The InstCombine test is reduced from issue #56601. Without the more liberal match for ConstantExpr, we try to rearrange constants in Negator forever. Alternatively, we could adjust the definition of m_ImmConstant to be more conservative, but that's probably a larger patch, and I don't see any downside to changing m_ConstantExpr. We never capture and modify a ConstantExpr; transforms just want to avoid it. Differential Revision: https://reviews.llvm.org/D130286	2022-07-21 15:23:57 -04:00
David Sherwood	f15b6b2907	[AArch64] Add target hook for preferPredicateOverEpilogue This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met: 1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences. Currently the default option is "disabled", but this will be changed in a later patch. I've added new tests to show the options behave as expected here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D129560	2022-07-21 17:20:06 +01:00
Nikita Popov	1f69503107	[MemoryBuiltins] Add getReallocatedOperand() function (NFC) Replace the value-accepting isReallocLikeFn() overload with a getReallocatedOperand() function, which returns which operand is the one being reallocated. Currently, this is always the first one, but once allockind(realloc) is respected, the reallocated operand will be determined by the allocptr parameter attribute.	2022-07-21 14:54:16 +02:00
Nikita Popov	46e6dd84b7	[MemoryBuiltins] Remove isFreeCall() function (NFC) Remove isFreeCall() in favor of getFreedOperand(). Replace the two remaining uses with a getFreedOperand() != nullptr check, as they only care that something is getting freed. (The usage in DSE is correct as such. The allocator-related checks in CFLGraph look rather questionable in general.)	2022-07-21 14:44:23 +02:00
Nikita Popov	5e856a8578	[InstCombine] Use getFreedOperand() (NFC) Use getFreedOperand() instead of isFreeCall() to remove the implicit assumption that any pointer operand to a free function is the operand being freed. This won't actually matter until we handle allockind(free).	2022-07-21 14:33:55 +02:00
Nikita Popov	3ac8587a2b	[Attributor] Use getFreedOperand() (NFC) Track which operand is actually freed, to avoid the implicit assumption that it is the first call argument.	2022-07-21 14:26:47 +02:00
Nikita Popov	c81dff3c30	[MemoryBuiltins] Add getFreedOperand() function (NFCI) We currently assume in a number of places that free-like functions free their first argument. This is true for all hardcoded free-like functions, but with the new attribute-based design, the freed argument is supposed to be indicated by the allocptr attribute. To make sure we handle this correctly once allockind(free) is respected, add a getFreedOperand() helper which returns the freed argument, rather than just indicating whether the call frees some argument. This migrates most but not all users of isFreeCall() to the new API. The remaining users are a bit more tricky.	2022-07-21 12:39:35 +02:00
Nikita Popov	8d58c8e57b	Reapply [InstCombine] Don't check for alloc fn before fetching alloc size Reapply the patch with getObjectSize() replaced by getAllocSize(). The former will also look through calls that return their argument, and we'll end up placing dereferenceable attributes on intrinsics like llvm.launder.invariant.group. While this isn't wrong, it also doesn't seem to be particularly useful. For now, use getAllocSize() instead, which sticks closer to the original behavior of this code. ----- This code is just interested in the allocsize, not any other allocator properties.	2022-07-21 11:48:24 +02:00
Nikita Popov	70056d04e2	Revert "[InstCombine] Don't check for alloc fn before fetching object size" This reverts commit `c72c22c04d`. This affected an Analysis test that I missed. Reverting for now.	2022-07-21 10:59:12 +02:00
Nikita Popov	c72c22c04d	[InstCombine] Don't check for alloc fn before fetching object size This code is just interested in the allocsize, not any other allocator properties.	2022-07-21 10:45:03 +02:00
Nikita Popov	f45ab43332	[MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc Allow directly checking whether a given call is a removable allocation, instead of first checking whether it is an allocation.	2022-07-21 09:39:19 +02:00
Chenbing Zheng	8c124c9088	[InstCombine] (ShiftValC >> Y) >s -1/<s 0 --> Y != 0/==0 We can do folds (ShiftValC >> Y) >s -1 --> Y != 0 and (ShiftValC >> Y) <s 0 --> Y == 0, with ShiftValC < 0. Alive2: https://alive2.llvm.org/ce/z/-PRHfD Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129726	2022-07-21 10:12:29 +08:00
Chenbing Zheng	8075f680c8	[InstCombine] add fold (X > C - 1) ^ (X < C + 1) --> X != C Considering the correctness of this pattern, we should avoid that C - 1 is non-negative and C + 1 is negative. Alive2: https://alive2.llvm.org/ce/z/c_rBaq Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129622	2022-07-21 10:08:21 +08:00
Johannes Doerfert	ad98ef8be4	[Attributor] Deal with complex PHI nodes better during AAPointerInfo We were quite conservative when it came to PHI node handling to avoid recursive reasoning. Now we check more directly if we have seen a PHI already or not. This allows non-recursive PHI chains to be handled. This also exposed a bug as we did only model the effect of one loop traversal. `phi_no_store_3` has been adapted to show how we would have used `undef` instead of `1` before. With this patch we don't replace it at all, which is expected as we do not argue about loop iterations (or alignments).	2022-07-20 17:34:50 -05:00
Johannes Doerfert	142897dd7d	[Attributor] Only non-exact accesses require a uniform bit-pattern (=0) If we only have exact accesses we should never require the bit-pattern to be uniform (in this case 0). Only a non-exact access should force us to require only 0 values.	2022-07-20 17:34:50 -05:00
Alexander Shaposhnikov	67f1fe8597	[GlobalOpt] Enable evaluation of atomic stores Relax the check to allow evaluation of atomic stores (but still skip volatile stores). Test plan: 1/ ninja check-llvm check-clang 2/ Bootstrapped LLVM/Clang pass tests Differential revision: https://reviews.llvm.org/D129841	2022-07-20 22:33:58 +00:00
Schrodinger ZHU Yifan	304027206c	[ThinLTO] Support aliased GlobalIFunc Fixes https://github.com/llvm/llvm-project/issues/56290: when an ifunc is aliased in LTO, clang will attempt to create an alias summary; however, as ifunc is not included in the module summary, doing so will lead to crash. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D129009	2022-07-20 15:30:38 -07:00
Craig Topper	d76c8f5127	[InstCombine] Add mul with negated power of 2 constant to canEvaluateShifted. If we are right shifting a multiply by a negated power of 2 where the power of 2 is the same as the shift amount, we can replace with a negate followed by an And. New tests have not been committed yet but the patch shows the diffs. Let me know if you want any changes or additional tests. Differential Revision: https://reviews.llvm.org/D130103	2022-07-20 11:00:22 -07:00
Ruobing Han	2b98b8e8fb	Fix bug for useless malloc elimination in CodeGenPrepare Putting the AllocationFn check before I->willReturn allows CodeGenPrepare to remove useless malloc instructions. Differential Revision: https://reviews.llvm.org/D130126	2022-07-20 16:29:51 +00:00
Philip Reames	523a526a02	[LV] Fix miscompile due to srem/sdiv speculation safety condition An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block. Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose. Differential Revision: https://reviews.llvm.org/D130106	2022-07-20 05:35:23 -07:00
Nicolai Hähnle	1ddc51d89d	Inliner: don't mark call sites as 'nounwind' if that would be redundant When F calls G calls H, G is nounwind, and G is inlined into F, then the inlined call-site to H should be effectively nounwind so as not to lose information during inlining. If H itself is nounwind (which often happens when H is an intrinsic), we no longer mark the callsite explicitly as nounwind. Previously, there were cases where the inlined call-site of H differs from a pre-existing call-site of H in F only in the explicitly added nounwind attribute, thus preventing common subexpression elimination. v2: - just check CI->doesNotThrow v3 (resubmit after revert at `3443788087`): - update Clang tests Differential Revision: https://reviews.llvm.org/D129860	2022-07-20 14:17:23 +02:00
Florian Hahn	5124b21648	[VPlan] Initial def-use verification. This patch introduces some initial def-use verification. This catches cases like the one fixed by D129436. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D129717	2022-07-20 11:06:32 +01:00
Fangrui Song	e931c2e870	[LegacyPM] Remove InstrOrderFileLegacyPass Following recent changes removing non-core features of the legacy PM/optimization pipeline.	2022-07-19 23:58:51 -07:00
Kazu Hirata	0387da6f4f	Use value instead of getValue (NFC)	2022-07-19 21:18:26 -07:00
Kazu Hirata	41ae78ea3a	Use has_value instead of hasValue (NFC)	2022-07-19 20:15:44 -07:00
Johannes Doerfert	f84712f0b8	[Attributor] Teach checkForAllUses to follow returns into callers If we can determine all call sites we can follow a use in a return instruction into the caller. AAPointerInfo utilizes this feature.	2022-07-19 18:17:40 -05:00
Johannes Doerfert	4f2ccdd0b1	[Attributor][NFC] Improve debug messages	2022-07-19 18:17:40 -05:00
Nick Desaulniers	1cf6b93df1	Revert "[Local] Allow creating callbr with duplicate successors" This reverts commit `08860f525a`. Crashes during PPC64LE linux kernel builds as reported by @nathanchance. https://reviews.llvm.org/D129997#3663632	2022-07-19 15:03:27 -07:00
Johannes Doerfert	bf789b1957	[Attributor] Replace AAValueSimplify with AAPotentialValues For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons: - We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`. - We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it. - Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication. This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now. `AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still. We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress. Fixes: https://github.com/llvm/llvm-project/issues/54981 Note: A previous version was flawed and consequently reverted in `6555558a80`.	2022-07-19 16:24:42 -05:00
Arthur Eubanks	13aa2c1c3b	[DSE] Revisit pointers that may no longer escape after removing another store In dependent-capture, previously we'd see that %tmp4 is captured due to the first store. We'd cache this info in CapturedBeforeReturn and InvisibleToCallerAfterRet. The first store is then removed, causing the cached values to be wrong. We also need to revisit everything because normally we work backwards when removing stores at the end of the function, but in this case removing an earlier store causes a later store to be removable. No compile time impact: https://llvm-compile-time-tracker.com/compare.php?from=56796ae1a8db4c85dada28676f8303a5a3609c63&to=21b7e5248ffc423cd36c9d4a020085e363451465&stat=instructions Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D123686	2022-07-19 09:30:34 -07:00
Sanjay Patel	3d6c10dcf3	[SimplifyLibCalls] avoid converting pow() to powi() with no FMF powi() is not a standard math library function; it is specified with non-strict semantics in the LangRef. We currently require 'afn' to do this transform when it needs a sqrt(), so I just extended that requirement to the whole-number exponent too. This bug was introduced with: `b17754bcaa` ...where we deferred expansion of pow() to later passes.	2022-07-19 12:26:53 -04:00
Arnold Schwaighofer	bc4870f09e	[coro async] Add missing llvm.coro.id.async intrinsic to declaresCoroCleanupIntrinsics rdar://97214593 Differential Revision: https://reviews.llvm.org/D130038	2022-07-19 07:25:04 -07:00
Andrew Turner	b850762b62	Add the FreeBSD AArch64 memory layout Use the FreeBSD AArch64 memory layout values when building for it. These are based on the x86_64 values, scaled to take into account the larger address space on AArch64. Reviewed by: vitalybuka Differential Revision: https://reviews.llvm.org/D125883	2022-07-19 09:58:07 -04:00
Andrew Turner	e13bd2644e	Add the FreeBSD AArch64 shadow offset to llvm AArch64 has a larger address space than 64-bit x86. Use the larger shadow offset on FreeBSD AArch64. Reviewed by: vitalybuka Differential Revision: https://reviews.llvm.org/D125873	2022-07-19 09:58:07 -04:00
William Schmidt	bccc9aa81c	Don't vectorize PHIs in catchswitch blocks We currently assert in vectorizeTree(TreeEntry*) when processing a PHI bundle in a block containing a catchswitch. We attempt to set the IRBuilder insertion point following the catchswitch, which is invalid. This is done so that ShuffleBuilder.finalize() knows where to insert a shuffle if one is needed. To avoid this occurring, watch out for catchswitch blocks during buildTree_rec() processing, and avoid adding PHIs in such blocks to the vectorizable tree. It is unlikely that constraining vectorization over an exception path will cause a noticeable performance loss, so this seems preferable to trying to anticipate when a shuffle will and will not be required.	2022-07-19 06:10:17 -07:00
Nikita Popov	08860f525a	[Local] Allow creating callbr with duplicate successors Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs. Differential Revision: https://reviews.llvm.org/D129997	2022-07-19 14:28:22 +02:00
Florian Hahn	a75760a269	[LV] Remove unnecessary cast in widenCallInstruction. (NFC)	2022-07-19 11:23:24 +01:00
Max Kazantsev	82309831c3	[LoopSimplifyCFG] Prevent use-def dominance breach by handling dead exits. PR56243 One of the transforms in LoopSimplifyCFG demands that the LCSSA form is truly maintained for all values, tokens included, otherwise it may end up creating a use that is not dominated by def (and Phi creation for tokens is impossible). Detect this situation and prevent transform for it early. Differential Revision: https://reviews.llvm.org/D129984 Reviewed By: efriedma	2022-07-19 15:54:12 +07:00
Ellis Hoag	3580daacf3	[InstrProf] Allow CSIRPGO function entry coverage The flag `-fcs-profile-generate` for enabling CSIRPGO moves the pass `pgo-instrumentation` after inlining. Function entry coverage works fine with this change, so remove the assert. I had originally left this assert in because I had not tested this at the time. Reviewed By: davidxl, MaskRay Differential Revision: https://reviews.llvm.org/D129407	2022-07-18 15:10:11 -07:00
Florian Hahn	30e53b8c03	[LV] Sink module variable and use State to set it in widenCall. (NFC) Limits the lifetime of the variable and makes it independent of CallInst.	2022-07-18 19:41:48 +01:00
Arnold Schwaighofer	28ebd13d63	[coro async] Fix code to run coro.async.end cleanup like the legacy pass did The code executed for the Switch ABI does not change. rdar://97074714 Differential Revision: https://reviews.llvm.org/D129865	2022-07-18 10:41:29 -07:00
Nicolai Hähnle	3443788087	Revert "Inliner: don't mark call sites as 'nounwind' if that would be redundant" This reverts commit `9905c37981`. Looks like there are Clang changes that are affected in trivial ways. Will look into it.	2022-07-18 17:43:35 +02:00
Nicolai Hähnle	9905c37981	Inliner: don't mark call sites as 'nounwind' if that would be redundant When F calls G calls H, G is nounwind, and G is inlined into F, then the inlined call-site to H should be effectively nounwind so as not to lose information during inlining. If H itself is nounwind (which often happens when H is an intrinsic), we no longer mark the callsite explicitly as nounwind. Previously, there were cases where the inlined call-site of H differs from a pre-existing call-site of H in F only in the explicitly added nounwind attribute, thus preventing common subexpression elimination. v2: - just check CI->doesNotThrow Differential Revision: https://reviews.llvm.org/D129860	2022-07-18 17:28:52 +02:00
Sanjay Patel	26fbb79c33	[InstCombine] reduce code for signbit folds; NFC	2022-07-18 11:04:58 -04:00
Nikita Popov	21e2f133a8	[LoopSimplifyCFG] Revert accidental change This change was included in an unrelated change `b57d61384c` and was of course not intended for commit...	2022-07-18 15:30:13 +02:00
Nikita Popov	b57d61384c	[ConstantRangeTest] Move nowrap binop tests to generic infrastructure (NFC) Move testing for add/sub with nowrap flags to TestBinaryOpExhaustive, rather than separate homegrown exhaustive testing functions.	2022-07-18 15:14:17 +02:00
Kristina Bessonova	44736c1d49	[CloneFunction][DebugInfo] Avoid cloning DILexicalBlocks of inlined subprograms If DISubprogram was not cloned (e.g. we are cloning a function that has other functions inlined into it, and subprograms of the inlined functions are not supposed to be cloned), it doesn't make sense to clone its DILexicalBlocks as well. Otherwise we'll get duplicated DILexicalBlocks that may confuse debug info emission in AsmPrinter. I believe it also makes no sense cloning any DILocalVariables or maybe other local entities, if their parent subprogram was not cloned, because they will be dangling and will not participate in further emission. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D127102	2022-07-18 13:14:52 +02:00
Nikita Popov	8201e3ef5c	[BasicBlockUtils] Don't drop callbr with unique successor As callbr is now allowed to have duplicate destinations, we can have a callbr with a unique successor. Make sure it doesn't get dropped, as we still need to preserve the side-effect.	2022-07-18 12:26:29 +02:00
Nikita Popov	4fba35f973	[InstCombine] Clarify invoke/callbr handling in constexpr call fold (NFCI) We only need to check the block for the normal/default destination, not for other destinations. Using the value in those would be illegal anyway. The callbr case cannot actually happen here, because callbr is currently limited to inline asm. Retaining it to match the spirit of the original code.	2022-07-18 12:02:46 +02:00
Florian Hahn	105032f549	[LV] Use PHI recipe instead of PredRecipe for subsequent uses. At the moment, the VPPredInstPHIRecipe is not used in subsequent uses of the predicate recipe. This incorrectly models the def-use chains, as all later uses should use the phi recipe. Fix that by delaying recording of the recipe. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D129436	2022-07-18 09:35:34 +01:00
Nikita Popov	11079e8820	[IR] Don't treat callbr as indirect terminator Callbr is no longer an indirect terminator in the sense that is relevant here (that its successors cannot be updated). The primary effect of this change is that callbr no longer prevents formation of loop simplify form. I decided to drop the isIndirectTerminator() method entirely and replace it with isa<IndirectBrInst>() checks. I assume this method was added to abstract over indirectbr and callbr, but it never really caught on, and there is nothing left to abstract anymore at this point. Differential Revision: https://reviews.llvm.org/D129849	2022-07-18 09:32:08 +02:00
Fangrui Song	0e3447bf8a	[LegacyPM] Remove WholeProgramDevirt Unused after LTO removal from legacy optimization pipeline.	2022-07-17 23:14:53 -07:00
Fangrui Song	1f90cc589e	[LegacyPM] Remove FunctionImportLegacyPass Unused after ThinLTO was removed from legacy optimization pipeline.	2022-07-17 23:06:46 -07:00
Kazu Hirata	7094ab4ee7	[llvm] Modernize bool literals (NFC) Identified with modernize-use-bool-literals.	2022-07-17 18:08:51 -07:00
Kazu Hirata	3112987d5c	Remove unused forward declarations (NFC)	2022-07-17 15:37:48 -07:00
Kazu Hirata	8b3ed1fa98	Remove redundant return statements (NFC) Identified with readability-redundant-control-flow.	2022-07-17 15:37:46 -07:00
Fangrui Song	bbaa015e82	[LegacyPM] Remove LowerTypeTestsPass Unused after LTO removal from optimization pipeline.	2022-07-17 15:06:38 -07:00
Fangrui Song	a6942256ca	[LegacyPM] Remove NameAnonGlobalLegacyPass Unused after LTO removal from optimization pipeline.	2022-07-17 14:38:29 -07:00
Fangrui Song	d74b88c69d	[LegacyPM] Remove CanonicalizeAliasesLegacyPass Unused after LTO removal from optimization pipeline.	2022-07-17 14:30:22 -07:00
Fangrui Song	70519a1fba	[LegacyPM] Remove LTO passes from optimization pipeline Following recent changes removing non-core features of the legacy PM/optimization pipeline.	2022-07-17 14:24:36 -07:00
Fangrui Song	f502115561	[LegacyPM] Remove PGO options from PassManagerBuilder They have been dead since legacy PGO/SamplePGO passes were removed.	2022-07-17 14:03:23 -07:00
Fangrui Song	dd5e3f0e27	[LegacyPM] Remove SampleProfileLoaderLegacyPass Following recent changes removing non-core features of the legacy PM/optimization pipeline (e.g. PGO), remove SamplePGO.	2022-07-17 12:09:46 -07:00
Florian Hahn	cc0ee17951	[LV] Move VPPredInstPHIRecipe::execute to VPlanRecipes.cpp (NFC)	2022-07-17 11:34:23 +01:00
zhongyunde	3a6b766b1b	[IndVars] Directly use unsigned integer induction for FPToUI/FPToSI of float induction Depend on D129358 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129756	2022-07-17 10:48:35 +08:00
Florian Hahn	6813b41d57	[LV] Avoid creating new run-time VF expression for each runtime check. At the moment, the cost of runtime checks for scalable vectors is overestimated due to creating separate vscale * VF expressions for each check. Instead re-use the first expression.	2022-07-16 17:24:07 +01:00
David Green	4b7913c357	[VectorCombine] Only consider shuffle uses with the same type. The backend getShuffleCosts do not currently handle shuffles that change size very well. Limit the shuffles we collect to the same type to make sure they do not cause issues as reported in D128732.	2022-07-16 13:23:39 +01:00
Fangrui Song	f9d6f37201	[LegacyPM] Remove ControlHeightReductionLegacyPass This pass tries to reduce the number of conditional branches in the hot path based on profile. It's mostly a no-op after legacy PGO passes are moved.	2022-07-16 01:35:56 -07:00
Fangrui Song	3a42c499c2	[LegacyPM] Remove createInstrProfilingLegacyPass Follow the steps of removing non-core instrumentation passes like PGO.	2022-07-16 01:26:40 -07:00
Fangrui Song	685775bbab	[LegacyPM] Remove CGProfileLegacyPass It's mostly a no-op after I removed legacy PGO passes in D123834.	2022-07-16 00:39:56 -07:00
Fangrui Song	df8f5be596	[LegacyPM] Remove ModuleSanitizerCoverageLegacyPass Follow the steps of various other legacy instrumentation passes removed for 15.0.0.	2022-07-15 19:01:20 -07:00
Rong Xu	5e0443292b	[PGO] Report number of counts being dropped when a hash-mismatch happens This patch reports number of counts being dropped when a hash-mismatch happens. This information will be helpful to the users -- if the dropped counts are large, the user should redo the instrumentation build and recollect the profile. Differential Revision: https://reviews.llvm.org/D129001	2022-07-15 14:53:59 -07:00
Rong Xu	19ac75364f	[PGO] Improve hash-mismatch warning message This patch improves FDO hash-mismatch handling: (1) filter out warnings for weak functions. A weak function definition will be overridden by a strong definition by the linker, so the hash mismatch in profile use compilation is expected. Put the profile hash mismatch warning under the existing option (default true). (2) add an option to trace the hash of functions with the specific string. Note that an empty string parameter will trace all functions. Differential Revision: https://reviews.llvm.org/D129002	2022-07-15 13:44:55 -07:00
Philip Reames	6ab686eb86	[LSR] Allow already invariant operand for ICmpZero matching [try 2] Changes since initial commit: * Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in `d767b392`. * isLoopInvariant returns true for instructions outside the loop, but not necessarily above the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in `2aed3cdb`. Original commit message follows... The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 13:29:43 -07:00
Warren Ristow	c650793049	[Reassociate] Enable FP reassociation via 'reassoc' and 'nsz' Compiling with '-ffast-math' turns on all the FastMathFlags (FMF), as expected, and that enables FP reassociation. Only the two FMF flags 'reassoc' and 'nsz' are technically required to perform reassociation, but disabling other unrelated FMF bits is needlessly suppressing the optimization. This patch fixes that needless suppression, and makes appropriate adjustments to test-cases, fixing some outstanding TODOs in the process. Fixes: #56483 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129523	2022-07-15 11:44:35 -07:00
Philip Reames	6fe766beba	Revert "[LSR] Allow already invariant operand for ICmpZero matching" This reverts commit `9153515a7b`. Buildbot crash was reported in the commit thread, reverting while investigating.	2022-07-15 10:47:57 -07:00
Florian Hahn	aa00fb02c9	[LV] Use umax(VF * UF, MinProfTC) for scalable vectors. For scalable vectors, it is not sufficient to only check MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because this property may not hold for larger values of vscale. In those cases, compute umax(VF * UF, MinProfTC) instead. This should fix https://lab.llvm.org/buildbot/#/builders/197/builds/2262	2022-07-15 10:23:14 -07:00
Philip Reames	9153515a7b	[LSR] Allow already invariant operand for ICmpZero matching The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 09:51:00 -07:00
Nikita Popov	8a519b3c21	[InstCombine] Ensure constant folding in binop of select fold When folding a binop into a select, we need to ensure that one of the select arms actually does constant fold, otherwise we'll create two binop instructions and perform the reverse transform. Ensure this by performing an explicit constant folding attempt, and failing the transform if neither side simplifies. A simple alternative here would have been to limit the fold to ImmConstants, but given the current representation of scalable vector splats, this wouldn't be ideal.	2022-07-15 11:03:10 +02:00
Mel Chen	bd404fbcc8	[LV][NFC] Fix the condition for printing debug messages Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D128523	2022-07-15 01:47:33 -07:00
Nikita Popov	f75ccadcdd	[LSR] Create SCEVExpander earlier, use member isSafeToExpand() (NFC) This is a followup to D129630, which switches LSR to the member isSafeToExpand() variant, and removes the freestanding function. This is done by creating the SCEVExpander early (already during the analysis phase). Because the SCEVExpander is now available for the whole lifetime of LSRInstance, I've also made it into a member variable, rather than passing it around in even more places. Differential Revision: https://reviews.llvm.org/D129769	2022-07-15 09:41:23 +02:00
Craig Topper	0e718443c7	[SimplifyIndVar] Use enum class for ExtendKind. NFC I happened to notice two places where the enum was being passed directly to the bool IsSigned argument of createExtendInst. This was functionally ok since SignExtended in the enum has a value of 1, but the code shouldn't rely on that. Using an enum class prevents the enum from being convertible to bool, but does make writing the enum values more verbose. Since we now have to write ExtendKind:: in front of them, I've shortened the names of ZeroExtended and SignExtended. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129733	2022-07-14 10:03:58 -07:00
Philip Reames	3bc09c7da5	[SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case Motivation here is to unblock LSR's ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all. SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values. vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes. (*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of every udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference. Differential Revision: https://reviews.llvm.org/D129710	2022-07-14 08:56:58 -07:00
Brendon Cahoon	58fec78231	Revert "[UnifyLoopExits] Reduce number of guard blocks" This reverts commit `e13248ab0e`. Need to revert because the transformation cannot occur for basic blocks that contain convergent instructions.	2022-07-14 10:33:52 -05:00
Warren Ristow	230c8c56f2	[Reassociate] Cleanup minor missed optimizations In analyzing issue #56483, it was noticed that running `opt` with `-reassociate` was missing some minor optimizations. For example, there were cases where running `opt` on IR with floating-point instructions that have the `fast` flags applied sometimes resulted in less efficient code than the input IR (things like dead instructions left behind, and missed reassociations). These were sometimes noted in the test-files with TODOs, to investigate further. This commit fixes some of these problems, removing some TODOs in the process. FTR, I refer to these as "minor" missed optimizations, because when running a full clang/llvm compilation, these inefficiencies are not happening, as other passes clean that residue up. Regardless, having cleaner IR produced by `opt` makes assessing the quality of fixes done in `opt` easier.	2022-07-14 08:21:04 -07:00
Brendon Cahoon	c945d88d2b	Revert "[StructurizeCFG] Improve basic block ordering" This reverts commit `f1b05a0a2b`. Need to revert due to issues identified with testing. The transformation is incorrect for blocks that contain convergent instructions.	2022-07-14 09:40:51 -05:00
Nikita Popov	9e6e631b38	[LoopPredication] Use isSafeToExpandAt() member function (NFC) As a followup to D129630, this switches a usage of the freestanding function in LoopPredication to use the member variant instead. This was the last use of the freestanding function, so drop it entirely.	2022-07-14 14:49:07 +02:00
Nikita Popov	dcf4b733ef	[SCEVExpander] Make CanonicalMode handling in isSafeToExpand() more robust (PR50506) isSafeToExpand() for addrecs depends on whether the SCEVExpander will be used in CanonicalMode. At least one caller currently gets this wrong, resulting in PR50506. Fix this by a) making the CanonicalMode argument on the freestanding functions required and b) adding member functions on SCEVExpander that automatically take the SCEVExpander mode into account. We can use the latter variant nearly everywhere, and thus make sure that there is no chance of CanonicalMode mismatch. Fixes https://github.com/llvm/llvm-project/issues/50506. Differential Revision: https://reviews.llvm.org/D129630	2022-07-14 14:41:51 +02:00
zhongyunde	fc6092fd4d	[IndVars] Eliminate redundant type cast between unsigned integer and float Extend to unsigned integers according to the comment in D129191. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129358	2022-07-14 19:41:07 +08:00
Nikita Popov	7a43b382ce	[IndVars] Make sure header phi simplification preserves LCSSA form When simplifying instructions, make sure that the replacement preserves LCSSA form. This fixes the issue reported at: https://reviews.llvm.org/D129293#3650851	2022-07-14 11:46:48 +02:00
Nikita Popov	ebc54e0cd4	[SCCP] Make check for unknown/undef in unary op handling more explicit (NFCI) Make the implementation more similar to other functions, by explicitly skipping an unknown/undef first, and always falling back to overdefined at the end. I don't think it makes a difference now, but could make one once the constant evaluation can fail. In that case we would directly mark the result as overdefined now, rather than keeping it unknown (and later making it overdefined because we think it's undef-based).	2022-07-14 10:56:11 +02:00
Nikita Popov	6db3edc858	[SCCP] Don't check for UndefValue before calling markConstant() The value lattice explicitly represents undef, and markConstant() internally checks for UndefValue and will create an undef rather than constant lattice element in that case. This is mostly a code simplification, it has little practical impact because we usually get undef results from undef operands, and those don't get processed. Only leave the check behind for the CmpInst case, because it currently goes through this incorrect code in the getCompare() implementation: `f98697642c/llvm/include/llvm/Analysis/ValueLattice.h (L456-L457)` Differential Revision: https://reviews.llvm.org/D128330	2022-07-14 10:05:56 +02:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
Florian Hahn	ee37ae91b6	[VPlan] Move VPBB verification to separate function (NFC).	2022-07-13 18:53:40 -07:00
Florian Hahn	6f7347b888	[LV] Use PredRecipe directly instead of getOrAddVPValue (NFC). There is no need to look up the VPValue for Instr, PredRecipe can be used directly.	2022-07-13 17:01:42 -07:00
Alexander Shaposhnikov	c916840539	[SimplifyCFG] Improve SwitchToLookupTable optimization Try to use the original value as an index (in the lookup table) in more cases (to avoid one subtraction and shorten the dependency chain) (https://github.com/llvm/llvm-project/issues/56189). Test plan: 1/ ninja check-all 2/ bootstrapped LLVM + Clang pass tests Differential revision: https://reviews.llvm.org/D128897	2022-07-13 23:21:45 +00:00
Leonard Chan	21f72c05c4	[hwasan] Add __hwasan_add_frame_record to the hwasan interface Hwasan includes instructions in the prologue that mix the PC and SP and store it into the stack ring buffer stored at __hwasan_tls. This is a thread_local global exposed from the hwasan runtime. However, if TLS-mechanisms or the hwasan runtime haven't been setup yet, it will be invalid to access __hwasan_tls. This is the case for Fuchsia where we instrument libc, so some functions that are instrumented but can run before hwasan initialization will incorrectly access this global. Additionally, libc cannot have any TLS variables, so we cannot weakly define __hwasan_tls until the runtime is loaded. A way we can work around this is by moving the instructions into a hwasan function that does the store into the ring buffer and creating a weak definition of that function locally in libc. This way __hwasan_tls will not actually be referenced. This is not our long-term solution, but this will allow us to roll out hwasan in the meantime. This patch includes: - A new llvm flag for choosing to emit a libcall rather than instructions in the prologue (off by default) - The libcall for storing into the ringbuffer (__hwasan_add_frame_record) Differential Revision: https://reviews.llvm.org/D128387	2022-07-13 15:15:15 -07:00
Leonard Chan	d843d5c8e6	Revert "[hwasan] Add __hwasan_record_frame_record to the hwasan interface" This reverts commit `4956620387`. This broke a sanitizer builder: https://lab.llvm.org/buildbot/#/builders/77/builds/19597	2022-07-13 15:06:07 -07:00
Florian Hahn	225e3ec622	[LV] Move VPBranchOnMaskRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-13 14:39:59 -07:00
leonardchan	4956620387	[hwasan] Add __hwasan_record_frame_record to the hwasan interface Hwasan includes instructions in the prologue that mix the PC and SP and store it into the stack ring buffer stored at __hwasan_tls. This is a thread_local global exposed from the hwasan runtime. However, if TLS-mechanisms or the hwasan runtime haven't been setup yet, it will be invalid to access __hwasan_tls. This is the case for Fuchsia where we instrument libc, so some functions that are instrumented but can run before hwasan initialization will incorrectly access this global. Additionally, libc cannot have any TLS variables, so we cannot weakly define __hwasan_tls until the runtime is loaded. A way we can work around this is by moving the instructions into a hwasan function that does the store into the ring buffer and creating a weak definition of that function locally in libc. This way __hwasan_tls will not actually be referenced. This is not our long-term solution, but this will allow us to roll out hwasan in the meantime. This patch includes: - A new llvm flag for choosing to emit a libcall rather than instructions in the prologue (off by default) - The libcall for storing into the ringbuffer (__hwasan_record_frame_record) Differential Revision: https://reviews.llvm.org/D128387	2022-07-14 05:07:11 +08:00
Martin Sebor	ab7ee3c991	[InstCombine] Enable strtol folding with nonnull endptr Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129593	2022-07-13 09:26:34 -06:00
Nikita Popov	07146a9e64	[SCCP] Fix typo in previous commit Ooops, I tested a build from the wrong checkout.	2022-07-13 16:22:40 +02:00
Nikita Popov	e298dfbc1b	[SCCP] Avoid ConstantExpr::get() call Use ConstantFoldUnaryOpOperand() API instead. This is in preparation for removing fneg constant expressions.	2022-07-13 16:20:34 +02:00
Max Kazantsev	62f4572e45	[IndVars][NFC] Make IVOperand parameter an instruction	2022-07-13 19:07:16 +07:00
Max Kazantsev	30e33b4b81	[SCEV][NFC] Make getStrengthenedNoWrapFlagsFromBinOp return optional	2022-07-13 18:54:25 +07:00
David Sherwood	307ace7f20	[LoopVectorize] Ensure the VPReductionRecipe is placed after all its inputs When vectorising ordered reductions we call a function LoopVectorizationPlanner::adjustRecipesForReductions to replace the existing VPWidenRecipe for the fadd instruction with a new VPReductionRecipe. We attempt to insert the new recipe in the same place, but this is wrong because createBlockInMask may have generated new recipes that VPReductionRecipe now depends upon. I have changed the insertion code to append the recipe to the VPBasicBlock instead. Added a new RUN with tail-folding enabled to the existing test: Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D129550	2022-07-13 09:29:25 +01:00
Nikita Popov	af49bed933	[IndVars] Simplify instructions after replacing header phi with preheader value After replacing a loop phi with the preheader value, it's usually possible to simplify some of the using instructions, so do that as part of replaceLoopPHINodesWithPreheaderValues(). Doing this as part of IndVars is valuable, because it may make GEPs in the loop have constant offsets and allow the following SROA run to succeed (as demonstrated in the PhaseOrdering test). Differential Revision: https://reviews.llvm.org/D129293	2022-07-13 10:27:04 +02:00
Nikita Popov	a5ee62a141	[IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits Currently we only call replaceLoopPHINodesWithPreheaderValues() if optimizeLoopExits() replaces the exit with an unconditional exit. However, it is very common that this already happens as part of eliminateIVComparison(), in which case we're leaving behind the dead header phi. Tweak the early bailout for already-constant exits to also call replaceLoopPHINodesWithPreheaderValues(). Differential Revision: https://reviews.llvm.org/D129214	2022-07-13 09:43:21 +02:00
Augie Fackler	9029bda041	[Attributor] Don't crash if getAnalysisResultForFunction() returns null LoopInfo I have no idea what's going on here. This code was moved around/introduced in change `cb26b01d57` and starts crashing with a NULL dereference once I apply https://reviews.llvm.org/D123090. I assume that I've unwittingly taught the attributor enough that it's able to do more clever things than in the past, and it's able to trip on this case. I make no claims about the correctness of this patch, but it passes tests and seems to fix all the crashes I've been seeing. Differential Revision: https://reviews.llvm.org/D129589	2022-07-12 16:44:06 -04:00
Yuanfang Chen	fcb7d76d65	[coroutine] add nomerge function attribute to `llvm.coro.save` It is illegal to merge two `llvm.coro.save` calls unless their `llvm.coro.suspend` users are also merged. Marks it "nomerge" for the moment. This reverts D129025. Alternative to D129025, which affects other token type users like WinEH. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D129530	2022-07-12 10:39:38 -07:00
Nick Desaulniers	2240d72f15	[X86] initial -mfunction-return=thunk-extern support Adds support for: * `-mfunction-return=<value>` command line flag, and * `__attribute__((function_return("<value>")))` function attribute Where the supported <value>s are: * keep (disable) * thunk-extern (enable) thunk-extern enables clang to change ret instructions into jmps to an external symbol named __x86_return_thunk, implemented as a new MachineFunctionPass named "x86-return-thunks", keyed off the new IR attribute fn_ret_thunk_extern. The symbol __x86_return_thunk is expected to be provided by the runtime the compiled code is linked against and is not defined by the compiler. Enabling this option alone doesn't provide mitigations without corresponding definitions of __x86_return_thunk! This new MachineFunctionPass is very similar to "x86-lvi-ret". The <value>s "thunk" and "thunk-inline" are currently unsupported. It's not clear yet that they are necessary: whether the thunk pattern they would emit is beneficial or used anywhere. Should the <value>s "thunk" and "thunk-inline" become necessary, x86-return-thunks could probably be merged into x86-retpoline-thunks which has pre-existing machinery for emitting thunks (which could be used to implement the <value> "thunk"). Has been found to build+boot with corresponding Linux kernel patches. This helps the Linux kernel mitigate RETBLEED. * CVE-2022-23816 * CVE-2022-28693 * CVE-2022-29901 See also: * "RETBLEED: Arbitrary Speculative Code Execution with Return Instructions." * AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion * TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0 2022-07-12 * Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 SystemZ may eventually want to support "thunk-extern" and "thunk"; both options are used by the Linux kernel's CONFIG_EXPOLINE. This functionality has been available in GCC since the 8.1 release, and was backported to the 7.3 release. Many thanks to folks that provided discreet review off list due to the embargoed nature of this hardware vulnerability. Many Bothans died to bring us this information. Link: https://www.youtube.com/watch?v=IF6HbCKQHK8 Link: https://github.com/llvm/llvm-project/issues/54404 Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1 Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60 Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html Reviewed By: aaron.ballman, craig.topper Differential Revision: https://reviews.llvm.org/D129572	2022-07-12 09:17:54 -07:00
David Sherwood	6b694d600a	[LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF When calculating the cost of Instruction::Br in getInstructionCost we query PredicatedBBsAfterVectorization to see if there is a scalar predicated block. However, this meant that the decisions being made for a given fixed-width VF were affecting the cost for a scalable VF. As a result we were returning InstructionCost::Invalid pointlessly for a scalable VF that should have a low cost. I encountered this for some loops when enabling tail-folding for scalable VFs. Test added here: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Differential Revision: https://reviews.llvm.org/D128272	2022-07-12 14:53:20 +01:00
Nikita Popov	3d475dfeb9	[Mem2Reg] Consistently preserve nonnull assume for uninit load When performing a !nonnull load from uninitialized memory, we should preserve the nonnull assume just like in all other cases. We already do this correctly in the generic mem2reg code, but don't handle this case when using the optimized single-block implementation. Make sure that the optimized implementation exhibits the same behavior as the generic implementation.	2022-07-12 12:53:08 +02:00
Kazu Hirata	ec9a0e36d9	[IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC) The last uses were removed on Apr 15, 2022 in commit `2e6ac54cf4`. Differential Revision: https://reviews.llvm.org/D129460	2022-07-11 20:15:24 -07:00
Florian Hahn	5d135041c5	[LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-11 16:01:07 -07:00
Justin Cady	3d438ceed1	[InstrProf] Mark __llvm_profile_runtime hidden to match libclang_rt.profile definition Mark the symbol hidden to match INSTR_PROF_PROFILE_RUNTIME_VAR in compiler-rt. Fixes second issue discussed at https://discourse.llvm.org/t/63090 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D128842	2022-07-11 11:29:20 -07:00
David Sherwood	03fee6712a	[LoopVectorize] Add option to use active lane mask for loop control flow Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop. This patch adds support for using the active lane mask for control flow by: 1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set. 2. Extract the first bit of this mask and use this bit for the conditional branch. I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region. Differential Revision: https://reviews.llvm.org/D125301	2022-07-11 13:46:55 +01:00
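
The two steps in the last entry above (`03fee6712a`) can be pictured with a small scalar simulation. This is only an illustrative sketch, not LLVM code: the function names `get_active_lane_mask` and `masked_vector_sum` are hypothetical, the lane-mask rule (lane `i` is active when `base + i < trip_count`) is assumed to mirror the `get.active.lane.mask` intrinsic, and the zero-trip guard before the loop is an assumption added to keep the sketch safe for empty input.

```python
def get_active_lane_mask(base, trip_count, vf):
    # Hypothetical helper: lane i is active when base + i < trip_count.
    return [base + i < trip_count for i in range(vf)]


def masked_vector_sum(data, vf=4):
    # Sums `data` in chunks of `vf` lanes, with the loop branch driven
    # by the lane mask rather than by comparing an index to the trip count.
    trip_count = len(data)
    total = 0
    index = 0

    # Initial mask, set up before entering the loop (the "pre-header").
    mask = get_active_lane_mask(0, trip_count, vf)
    if not mask[0]:
        return total  # assumed zero-trip guard, not taken from the commit

    while True:
        # Predicated body: only active lanes touch memory.
        for lane in range(vf):
            if mask[lane]:
                total += data[index + lane]
        index += vf

        # Step 1: compute the mask for the NEXT iteration.
        mask = get_active_lane_mask(index, trip_count, vf)

        # Step 2: branch on the first bit of that mask. If lane 0 is
        # inactive, no iterations remain.
        if not mask[0]:
            break
    return total
```

For example, `masked_vector_sum(list(range(1, 11)))` runs three masked iterations with `vf=4`, the last one with only two active lanes.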


31232 Commits