llvm-project

Commit Graph

Author	SHA1	Message	Date
Johannes Doerfert	1df6e171c3	[Attributor] Simplify (integer range) state handling We used to be very conservative when integer states were merged. Instead of adding the known range (which is large due to uncertainty) into the assumed range (which is hopefully small), we can also only allow to merge in both at the same time into their respective counterpart. This will ensure we keep the invariant that assumed is part of known.	2022-06-09 12:00:26 +02:00
Johannes Doerfert	481b8f31df	[Attributor][NFC] Introduce helper struct We often use a context associated with a value. For now only one use case has been changed.	2022-06-09 12:00:26 +02:00
Johannes Doerfert	4277c1be88	[Attributor][FIX] Avoid metadata and duplicate replication assertion When we recreate instructions as part of simplification we need to take care of debug metadata and replacing the value multiple times. For now, we handle both conservatively.	2022-06-09 12:00:26 +02:00
Biplob Mishra	d87bfa9ad0	[InstCombine] Combine instructions of type or/and where AND masks can be combined. The patch simplifies some of the patterns as below (A \| (B & C0)) \| (B & C1) -> A \| (B & C0\|C1) ((B & C0) \| A) \| (B & C1) -> (B & C0\|C1) \| A In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns. Additionally this commit fixes the issue reported with the test case. int f(int a, int b) { int c = ((unsigned char)(a >> 23) & 925); if (a) c = (a >> 23 & b) \| ((unsigned char)(a >> 23) & 925) \| (b >> 23 & 157); return c; } The previous revision/commit did not check one-use of an intermediate value that this transform re-uses. When that value has another use, an existing transform will try to invert the transform here. By adding one-use checks, we avoid the infinite loops seen with the earlier commit. Differential Revision: https://reviews.llvm.org/D124119	2022-06-09 10:58:30 +01:00
Chenbing Zheng	38992d2c5e	[InstCombine] improve fold for icmp-ugt-ashr Existing condition for fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1 missed some boundary. It cause this fold don't work for some cases, and the reason is due to signed number overflow. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127188	2022-06-09 16:22:12 +08:00
Nikita Popov	56c9976d46	[IndVarSimplify] Don't assert that terminator is not SCEVable (PR55925) The IV widening code currently asserts that terminators aren't SCEVable -- however, this is not the case for invokes with a returned attribute. As far as I can tell, this assertions is not necessary -- even if we have a critical edge (the second test case), the trunc gets inserted in a legal position. Fixes https://github.com/llvm/llvm-project/issues/55925. Differential Revision: https://reviews.llvm.org/D127288	2022-06-09 10:12:13 +02:00
Fangrui Song	11136a6032	[DeadArgElim] Remove dead code after r128810	2022-06-08 21:11:54 -07:00
Florian Hahn	cedfd7a2e5	Recommit "[VPlan] Remove uneeded needsVectorIV check." This reverts commit `266ea446ab`. The reasons for the revert have been addressed by cleaning up condition handling in VPlan and properly marking VPBranchOnMaskRecipe as using scalars. The test case for the revert from D123720 has been added in `3d663308a5`.	2022-06-08 14:06:45 +01:00
Chuanqi Xu	0e10f12844	[NFC] Remove commented cerr debugging loggings There are some unused cerr debugging loggings in the codes. It is weird to remain such commented debug helpers in the product.	2022-06-08 15:58:06 +08:00
Chuanqi Xu	733d7cf964	[Debug] [Coroutines] Add deref operator for non complex expression Background: When we construct coroutine frame, we would insert a dbg.declare intrinsic for it: ``` %hdl = call void @llvm.coro.begin() ; would return coroutine handle call void @llvm.dbg.declare(metadata ptr %hdl, metadata ![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression()) ``` And in the splitted coroutine, it looks like: ``` define void @coro_func.resume(ptr *hdl) { entry.resume: call void @llvm.dbg.declare(metadata ptr %hdl, metadata ![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression()) } ``` And we would salvage the debug info by inserting a new alloca here: ``` define void @coro_func.resume(ptr %hdl) { entry.resume: %frame.debug = alloca ptr call void @llvm.dbg.declare(metadata ptr %frame.debug, metadata ![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression()) store ptr %hdl, %frame.debug } ``` But now, the problem comes since the `dbg.declare` refers to the address of that alloca instead of actual coroutine handle. I saw there are codes to solve the problem but it only applies to complex expression only. I feel if it is OK to relax the condition to make it work for `__coro_frame`. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D126277	2022-06-08 10:53:51 +08:00
Wael Yehia	0952cf5bbb	[InstCombine] decomposeSimpleLinearExpr should bail out on negative operands. InstCombine tries to rewrite %prod = mul nsw i64 %X, Scale %acc = add nsw i64 %prod, Offset %0 = alloca i8, i64 %acc, align 4 %1 = bitcast i8* %0 to i32* Use ( %1 ) into %prod = mul nsw i64 %X, Scale/4 %acc = add nsw i64 %prod, Offset/4 %0 = alloca i32, i64 %acc, align 4 Use (%0) But it assumes Scale is unsigned, and performs an unsigned division. So we should bail out if Scale cannot be interpreted as an unsigned safely. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126546	2022-06-08 00:57:25 +00:00
Sanjay Patel	cae993d4c8	[InstCombine] [InstCombine] reduce left-shift-of-right-shifted constant via demanded bits If we don't demand low bits and it is valid to pre-shift a constant: (C2 >> X) << C1 --> (C2 << C1) >> X https://alive2.llvm.org/ce/z/_UzTMP This is the reverse-order shift sibling to `82040d414b` ( D127122 ). It seems likely that we would want to add this to the SDAG version of the code too to keep it on par with IR.	2022-06-07 18:43:27 -04:00
Sanjay Patel	a4d2c5ecaa	[InstCombine] reduce code duplication for accessing type; NFC	2022-06-07 18:43:27 -04:00
Philip Reames	89c4b29e8d	[GuardWidening] Fix a nasty cast bug in `c2eccc6` `c2eccc6` introduced a call to etHasNoUnsignedWrap which implicitly assumes that Inst is a OverflowingBinaryOperator. This is frequently untrue, but was not caught because cast<Ty>(X) has been broken, see https://discourse.llvm.org/t/cast-x-is-broken-implications-and-proposal-to-address/63033 for context. I considered reverting this, but since doing so re-introduces a nasty miscompile of its own, I decided to fix forward instead. I'll note that this is a particularly nasty form of the cast<Ty>(X) issue. Because the cast was succeeding unexpected, we were writing data to instructions which weren't OBOs. This could result in near arbitrary data or memory corruption. I'm a bit shocked that the sanitizers didn't find this TBH.	2022-06-07 13:27:13 -07:00
Florian Hahn	b0c9a71be0	[VPlan] Handle VPInst without underlying instr in VPInterleavedAccess. This violation is hidden while `cast` is missing an isa assertion after D123901.	2022-06-07 21:00:49 +01:00
Martin Sebor	dd2a6d78ee	[InstCombine] Fold memchr of sequences of same characters Enhance memchr libcall folder to handle constant arrays consisting of one or two sequences of cosecutive equal characters. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126515	2022-06-07 13:45:10 -06:00
Martin Sebor	fb6627fa0c	[InstCombine] Add substr helper function (NFC). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126515	2022-06-07 13:27:36 -06:00
Sanjay Patel	82040d414b	[InstCombine] reduce right-shift-of-left-shifted constant via demanded bits If we don't demand high bits (zeros) and it is valid to pre-shift a constant: (C2 << X) >> C1 --> (C2 >> C1) << X https://alive2.llvm.org/ce/z/P3dWDW There are a variety of related patterns, but I haven't found a single solution that gets all of the motivating examples - so pulling this piece out of D126617 along with more tests. We should also handle the case where we shift-right followed by shift-left, but I'll make that a follow-on patch assuming this one is ok. It seems likely that we would want to add this to the SDAG version of the code too to keep it on par with IR. Differential Revision: https://reviews.llvm.org/D127122	2022-06-07 13:28:18 -04:00
Craig Topper	d73684e223	[LoopFlatten] Fix crash if the inner loop trip count comes from a sext instruction. If we look through a truncate in matchLinearIVUser, it's possible we find a sext/zext instruction that didn't come from widening. This will fail the MatchedItCount->getType() == InnerInductionPHI->getType() assertion. Fix this by checking that we did not look through a truncate already. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D127149	2022-06-07 08:21:21 -07:00
Craig Topper	fdd5843572	[LoopFlatten] Replace unchecked dyn_cast with cast. Spotted while reading through the code. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D127146	2022-06-07 08:21:00 -07:00
David Sherwood	997ecb0036	[LoopVectorize] Add FastMathFlags to the select used for reductions with tail-folding Based on reviewer comments on https://reviews.llvm.org/D126692 I've added FastMathFlags to the select instruction used when tail-folding with reductions. These flags can then be used by InstCombine to decide upon the most optimal floating point identity value for fadd/fsub. Doing so unlocks further optimisations, such as folding selects into masked loads. Differential Revision: https://reviews.llvm.org/D126778	2022-06-07 10:21:31 +01:00
Nikita Popov	7fa97b473c	[SCCP] Don't mark ranges from branch conditions as potentially undef Now that transforms introducing branch on poison have been removed, we can stop marking ranges that have been derived from branch conditions as containing undef. The existing comment explains why this is legal. I've checked that alive2 is happy with SCCP tests after this change. Differential Revision: https://reviews.llvm.org/D126647	2022-06-07 10:20:24 +02:00
Enna1	e52a38c8f1	[ASan] Skip any instruction inserted by another instrumentation. Currently, we only check !nosanitize metadata for instruction passed to function `getInterestingMemoryOperands()` or instruction which is a cannot return callable instruction. This patch add this check to any instruction. E.g. ASan shouldn't instrument the instruction inserted by UBSan/pointer-overflow. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D126269	2022-06-07 11:17:07 +08:00
Kevin P. Neal	a1f1bd547b	[IPSCCP] Switch away from Instruction::isSafeToRemove() In D115737 I found that I needed to teach Instruction::isSafeToRemove() about strictfp/constrained intrinsics. It was pointed out that this is probably the wrong function to use isInstructionTriviallyDead(). It doesn't make sense to have a "second, worse implementation". I also believe that the Instruction class is the wrong place for this functionality. The information about whether or not an instruction can be removed is in the transform passes and should stay there. Differential Revision: https://reviews.llvm.org/D118387	2022-06-06 09:24:11 -04:00
Florian Hahn	eaf48dd9b0	[VPlan] Replace BranchOnCount with BranchOnCond if TC <= UF * VF. Try to simplify BranchOnCount to `BranchOnCond true` if TC <= UF * VF. This is an alternative to D121899 which simplifies the VPlan directly instead of doing so late in code-gen. The potential benefit of doing this in VPlan is that this may help cost-modeling in the future. The reason this is done in prepareToExecute at the moment is that a single plan may be used for multiple VFs/UFs. There are further simplifications that can be applied as follow ups: 1. Replace inductions with constants 2. Replace vector region with regular block. Fixes #55354. Depends on D126679. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126680	2022-06-06 09:38:53 +01:00
Kazu Hirata	8daf23d364	[Scalar] Use llvm::make_early_inc_range (NFC)	2022-06-05 23:53:18 -07:00
Sanjay Patel	3f33d67d8a	[InstCombine] fold mul with masked low bit operand to trunc+select https://alive2.llvm.org/ce/z/o7rQ5q This shows an extra instruction in some cases, but that is caused by an existing canonicalization of trunc -> and+icmp. Codegen should be better for any target where a multiply is more costly than the most simple ALU op. This ends up producing the requested x86 asm from issue #55618, but it's not the same IR. We are missing a canonicalization from the negate+mask pattern to the trunc+select created here.	2022-06-05 20:07:18 -04:00
Kazu Hirata	3b9707dbc0	[llvm] Convert for_each to range-based for loops (NFC)	2022-06-05 12:07:14 -07:00
Kazu Hirata	30f19382c6	[Scalar] Remove isValidSingle (NFC) The last use was removed on Feb 18, 2022 in commit `00ab91b70d`.	2022-06-05 08:45:11 -07:00
Fangrui Song	95a134254a	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 01:07:51 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Kazu Hirata	2c4d52467a	[Transforms/Utils] Use predecessors (NFC)	2022-06-05 00:16:14 -07:00
Fangrui Song	d0d1c416cb	Remove unneeded cl::ZeroOrMore for cl::list options	2022-06-04 23:51:13 -07:00
Kazu Hirata	e0039b8d6a	Use llvm::less_second (NFC)	2022-06-04 22:48:32 -07:00
Kazu Hirata	f83a88a179	[Transforms] Use llvm::is_contained (NFC)	2022-06-04 20:48:26 -07:00
Florian Hahn	416a5080d8	[VPlan] Update vector latch terminator edge to exit block after execution. Instead of setting the successor to the exit using CFG.ExitBB, set it to nullptr initially. The successor to the exit block is later set either through createEmptyBasicBlock or after VPlan execution (because at the moment, no block is created by VPlan for the exit block, the existing one is reused). This also enables BranchOnCond to be used as terminator for the exiting block of the topmost vector region. Depends on D126618. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126679	2022-06-04 21:22:32 +01:00
Fangrui Song	36c7d79dc4	Remove unneeded cl::ZeroOrMore for cl::opt options Similar to `557efc9a8b`. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.	2022-06-04 00:10:42 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Alexey Bataev	cac60940b7	[SLP]Improve shuffles cost estimation where possible. Improved/fixed cost modeling for shuffles by providing masks, improved cost model for non-identity insertelements. Differential Revision: https://reviews.llvm.org/D115462	2022-06-03 08:06:22 -07:00
Arnold Schwaighofer	5c902af572	[coro async] Add code to support dynamic aligment of over-aligned types in async frames Async context frames are allocated with a maximum alignment. If a type requests an alignment bigger than that dynamically align the address in the frame. Differential Revision: https://reviews.llvm.org/D126715	2022-06-03 07:06:14 -07:00
Benjamin Kramer	a8d2a381a2	[VPlan] Silence another unused variable warning in release builds	2022-06-03 14:07:56 +02:00
Benjamin Kramer	6b7c186390	[VPlan] Inline variable into assertion. NFC. Avoids a warning in release builds llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp:311:14: warning: unused variable 'BrCond' [-Wunused-variable] Value *BrCond = Br->getCondition();	2022-06-03 13:59:48 +02:00
Florian Hahn	a5bb4a3b4d	[VPlan] Replace CondBit with BranchOnCond VPInstruction. This patch removes CondBit and Predicate from VPBasicBlock. To do so, the patch introduces a new branch-on-cond VPInstruction opcode to model a branch on a condition explicitly. This addresses a long-standing TODO/FIXME that blocks shouldn't be users of VPValues. Those extra users can cause issues for VPValue-based analyses that don't expect blocks. Addressing this fixme should allow us to re-introduce `266ea446ab`. The generic branch opcode can also be used in follow-up patches. Depends on D123005. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126618	2022-06-03 11:48:31 +01:00
Fangrui Song	df0f30dc36	Revert "[SLP]Improve shuffles cost estimation where possible." This reverts commit `9980c99718`. Caused assertion failures: https://reviews.llvm.org/D115462#3555350	2022-06-03 00:30:34 -07:00
Daniil Suchkov	f1940a5895	Revert "[LoopInterchange] New cost model for loop interchange" Reverting the commit due to numerous buildbot failures. This reverts commit `006334470d`.	2022-06-03 00:52:08 +00:00
Congzhe Cao	006334470d	[LoopInterchange] New cost model for loop interchange This patch proposed to use a new cost model for loop interchange, which is obtained from loop cache analysis. Given a loopnest, what loop cache analysis returns is a vector of loops [loop0, loop1, loop2, ...] where loop0 should be replaced as the outermost loop, loop1 should be placed one more level inside, and loop2 one more level inside, etc. What loop cache analysis does is not only more comprehensive than the current cost model, it is also a "one-shot" query which means that we only need to query it once during the entire loop interchange pass, which is better than the current cost model where we query it every time we check whether it is profitable to interchange two loops. Thus complexity is reduced, especially after D120386 where we do more interchanges to get the globally optimal loop access pattern. Updates made to test cases are mostly minor changes and some corrections. Test coverage for loop interchange is not reduced. Currently we did not completely remove the legacy cost model, but keep it as fall-back in case the new cost model did not run successfully. This is because currently we have some limitations in delinearization, which sometimes makes loop cache analysis bail out. The longer term goal is to enhance delinearization and eventually remove the legacy cost model compeletely. Reviewed By: bmahjour, #loopoptwg Differential Revision: https://reviews.llvm.org/D124926	2022-06-02 19:07:14 -04:00
Sanjay Patel	8689463bfb	[InstCombine] make pattern matching more consistent; NFC We could go either way on this and several similar matches. Just matching as a binop is possibly slightly more efficient; we don't need to re-confirm the opcode of the instruction.	2022-06-02 16:01:23 -04:00
Alexey Bataev	9980c99718	[SLP]Improve shuffles cost estimation where possible. Improved/fixed cost modeling for shuffles by providing masks, improved cost model for non-identity insertelements. Differential Revision: https://reviews.llvm.org/D115462	2022-06-02 11:18:14 -07:00
Liqiang Tao	14e8add939	[llvm][ModuleInliner] Refactor InlineSizePriority and PriorityInlineOrder This patch introduces the abstract base class InlinePriority to serve as the comparison function for the priority queue. A derived class, such as SizePriority, may choose to cache the priorities for different functions for performance reasons. This design shields the type used for the priority away from classes outside InlinePriority and classes derived from it. In turn, PriorityInlineOrder no longer needs to be a template class. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D126300	2022-06-02 23:40:26 +08:00
Liqiang Tao	5c6ed60c51	Revert "[llvm][ModuleInliner] Refactor InlineSizePriority and PriorityInlineOrder" This reverts commit `50de7f1e77`.	2022-06-02 23:18:47 +08:00
Liqiang Tao	50de7f1e77	[llvm][ModuleInliner] Refactor InlineSizePriority and PriorityInlineOrder This patch introduces the abstract base class InlinePriority to serve as the comparison function for the priority queue. A derived class, such as SizePriority, may choose to cache the priorities for different functions for performance reasons. This design shields the type used for the priority away from classes outside InlinePriority and classes derived from it. In turn, PriorityInlineOrder no longer needs to be a template class. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D126300	2022-06-02 22:28:33 +08:00
Florian Hahn	4f1c86e3d5	[VPlan] Remove dead VPlan-native special case from BranchOnCount (NFC). After `05776122b6` this special case doesn't exist any longer.	2022-06-02 12:07:54 +01:00
eopXD	6eab5cade7	[LSR] Early exit for RateFormula when it is already losing. NFC This patch does not effect any behavior of the current code. The codebase implicitly implies that `Cost::RateFormula` is only called when the `Cost` is not in losing status, or else there may be possible to trigger the assertion of `Cost::isValid`. The intention here is to prevent mis-use where future development allow `Cost` that is already loser to call `Cost::RateFormula` - Early exit when `Cost` is already losing. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D125670	2022-06-01 21:02:40 -07:00
Alexey Bataev	73020b4540	Revert "[SLP]Improve shuffles cost estimation where possible." This reverts commit `fd5a6ce9dc` to fix a crash detected by a buildbot https://lab.llvm.org/buildbot/#/builders/179/builds/3805/steps/11/logs/stdio.	2022-06-01 15:44:51 -07:00
Florian Hahn	08482830eb	[LV] Update var name to Exiting, in line with terminology (NFC) Recently the terminology used has been changed from Exit->Exiting in line with common LLVM loop terminology. Update a remaining use of the old terminology.	2022-06-01 22:13:29 +01:00
Alexey Bataev	fd5a6ce9dc	[SLP]Improve shuffles cost estimation where possible. Improved/fixed cost modeling for shuffles by providing masks, improved cost model for non-identity insertelements. Differential Revision: https://reviews.llvm.org/D115462	2022-06-01 11:01:37 -07:00
Alexey Bataev	fe4949942d	[SLP]Fix PR55796: insert point for extractelements from different basic blocks. Extractelement instructions may come from different basic blocks, need to take it into account when looking for a last instruction in the bundle to prevent compiler crash. Differential Revision: https://reviews.llvm.org/D126777	2022-06-01 09:44:53 -07:00
Alexander Kornienko	aa98e7e1eb	Revert "[InstCombine] Combine instructions of type or/and where AND masks can be combined." This reverts commit `ec4adf1f6c`. The commit causes clang to hang on a certain input: ``` $ cat q.cc int f(int a, int b) { int c = ((unsigned char)(a >> 23) & 925); if (a) c = (a >> 23 & b) \| ((unsigned char)(a >> 23) & 925) \| (b >> 23 & 157); return c; } $ time ./clang-15-10515 --target=x86_64--linux-gnu -O1 -c q.cc ^C real 0m45.072s user 0m0.025s sys 0m0.099s ```	2022-06-01 14:20:00 +02:00
Florian Hahn	05776122b6	[VPlan] Use region for each loop in native path. This patch updates the VPlan native path to use VPRegionBlocks for all loops in a loop nest. Up to now, only the outermost loop used a region. This is a step towards unifying both paths and keep things consistent between them. It also prepares various code-gen parts for modeling the pre-header in the inner loop vectorizer (D121624). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123005	2022-06-01 10:41:05 +01:00
Florian Hahn	d157019482	[VPlan] Remove unused native utilities incompatible with nested regions. The implementations of VPlanDominatorTree, VPlanLoopInfo and VPlanPredicator are all incompatible with modeling loops in VPlans as region without explicit back-edges. Those pieces are not actively used and only exercised by a few gtest unit tests. They are at the moment blocking progress towards unifying the native and inner-loop vectorizer paths in D121624 and D123005. I think we should not block forward progress on unused pieces of code, so this patch removes the utilities for now. The plan is to re-introduce them as needed in a way that is compatible with the unified VPlan scheme used in both the inner loop vectorizer and the native path. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D123017	2022-06-01 09:32:59 +01:00
Eli Friedman	abdf0da800	[LoopIdiom] Fix bailout for aliasing in memcpy transform. Commit `dd5991cc` modified the aliasing checks here to allow transforming a memcpy where the source and destination point into the same object. However, the change accidentally made the code skip the alias check for other operations in the loop. Instead of completely skipping the alias check, just skip the check for whether the memcpy aliases itself. Differential Revision: https://reviews.llvm.org/D126486	2022-05-31 17:24:23 -07:00
Sanjay Patel	2bf6123f22	[InstCombine] fold icmp of sext bool based on limited range X <=u (sext i1 Y) --> (X == 0) \| Y https://alive2.llvm.org/ce/z/W_tZzo This is the conjugate/sibling pattern suggested with D126171 for a sign-extended bool value.	2022-05-31 12:37:56 -04:00
Augie Fackler	73f664601c	BuildLibCalls: infer allockind attributes on relevant functions Differential Revision: https://reviews.llvm.org/D123089	2022-05-31 10:01:17 -04:00
Augie Fackler	42861faa8e	attributes: introduce allockind attr for describing allocator fn behavior I chose to encode the allockind information in a string constant because otherwise we would get a bit of an explosion of keywords to deal with the possible permutations of allocation function types. I'm not sure that CodeGen.h is the correct place for this enum, but it seemed to kind of match the UWTableKind enum so I put it in the same place. Constructive suggestions on a better location most certainly encouraged. Differential Revision: https://reviews.llvm.org/D123088	2022-05-31 10:01:17 -04:00
Nikita Popov	36cbdaa163	[InstCombine] Fix inbounds preservation when swapping GEPs (PR44206) When reassociating GEPs, we can only keep inbounds if both original GEPs were inbounds, and their offsets have the same sign. For the sake of simplicity, I only handle the case where both offsets are non-negative here. It would probably be fine to just not preserve inbounds at all here, but as I don't see a compile-time impact for adding the isKnownNonNegative() calls I went with this more conservative approach. Fixes https://github.com/llvm/llvm-project/issues/44206. Differential Revision: https://reviews.llvm.org/D126687	2022-05-31 15:45:02 +02:00
Mel Chen	b0fc765350	[NFC] Change LoopVectorizationCostModel::useOrderedReductions() to be a const function. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D126200	2022-05-31 05:39:13 -07:00
Danila Malyutin	4fb3fd7d82	[InstCombine] Fix const folding of switches with default case In case phi was in the default block it could lead to multi-edge. Fixes #55721. Differential Revision: https://reviews.llvm.org/D126650	2022-05-31 15:13:58 +03:00
Nikita Popov	872d69e5d4	[InstCombine] Fix inbounds preservation when merging GEPs (PR55722) Even if the total offset is inbounds, we might represent it by first performing a large negative offset and then a small positive one. With inbounds semantics as currently specified, each offset must be inbounds individually, not just the overall offset of the GEP. Fix this by checking that the sign of all offsets is the same. Fixes https://github.com/llvm/llvm-project/issues/55722.	2022-05-31 11:54:01 +02:00
Sanjay Patel	a0c3c60728	[InstCombine] fold shift-right-by-constant with shift-right-of-constant operand (C2 >> X) >> C1 --> (C2 >> C1) >> X The shift-left form of this transform has existed since: `16f18ed7b5` ...but it applies to matching shift right opcodes too: https://alive2.llvm.org/ce/z/c5eQms	2022-05-30 15:30:01 -04:00
Sanjay Patel	c5d942a4fb	[InstCombine] remove unnecessary one-use check from (C2 << X) << C1 fold The restriction goes back to: `16f18ed7b5` ...but the fold only replaces a shift with a shift, so that's not necessary. Generalizing to other opcodes is planned as a follow-up.	2022-05-30 15:17:54 -04:00
Nuno Lopes	80b3dcc045	[Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace There are a few places where we use report_fatal_error when the input is broken. Currently, this function always crashes LLVM with an abort signal, which then triggers the backtrace printing code. I think this is excessive, as wrong input shouldn't give a link to LLVM's github issue URL and tell users to file a bug report. We shouldn't print a stack trace either. This patch changes report_fatal_error so it uses exit() rather than abort() when its argument GenCrashDiag=false. Reviewed by: nikic, MaskRay, RKSimon Differential Revision: https://reviews.llvm.org/D126550	2022-05-30 19:19:23 +01:00
Nikita Popov	a770f534e6	[InstCombine] When swapping GEPs, only keep inbounds if both are If only one of the GEPs is inbounds, then after swapping, there is no guarantee that one of them will be inbounds as well (see e.g. https://alive2.llvm.org/ce/z/agaCnp). This is only a partial fix, because even if both are inbounds, the result is not necessarily inbounds (if the offsets have different signs).	2022-05-30 17:04:42 +02:00
Nikita Popov	2d7bab666f	[InstCombine] Always create new GEPs when swapping GEPs As the long explanatory comment attests, performing the modification in place is pretty tricky. Drop this unnecessary complexity and always create new instructions. This should be NFC-ish, but can probably cause difference due to worklist order.	2022-05-30 16:48:52 +02:00
Nikita Popov	2e101cca69	[Local] Don't remove invoke of non-willreturn function The code was only checking for memory side-effects, but not for divergence side-effects. Replace this with a generic check.	2022-05-30 15:37:46 +02:00
zhongyunde	3e6ba89055	[InstCombine] Fold a mul with bool value into and Fixes https://github.com/llvm/llvm-project/issues/55599 X * Y --> X & Y, iff X, Y can be only {0, 1}. https://alive2.llvm.org/ce/z/_RsTKF Reviewed By: spatel, nikic Differential Revision: https://reviews.llvm.org/D126040	2022-05-30 21:05:00 +08:00
Nikita Popov	1721ff1dfd	[GVN] Enable enable-split-backedge-in-load-pre option by default This option was added in D89854. It prevents GVN from performing load PRE in a loop, if doing so would require critical edge splitting on the backedge. From the review: > I know that GVN Load PRE negatively impacts peeling, > loop predication, so the passes expecting that latch has > a conditional branch. In the PhaseOrdering test in this patch, splitting the backedge negatively affects vectorization: After critical edge splitting, the loop gets rotated, effectively peeling off the first loop iteration. The effect is that the first element is handled separately, then the bulk of the elements use a vectorized reduction (but using unaligned, off-by-one memory accesses) and then a tail of 15 elements is handled separately again. It's probably worth noting that the loop load PRE from D99926 is not affected by this change (as it does not need backedge splitting). This is about normal load PRE that happens to occur inside a loop. Differential Revision: https://reviews.llvm.org/D126382	2022-05-30 09:55:58 +02:00
Max Kazantsev	503d5771b6	[JumpThreading][NFCI] Reuse existing DT instead of recomputation This whole part with recomputation of BPI and BFI looks redundant, and we tried to get rid of it in D124439. Unfortunately, it causes some hard-to-reproduce failures due to invalid state of analysis. Until this is investigated and fixed, let's try to reuse at least part of available analyzes. DT is available at this point, and there is no need to recompute it. Please revert if you see it causing any behavior changes.	2022-05-30 12:48:10 +07:00
Chenbing Zheng	ef256ed58e	[InstCombine] bitcast (extractelement <1 x elt>, dest) -> bitcast(<1 x elt>, dest) Only solve dest type is vector to avoid inverse transform in visitBitCast. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125951	2022-05-30 10:16:32 +08:00
Florian Hahn	0776c48f9b	Recommit "[LICM] Only create load in ph when promoting load or store doesn't exec." This reverts the revert commit `ad95255b92`. The updated version also creates a load when the store may not execute. In those cases, we still need to introduce a load in a function where there may not have been one before, so this doesn't completely resolve issue #51248. Original message: When only a store is sunk, there is no need to create a load in the pre-header, as the result of the load will never get used. The dead load can can introduce UB, if the function is marked as writeonly. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123473	2022-05-29 21:57:14 +01:00
Florian Hahn	6abce17fc2	[VPlan] Use Exiting-block instead of Exit-block terminology (NFC). In LLVM's common loop terminology, an exit block is a block outside a loop with a predecessor inside the loop. An exiting block is a block inside the loop which branches to an exit block outside the loop. This patch updates a few places where VPlan was using ExitBlock for a block exiting a region. Those instances have been updated to use ExitingBlock. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126173	2022-05-28 21:16:05 +01:00
eopXD	6a84579243	[LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D126350	2022-05-27 15:22:23 -07:00
Sanjay Patel	b5b6aa4d53	[InstCombine] fold multiply by signbit-splat to cmp+select (ashr i32 X, 31) * C --> (X < 0) ? -C : 0 https://alive2.llvm.org/ce/z/G8u9SS With a constant operand, this is an improvement in IR and codegen (where it can be converted to a mask op). Without a constant operand, we would have to negate the operand, so that is probably better left to the backend. This is similar but not the same optimization that is requested in #55618.	2022-05-27 11:54:19 -04:00
Sanjay Patel	5a6e085757	[InstCombine] reduce code duplication; NFC	2022-05-27 11:54:19 -04:00
Enna1	52992f136b	Add !nosanitize to FixedMetadataKinds This patch adds !nosanitize metadata to FixedMetadataKinds.def, !nosanitize indicates that LLVM should not insert any sanitizer instrumentation. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D126294	2022-05-27 09:46:13 +08:00
Arthur Eubanks	36096c2b38	[NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter All callers pass true. select-unfold-freeze.ll is now a subset of select.ll so delete it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126501	2022-05-26 16:13:34 -07:00
Sanjay Patel	c4c750058f	[InstCombine] fold mul of signbit directly to X < 0 ? Y : 0 This is effectively NFC (intentionally no test diffs) because we already have the related fold that converts the 'and' pattern to select. So this is just an efficiency improvement.	2022-05-26 16:19:15 -04:00
Sanjay Patel	49f8b05137	[InstCombine] fold icmp equality with sdiv and SMIN This extends the fold from D126410 / `3952c905ef` to allow for the only case where it works with signed division: https://alive2.llvm.org/ce/z/k7_ypu (X s/ Y) == SMIN --> (X == SMIN) && (Y == 1) (X s/ Y) != SMIN --> (X != SMIN) \|\| (Y != 1) This is another improvement based on #55695.	2022-05-26 16:19:15 -04:00
Sanjay Patel	ed5be1523f	[InstCombine] reduce code duplication in icmp+div folds; NFC	2022-05-26 16:19:15 -04:00
Owen Anderson	939a43461b	Revert "Replace the custom linked list in LeaderTableEntry with TinyPtrVector." This reverts commit `1e91149844`. Pending further discussion.	2022-05-26 09:50:36 -07:00
Nikita Popov	c8eb83f2d0	[ControlHeightReduction] Use logical and Use logical instead of bitwise and to combine conditions, to avoid propagating poison from a later condition if an earlier one is already false. This avoids introducing branch on poison. Differential Revision: https://reviews.llvm.org/D125898	2022-05-26 18:03:35 +02:00
Alexey Bataev	7b809c30b9	[SLP]Improve compile time, NFC. Patch improves compile time. For function calls, which cannot be vectorized, create a unique group for each such a call instead of subgroup. It prevents them from being grouped by a subgroups and attempts for their vectorization. Also, looks through casts operand to try to check their groups/subgroups. Reduces number of vectorization attempts. No changes in the statistics for SPEC2017/2006/llvm-test-suite. Differential Revision: https://reviews.llvm.org/D126476	2022-05-26 08:40:59 -07:00
Alexey Bataev	120d52b0ef	[SLP]Fix PR55653: emit undefs where required, not poison. Need to handle a corner case correctly, if all elements are Undefs/Poisons, need to emit actual values, not just poisons. Differential Revision: https://reviews.llvm.org/D126298	2022-05-26 08:38:50 -07:00
Alex Zhikhartsev	8b0d763474	[DFAJumpThreading] Relax analysis to handle unpredictable initial values Responding to a feature request from the Rust community: https://github.com/rust-lang/rust/issues/80630 void foo(X) { for (...) switch (X) case A X = B case B X = C } Even though the initial switch value is non-constant, the switch statement can still be threaded: the initial value will hit the switch statement but the rest of the state changes will proceed by jumping unconditionally. The early predictability check is relaxed to allow unpredictable values anywhere, but later, after the paths through the switch statement have been enumerated, no non-constant state values are allowed along the paths. Any state value not along a path will be an initial switch value, which can be safely ignored. Differential Revision: https://reviews.llvm.org/D124394	2022-05-26 11:29:54 -04:00
Simon Pilgrim	14258d6fb5	[SLP] Move canVectorizeLoads implementation to simplify the diff in D105986. NFC.	2022-05-26 15:23:58 +01:00
Alexey Bataev	9139d484d4	[SLP]Fix crash on reordering of ScatterVectorize nodes. ScatterVectorize nodes should be handled same way as gathers in reorderBottomToTop function, since we can simple reorder the loads in this node. Because of that need to include such nodes to the list of gathered nodes to fix compiler crash. Differential Revision: https://reviews.llvm.org/D126378	2022-05-26 06:25:58 -07:00
Sanjay Patel	3952c905ef	[InstCombine] fold icmp equality with udiv and large constant With large compare constant: (X u/ Y) == C --> (X == C) && (Y == 1) (X u/ Y) != C --> (X != C) \|\| (Y != 1) https://alive2.llvm.org/ce/z/EhKwh6 There are various potential missing icmp (div) transforms shown here: https://github.com/llvm/llvm-project/issues/55695 This is a generalization for part of the udiv + equality. I didn't check in detail, but some of those may only make sense as codegen transforms. This results in one extra instruction in IR, but it is better for analysis, and looks much better in codegen on all targets that I tried. Differential Revision: https://reviews.llvm.org/D126410	2022-05-26 09:08:47 -04:00
Florian Hahn	f96aa493f0	[SimpleLoopUnswitch] Always skip trivial select and set condition. When updating the branch instruction outside the loopduring non-trivial unswitching, always skip trivial selects and update the condition. Otherwise we might create invalid IR, because the trivial select is inside the loop, while the condition is outside the loop. Fixes #55697.	2022-05-26 09:46:24 +01:00
Florian Hahn	390c0ac28d	[LV] Fix indentation in tryToCreateWidenRecipe (NFC).	2022-05-26 08:53:34 +01:00
Owen Anderson	1e91149844	Replace the custom linked list in LeaderTableEntry with TinyPtrVector. The purpose of the custom linked list was to optimize for the case of a single-element list. It turns out that TinyPtrVector handles the same basic scenario even better, reducing the size of LeaderTableEntry by 33%, and requiring only log2(N) allocations as the size of the list grows. The only downside is that we have to store the Value's and BasicBlock's in separate vectors, which is slightly awkward in a few cases. Fortunately that ends up being entirely encapsulated inside helper functions. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D125205	2022-05-25 23:52:44 -07:00
Serguei Katkov	c2eccc67ce	[GuardWidening] Remove nuw/nsw flags for hoisted instructions When we hoist instructions over guard we must clear flags due to these flags might be implied using this guard, so they make sense only after the guard. As an example of the bug due to current behavior. L is known to be in range say [0, 100) c1 = x u< L guard (c1) x1 = add x, 1 c2 = x1 u< L guard(c2) basing on guard(c1) we can say that x1 = add nuw nsw x, 1 after guard widening we get c1 = x u< L x1 = add nuw nsw x, 1 c2 = x1 u< L c = and c1, c2 guard(c) now, basing on fact that x + 1 < L and x >= 0 due to x + 1 is nuw we can prove that x + 1 u< L implies that x u< L, so we can just remove c1 x1 = add nuw nsw x, 1 c2 = x1 u< L guard(c2) But that is not correct due to we will pass x == -1 value. Reviewed By: mkazantsev Subscribers: llvm-commits, nikic Differential Revision: https://reviews.llvm.org/D126354	2022-05-26 13:20:55 +07:00
serge-sans-paille	fb67d683db	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `7030654296` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D126417	2022-05-26 08:12:34 +02:00
Chenbing Zheng	1486a9c9fe	[InstCombine] [NFC] refector foldXorOfICmps Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126268	2022-05-26 11:07:18 +08:00
Chenbing Zheng	41aab93afc	[InstCombine] bitcast(logic(bitcast(X), bitcast(Y))) -> bitcast'(logic(bitcast'(X), Y)) This patch break foldBitCastBitwiseLogic limite the destination must have an integer element type, and eliminate one bitcast by doing the logic op in the type of the input that has an integer element type. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126184	2022-05-26 10:23:44 +08:00
Alexey Bataev	3bf5c2c8ec	[SLP]Do not try to generate ScatterVectorize if it will be scalarized. SLP should build ScatterVectorize nodes only if they actually end up with masked gather rather than with scalarization. In the second scenario better to build a gather node. Differential Revision: https://reviews.llvm.org/D126379	2022-05-25 14:25:07 -07:00
Alexey Bataev	10f41a2147	[SLP]Fix PR55688: Miscompile due to incorrect nuw/nsw handling. Need to use all ReductionOps when propagating flags for the reduction ops, otherwise transformation is not correct. Plus, need to drop nuw/nsw flags. Differential Revision: https://reviews.llvm.org/D126371	2022-05-25 13:59:06 -07:00
David Sherwood	87936c7b13	[LoopVectorize] Fix assertion failure in fixReduction when tail-folding When compiling the attached new test in scalable-reductions-tf.ll we were hitting this assertion in fixReduction: Assertion `isa<PHINode>(U) && "Reduction exit must feed Phi's or select" The loop contains a reduction and an intermediate store of the reduction value. When vectorising with tail-folding the contains of 'U' in the assertion above happened to be a scatter_store. It turns out that we were still creating a widen recipe for the invariant store, despite knowing that we can actually sink it. The simplest fix is to change buildVPlanWithVPRecipes so that we look for invariant stores before attempting to widen it. Differential Revision: https://reviews.llvm.org/D126295	2022-05-25 11:46:32 +01:00
Florian Hahn	c6e45ea074	[VPlan] Exit earlier when trying to widen with scalar VFs. This simplifies the code a bit, suggested in D124718. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D125029	2022-05-25 11:05:23 +01:00
Florian Hahn	1ba42dd04b	[VPlan] Use MapVector for LiveOuts for deterministic iteration. During code-gen, we iterate over the LiveOuts and the differences in iteration order can cause slightly different outputs.	2022-05-25 09:30:02 +01:00
Chenbing Zheng	269e3f7369	[InstCombine] [NFC] Move transforms for truncated shifts into narrowBinOp Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126056	2022-05-25 10:21:39 +08:00
Martin Sebor	46c0ec9df4	[InstCombine] Fold memrchr calls with sequences of identical bytes. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123631	2022-05-24 17:00:11 -06:00
Vasileios Porpodas	9df0568b07	[SLP] Fix crash caused by reorderBottomToTop(). The crash is caused by incorrect order set by reorderBottomToTop(), which happens when it is reordering a TreeEntry which has a user that has already been reordered earlier. Please see the detailed description in the lit test. Differential Revision: https://reviews.llvm.org/D126099	2022-05-24 12:24:19 -07:00
Sanjay Patel	05527b68a0	[InstCombine] fold more shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This extends the transform added with `0353c2c996`. If the shuffle reduces vector length, the transform reduces the width of the cast, so that should be a win for most codegen (if not, it can be inverted).	2022-05-24 15:11:38 -04:00
Nikita Popov	e6e0eb3bc8	[InstCombine] Strip bitcasts in GEP diff fold Bitcasts were stripped in one case, but not the other. Of course, this no longer really matters with opaque pointers, but as I went through the trouble of tracking this down, we may as well remove one typed vs opaque pointer optimization discrepancy.	2022-05-24 16:12:01 +02:00
Nikita Popov	b2a13d3e2d	[InstCombine] Use IRBuilder in freeze pushing transform (PR55619) Use IRBuilder so that the newly created freeze instructions automatically gets inserted back into the IC worklist. The changed worklist processing order leads to some cosmetic differences in tests. Fixes https://github.com/llvm/llvm-project/issues/55619.	2022-05-24 15:48:28 +02:00
Alexey Bataev	f9c806ae5c	[SLP][NFC]Make isFirstInsertElement a weak strict ordering comparator. To be used correctly in a sort-like function, isFirstInsertElement function must follow weak strict ordering rule, i.e. isFirstInsertElement(IE1, IE1) should return false.	2022-05-24 06:02:42 -07:00
Nikita Popov	a7c079aaa2	[InstCombine] Support logical and in masked icmp fold Most of the folds implemented in this function work fine with logical operations. We only need to be careful for the cases that work on non-constant masks, where the RHS operand shouldn't be poison. This is a conservative implementation that bails out of illegal transforms, but we could also change these to insert freeze instead.	2022-05-24 11:16:33 +02:00
Nikita Popov	5abaabed22	[InstCombine] Use m_APInt() in asymmetric masked icmp fold This is mostly intended as code cleanup, but it does also add support for splat vectors to this fold.	2022-05-24 10:57:28 +02:00
Nikita Popov	c0e06c7448	[InstCombine] Handle logical and/or in recursive and/or of icmps fold The and/or of icmps fold is also applied in reassociated form. However, this currently only happens for bitwise and of bitwise and, but not for bitwise and of logical and (or other combinations, but this is the one being addressed here). We can do this for bitwise+logical combinations as well, but need to be a bit careful about which of the resulting ands are logical: https://alive2.llvm.org/ce/z/WYSjGh https://alive2.llvm.org/ce/z/guxYnz https://alive2.llvm.org/ce/z/S5SYxY https://alive2.llvm.org/ce/z/2rAWeW	2022-05-24 10:13:10 +02:00
Nikita Popov	81c648a3d9	[LoopUnroll] Freeze tripcount rather than condition This is a followup to D125754. We introduce two branches, one before the unrolled loop and one before the epilogue (and similar for the prologue case). The previous patch only froze the condition on the first branch. Rather than independently freezing the second condition, this patch instead freezes TripCount and bases BECount on it. These are the two quantities involved in the conditions, and this ensures that both work on a consistent, non-poisonous trip count. Differential Revision: https://reviews.llvm.org/D125896	2022-05-24 09:42:39 +02:00
Hendrik Greving	4f93d5cc1d	[BasicBlockUtils] Do not move loop metadata if outer loop header. Fixes a bug preventing moving the loop's metadata to an outer loop's header, which happens if the loop's exit is also the header of an outer loop. Adjusts test for above. Fixes #55416. Differential Revision: https://reviews.llvm.org/D125574	2022-05-23 16:39:54 -07:00
Alexey Bataev	319a722f6f	[SLP][NFC]Improve compile time, NFC. Builds UserIgnore list only once as a SmallDenseSet without rebuilding it between the runs, iterate over gathers instead list of reduction ops, do some checks in the buildTree_rec only if the corresponding containers are not empty.	2022-05-23 12:15:27 -07:00
Sanjay Patel	e8c20d995b	[IR] add and use pattern match specialization for sqrt intrinsic; NFC This was included in D126190 originally, but it's independent and a useful change for readability.	2022-05-23 14:16:30 -04:00
Benjamin Kramer	2f2ca30d0a	Fix an unused variable warning in no-asserts build mode	2022-05-23 19:53:40 +02:00
Nikita Popov	f45c1e436e	[InstCombine] Change operand order in recursive and/or of icmps fold The order obviously doesn't matter for bitwise and/or, but would matter for logical and/or, so change it to preserve the original order.	2022-05-23 17:29:33 +02:00
Jingu Kang	bb82f74612	Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"" This reverts commit `42ebfa8269`. The commmit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure. Differential Revision: https://reviews.llvm.org/D118979	2022-05-23 16:15:45 +01:00
Alexey Bataev	2ac5ebedea	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-23 07:06:45 -07:00
Sanjay Patel	1ebad988b1	[InstCombine] fold icmp of zext bool based on limited range X <u (zext i1 Y) --> (X == 0) && Y https://alive2.llvm.org/ce/z/avQDRY This is a generalization of `4069cccf3b` based on the post-commit suggestion. This also adds the i1 type check and tests that were missing from the earlier attempt; that commit caused several bot fails and was reverted. Differential Revision: https://reviews.llvm.org/D126171	2022-05-23 09:59:21 -04:00
Nikita Popov	45226d04f0	[InstCombine] Reuse icmp of and/or folds for logical and/or Similarly to a change recently done for fcmps, add a flag that indicates whether the and/or is logical to foldAndOrOfICmps, and reuse the function when folding logical and/or. We were already calling some parts of it, but this gives us a clearer indication of which parts may need poison-safe variants, and would also allow to fold combinations of bitwise and logical and/or. This change should be close to NFC, because all folds this enables were either already called previously, or can make use of implied poison reasoning.	2022-05-23 15:37:07 +02:00
Peter Waller	ade47bdc31	[LV] Improve register pressure estimate at high VFs Previously, `getRegUsageForType` was implemented using `getTypeLegalizationCost`. `getRegUsageForType` is used by the loop vectorizer to estimate the register pressure caused by using a vector type. However, `getTypeLegalizationCost` currently only appears to understand splitting and not scalarization, so significantly underestimates the register requirements. Instead, use `getNumRegisters`, which understands when scalarization can occur (via computeRegisterProperties). This was discovered while investigating D118979 (Set maximum VF with shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the loop vectorizer previously ends up costing an v128i1 as 2 v64i* registers where it actually occupies 128 i32 registers. I'm sending this patch early for comment, I'm still doing some sanity checking with LNT. I note that getRegisterClassForType appears to return VectorRC even though the type in question (large vNi1 types) end up occupying scalar registers. That might be worth fixing too. Differential Revision: https://reviews.llvm.org/D125918	2022-05-23 07:57:45 +00:00
Florian Hahn	145fe57106	[LV] Use exiting block instead of latch in addUsersInExitBlock. The latch may not be the exiting block. Use the exiting block instead when looking up the incoming value of the LCSSA phi node. This fixes a crash with early-exit loops.	2022-05-22 18:27:41 +01:00
Sanjay Patel	cba0ebd576	Revert "[InstCombine] fold icmp with sub and bool" This reverts commit `4069cccf3b`. This causes bot failures, and there's a possibly a better way to get this and other patterns.	2022-05-22 12:13:20 -04:00
Sanjay Patel	4069cccf3b	[InstCombine] fold icmp with sub and bool This is the specific pattern seen in #53432, but it can be extended in multiple ways: 1. The 'zext' could be an 'and' 2. The 'sub' could be some other binop with a similar ==0 property (udiv). There might be some way to generalize using knownbits, but that would require checking that the 'bool' value is created with some instruction that can be replaced with new icmp+logic. https://alive2.llvm.org/ce/z/-KCfpa	2022-05-22 11:51:07 -04:00
Florian Hahn	97590baead	[LV] Widen ptr-inductions with scalar uses for scalable VFs. Current codegen only supports scalarization of pointer inductions for scalable VFs if they are uniform. After `3bebec659` we now may enter the scalarization code path in VPWidenPointerInductionRecipe::execute for scalable vectors. Fall back to widening for scalable vectors if necessary. This should fix a build failure when bootstrapping LLVM with SVE, e.g. https://lab.llvm.org/buildbot/#/builders/176/builds/1723	2022-05-22 16:24:13 +01:00
Florian Hahn	aeb19817d6	Revert "[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly." This reverts commit `fc9c59c355`. The patch triggers an assertion when building SPEC on X86. Reduced reproducer shared at D107966. Also reverts follow-up commit `11a09af76d`.	2022-05-21 21:00:01 +01:00
Florian Hahn	3bebec6592	[VPlan] Model first exit values using VPLiveOut. This patch introduces a new VPLiveOut subclass of VPUser to model exit values explicitly. The initial version handles exit values that are neither part of induction or reduction chains nor first order recurrence phis. Fixes #51366, #54867, #55167, #55459 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123537	2022-05-21 16:01:38 +01:00
Nikita Popov	6f0ca6fd23	[JumpThreading] Insert freeze when unfolding select JumpThreading may convert selects into branch instructions, in which case the condition needs to be frozen (as branch on poison is immediate undefined behavior, unlike select on poison). The necessary code for this is already in place, this just enables the option. Differential Revision: https://reviews.llvm.org/D125869	2022-05-21 11:24:27 +02:00
Dmitri Gribenko	11a09af76d	Fix an unused variable warning in no-asserts build mode	2022-05-20 17:11:58 +02:00
Sanjay Patel	f0071d43e4	[InstCombine] add use check to fold of bitwise logic with cast ops This was shown as a potential regression in D126040.	2022-05-20 09:08:53 -04:00
Alexey Bataev	fc9c59c355	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-20 05:58:09 -07:00
Alexey Bataev	4e271fc495	[SLP][NFC]Use SmallPtrSet to avoid n*m complexity, NFC.	2022-05-20 05:56:43 -07:00
Florian Hahn	cd61d4bd2f	[LV] Do not LoopSimplify/LCSSA after generating main vector loop. At the moment LV runs LoopSimplify and reconstructs LCSSA form after generating the main vector loop and before generating the epilogue vector loop. In practice, this adds a new exit block for the scalar loop because the middle block now also branches to the original exit block of the scalar loop. It also requires adding a new LCSSA phi in the newly created exit block. This complicates things when modeling exit values in VPlan, because we would need to update the VPlan for the epilogue loop to update the newly created LCSSA phi node. But none of that should be necessary, as all analysis requiring loop-simplify form is already done at this point and LCSSA form of the original loop is not broken. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D125810	2022-05-20 09:58:40 +01:00
Chenbing Zheng	cf348f6a2c	[InstCombine] [NFC] Use a pattern matcher for ExtractElementInst Reviewed By: RKSimon, rampitec Differential Revision: https://reviews.llvm.org/D125857	2022-05-20 10:31:40 +08:00
Nicolas Capens	c153c61fad	Handle instrumentation of scalar single-precision (_ss) intrinsics Instrumentation of scalar double-precision intrinsics such as x86_sse41_round_sd was already handled by https://reviews.llvm.org/D82398, but not their single-precision counterparts. https://issuetracker.google.com/172238865 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D124871	2022-05-19 13:56:51 -07:00
Florian Hahn	c90235f0ef	[LV] Drop wrap flags for reductions using VP def-use chain. Update clearReductionWrapFlags to use the VPlan def-use chain from the reduction phi recipe to drop reduction wrap flags. This addresses an existing FIXME and fixes a crash when instructions in the reduction chain are not used and have been removed before VPlan codegeneration. Fixes #55540.	2022-05-19 20:36:46 +01:00
Nuno Lopes	5fc9449c96	[DeadArgElim] Use poison instead of undef as placeholder for dead arguments It doesn't matter which value we use for dead args, so let's switch to poison, so we can eventually kill undef. Reviewed By: aeubanks, fhahn Differential Revision: https://reviews.llvm.org/D125983	2022-05-19 18:00:24 +01:00
Florian Hahn	32d6ef36d6	[SimpleLoopUnswitch] Skip trivial selects during trivial unswitching. Update the remaining places in unswitchTrivialBranch to properly skip trivial selects. Fixes #55526.	2022-05-19 17:01:13 +01:00
Tiehu Zhang	3ed9f603fd	[LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold The runtime check threshold should also restrict interleave count. Otherwise, too many runtime checks will be generated for some cases. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D122126	2022-05-19 23:29:00 +08:00
Florian Hahn	df56fb44f5	[VPlan] Update VPWidenMemoryInstruction to not inherit from VPValue. VPWidenMemoryInstruction also models stores which may not produce a value. This can trip over analyses. Improve the modeling by only adding VPValues for VPWidenMemoryInstructionRecipes modeling loads.	2022-05-19 16:24:58 +01:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
lizhijin	90ea81fcb2	[LV] Widen freeze instead of scalarizing it This patch changes the strategy for vectorizing freeze instrucion, from replicating multiple times to widening according to selected VF. Fixes #54992 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D125016	2022-05-19 12:28:01 +08:00
Chenbing Zheng	ffaaf2498b	[InstCombine] (rot X, ?) == 0/-1 --> X == 0/-1 In this patch we add a function foldICmpInstWithConstantAllowUndef to fold integer comparisons with a constant operand: icmp Pred X, C where X is some kind of instruction and C is AllowUndef. We move this fold to the new function, so that it can solve undef elts in a vector. Reviewed By: spatel, RKSimon Differential Revision: https://reviews.llvm.org/D125220	2022-05-19 11:22:26 +08:00
Chenbing Zheng	51df77f36d	[InstCombine] Allow undef vectors when foldSelectToCopysign Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125671	2022-05-19 10:57:49 +08:00
Alexey Bataev	7d8060bc19	[SLP]Improve reductions vectorization. The pattern matching and vectgorization for reductions was not very effective. Some of of the possible reduction values were marked as external arguments, SLP could not find some reduction patterns because of too early attempt to vectorize pair of binops arguments, the cost of consts reductions was not correct. Patch addresses these issues and improves the analysis/cost estimation and vectorization of the reductions. The most significant changes in SLP.NumVectorInstructions: Metric: SLP.NumVectorInstructions [140/14396] Program results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 920.00 3548.00 285.7% test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test 66.00 122.00 84.8% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 100.00 128.00 28.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 664.00 810.00 22.0% test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 592.00 687.00 16.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 402.00 426.00 6.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1665.00 1745.00 4.8% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 135.00 139.00 3.0% test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 135.00 139.00 3.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 388.00 397.00 2.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 895.00 914.00 2.1% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 240.00 244.00 1.7% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 240.00 244.00 1.7% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 820.00 832.00 1.5% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 820.00 832.00 1.5% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 14804.00 14914.00 0.7% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 8125.00 8183.00 0.7% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1330.00 1338.00 0.6% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1330.00 1338.00 0.6% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9832.00 9880.00 0.5% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 5267.00 5291.00 0.5% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 4018.00 4024.00 0.1% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 4018.00 4024.00 0.1% test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 426.00 424.00 -0.5% test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 426.00 424.00 -0.5% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 201.00 192.00 -4.5% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 201.00 192.00 -4.5% 644.nab_s and 544.nab_r - reduced number of shuffles but increased number of useful vectorized instructions. 641.leela_s and 541.leela_r - the function `@_ZN9FastBoard25get_pattern3_augment_specEiib` is not inlined anymore but its body gets vectorized successfully. Before, the function was inlined twice and vectorized just after inlining, currently it is not required. The vector code looks pretty similar, just like as it was before. Differential Revision: https://reviews.llvm.org/D111574	2022-05-18 13:22:18 -07:00
Sanjay Patel	ebbc37391f	[InstCombine] allow variable shift amount in bswap + shift fold When shifting by a byte-multiple: bswap (shl X, Y) --> lshr (bswap X), Y bswap (lshr X, Y) --> shl (bswap X), Y This was limited to constants as a first step in D122010 / `60820e53ec` , but issue #55327 shows a source example (and there's a test based on that here) where a variable shift amount is used in this pattern.	2022-05-18 14:38:16 -04:00
NAKAMURA Takumi	6ca7eb2c6d	[SCEV] Part 1, Serialize function calls in function arguments. Evaluation odering in function call arguments is implementation-dependent. In fact, gcc evaluates bottom-top and clang does top-bottom. Fixes #55283 partially. Part of https://reviews.llvm.org/D125627	2022-05-18 23:20:08 +09:00
Sanjay Patel	990cc49ca0	[InstCombine] avoid crash on fold of icmp with cast operand We could do better by inserting a bitcast from scalar int to vector int or using an insertelement (the alternate test does not crash because there's an independent fold like that). But this doesn't seem like a likely pattern, so just bail out for now. Fixes issue #55516.	2022-05-18 09:16:30 -04:00
Sanjay Patel	be6d7cc93c	[InstCombine] reduce code duplication for checking types; NFC	2022-05-18 09:16:30 -04:00
Nikita Popov	c9e7049754	[JumpThreading] Look through freeze in getPredicateAt() fold This code is valid for any icmp, so we can safely look through a freeze when trying to find one. A caveat here is that replaceFoldableUses() may not end up replacing any uses in this case. It might make sense to use the freeze as the context instruction (rather than the terminator) if there is a freeze, to ensure that it always gets folded. This would require some changes to how replaceFoldedUses() works though, as it currently assumes that the value is valid at the end of the block.	2022-05-18 12:09:59 +02:00
Sun Ziping	242961f23b	[llvm][fix-irreducible] ensure that loop subtree under child is correctly reconnected to new loop The modified function was incorrectly (not unnecessarily) ignoring grandchild loops, and this change fixes the bug. In particular, this fixes the handling of the loop { inner, body }. The TODO in the same function is talking about the b1 self loop, which may be "unnecessarily" lost, but that is a different issue.	2022-05-18 10:45:52 +01:00
Nikita Popov	18c70a7bd9	[JumpThreading] Simplify getPredicateAt() based folding It's sufficient to just fold the icmp to true/false here, and then let constant terminator folding take care of the rest. It should be noted that while replaceFoldableUses() may not replace all uses of the icmp, at least the use in the terminator we're working on is always replaceable, so terminator constant folding should be reliably enabled as a subsequent step.	2022-05-18 11:24:52 +02:00
Nikita Popov	d4cdf013c7	[JumpThreading] Use common code to skip freeze (NFC) There are multiple places that want to look through freeze, so store condition without freeze in a separate variable.	2022-05-18 10:49:41 +02:00
Florian Hahn	fcfb86483b	[LV] set Header earlier, use variable instead of repeated access (NFC).	2022-05-18 09:29:59 +01:00
Nikita Popov	e9a1c82d69	[SCEVExpander] Expand umin_seq using freeze %x umin_seq %y is currently expanded to %x == 0 ? 0 : umin(%x, %y). This patch changes the expansion to umin(%x, freeze %y) instead (https://alive2.llvm.org/ce/z/wujUhp). The motivation for this change are the test cases affected by D124910, where the freeze expansion ultimately produces better optimization results. This is largely because `(%x umin_seq %y) == %x` is a common expansion pattern, which reliably optimizes in freeze representation, but only sometimes with the zero comparison (in particular, if %x == 0 can fold to something else, we generally won't be able to cover reasonable code from this.) Differential Revision: https://reviews.llvm.org/D125372	2022-05-18 09:53:07 +02:00
Nikita Popov	323514de58	[LoopUnroll] Avoid branch on poison for runtime unroll with multiple exits When performing runtime unrolling with multiple exits, one of the earlier (non-latch) exits may exit the loop on the first iteration, such that we never branch on the latch exit condition. As such, we need to freeze the condition of the new branch that is introduced before the loop, as it now executes unconditionally. Differential Revision: https://reviews.llvm.org/D125754	2022-05-18 09:51:22 +02:00
Juneyoung Lee	3adcf96b4f	[JumpThreading] Let ProcessImpliedCondition look into freeze instructions This patch makes JumpThreading's ProcessImpliedCondition deal with frozen conditions. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84941	2022-05-18 10:41:31 +09:00
Sanjay Patel	dbf3b5f114	[InstCombine] fold more shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This extends the transform added with `0353c2c996`. If the casts are to a larger element type, the transform reduces shuffle bit width, so that should be a win for most codegen (if not, it can be inverted).	2022-05-17 14:25:11 -04:00
Sanjay Patel	f31d39c42c	[InstCombine] remove cast-of-signbit to shift transform The transform was wrong in 3 ways: 1. It created an extra instruction when the source and dest types don't match. 2. It did not account for an extra use of the icmp, so could create 2 extra insts. 3. It favored bit hacks over icmp (icmp generally has better analysis). This fixes #54692 (modeled by the PhaseOrdering tests). This is a minimal step to fix the bug, but we should likely invert this and the sibling transform for the "is negative" pattern too. The backend should be able to invert this back to a shift if that leads to better codegen. This is a reduced try of `3794cc0e99` - that was reverted because it could cause infinite loops by conflicting with the related transforms in this block that create shifts.	2022-05-17 11:10:28 -04:00
Florian Hahn	5b00d13c00	[LV] Fetch vector loop region once and remember it (NFC). This avoids an unnecessary lookup and makes the code slightly more compact.	2022-05-17 15:57:23 +01:00
Alexey Bataev	b0f0313feb	[SLP]Add an extra check for select minmax reduction to avoid crash. Need to check if the reduction is still (not)cmp-select pattern min/max reduction to avoid compiler crash during building list of reduction operations. cmp-sel pattern provides 2 reduction operations, while intrinsics - just one.	2022-05-17 06:05:52 -07:00
Florian Hahn	c1a9d14982	[VPlan] Move usesScalars/onlyFirstLaneUsed to VPUser. Those helpers model properties of a user and they should also be available to non-recipe users. This will be used in D123537 for a new exit value user. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D124936	2022-05-17 11:20:06 +01:00
Nikita Popov	9ba452b08e	[JumpThreading] Don't pass DT to isGuaranteedNotToBeUndefOrPoison() JumpThreading intentionally does not force updating of the DT during optimization, because this may be expensive when many CFG updates and DT calculations are interleaved. We shouldn't be fetching the DT just for the purpose of calling isGuaranteedNotToBeUndefOrPoison(), especially as DT availability doesn't even show benefit in tests.	2022-05-17 11:53:49 +02:00
Dmitry Vassiliev	7759680e2f	[SROA] Avoid postponing rewriting load/store by ignoring lifetime intrinsics in partition's promotability checking This patch fixes a bug that generates unnecessary packing/unpacking structure code because of incorrectly handling lifetime intrinsic. For example, a partition of an alloca may contain many slices: ``` Partition [0, 4): Slice0: [0, 4) used by: load i32 addr; Slice1: [0, 4) used by: store i32 v, addr; Slice2: [0, 16) used by lifetime.start(16, addr); ``` When SROA determines if the partition can be promoted, lifetime.start is currently treated as a whole alloca load/store, so Slice0 and Slice1 cannot be promoted at this attempt, but the packing/unpacking code for Slice0 and Slice1 has been generated. After rewrite lifetime.start/end intrinsic, SROA tries again with Slice0 and Slice1 and finally promotes them, but redundant packing/unpacking code remaining in the IRs. This patch changes promotability checking to ignore lifetime intrinsic (they will be rewritten to correct sizes later), so we can promote the real users (load/store) at the first attempt with optimal code. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124967	2022-05-17 11:25:59 +02:00
Nikita Popov	a694546f7c	[KnownBits] Add operator== Checking whether two KnownBits are the same is somewhat common, mainly in test code. I don't think there is a lot of room for confusion with "determine what the KnownBits for an icmp eq would be", as that has a different result type (this is what the eq() method implements, which returns Optional<bool>). Differential Revision: https://reviews.llvm.org/D125692	2022-05-17 09:38:13 +02:00
Sanjay Patel	07d549bce9	Revert "[InstCombine] invert canonicalization for cast of signbit test" This reverts commit `3794cc0e99`. This change is suspected of causing bots to hang at stage 2 compiles, so reverting to confirm and investigate.	2022-05-16 17:47:02 -04:00
Ellis Hoag	9a90ea1fdc	[InstrProf] Fix promoter when using counter relocations When using counter relocations, two instructions are emitted to compute the address of the counter variable. ``` %BiasAdd = add i64 ptrtoint <__profc_>, <__llvm_profile_counter_bias> %Addr = inttoptr i64 %BiasAdd to i64* ``` When promoting a counter, these instructions might not be available in the block, so we need to copy these instructions. This fixes https://github.com/llvm/llvm-project/issues/55125 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D125710	2022-05-16 14:32:39 -07:00
Sanjay Patel	3794cc0e99	[InstCombine] invert canonicalization for cast of signbit test The existing transform was wrong in 3 ways: 1. It created an extra instruction when the source and dest types don't match. 2. It did not account for an extra use of the icmp, so could create 2 extra insts. 3. It favored bit hacks over icmp (icmp generally has better analysis). This fixes #54692 (modeled by the PhaseOrdering tests). This is a minimal step to fix the bug, but we should likely invert the sibling transform for the "is negative" pattern too. The backend should be able to invert this back to a shift if that leads to better codegen.	2022-05-16 12:55:52 -04:00
Ellis Hoag	6e23cd2bf0	[InstrProf][NFC] Save profile bias to function map Add a map from functions to load instructions that compute the profile bias. Previously we assumed that if the first instruction in the function was a load instruction, then it must be computing the bias. This was likely to work out because functions usually start with the `llvm.instrprof.increment` instruction, but optimizations could change this. For example, inlining into a non-profiled function. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D114319	2022-05-16 08:32:31 -07:00
Sanjay Patel	be7f09f7b2	[IR] create and use helper functions that test the signbit; NFCI	2022-05-16 11:26:23 -04:00
Alexey Bataev	152072801e	[SLP]Check if the root of the buildvector has one use only. The root of the buildvector can have only one use, otherwise it can be treated only as a final element of the previous buildvector sequence.	2022-05-16 07:30:36 -07:00
Florian Hahn	b7315ffc3c	[LAA,LV] Add initial support for pointer-diff memory checks. This patch adds initial support for a pointer diff based runtime check scheme for vectorization. This scheme requires fewer computations and checks than the existing full overlap checking, if it is applicable. The main idea is to only check if source and sink of a dependency are far enough apart so the accesses won't overlap in the vector loop. To do so, it is sufficient to compute the difference and compare it to the `VF * UF * AccessSize`. It is sufficient to check `(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards dependence in the vector loop with the given VF and UF. If Src >=u Sink, there is not dependence preventing vectorization, hence the overflow should not matter and using the ULT should be sufficient. Note that the initial version is restricted in multiple ways: 1. Pointers must only either be read or written, by a single instruction (this allows re-constructing source/sink for dependences with the available information) 2. Source and sink pointers must be add-recs, with matching steps 3. The step must be a constant. 3. abs(step) == AccessSize. Most of those restrictions can be relaxed in the future. See https://github.com/llvm/llvm-project/issues/53590. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D119078	2022-05-16 15:27:22 +01:00
Biplob Mishra	ec4adf1f6c	[InstCombine] Combine instructions of type or/and where AND masks can be combined. The patch simplifies some of the patterns as below (A \| (B & C0)) \| (B & C1) -> A \| (B & C0\|C1) ((B & C0) \| A) \| (B & C1) -> (B & C0\|C1) \| A In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns. Differential Revision: https://reviews.llvm.org/D124119	2022-05-16 12:43:33 +01:00
Nikita Popov	7ba484660b	[ControlHeightReduction] Freeze condition when converting select to branch While select conditions can be poison, branch on poison is immediate UB. As such, we need to freeze the condition when converting a select into a branch. Differential Revision: https://reviews.llvm.org/D125398	2022-05-16 10:37:26 +02:00
David Sherwood	befc952045	[LoopVectorize] Permit tail-folding for low trip counts using scalable vectors When the loop vectoriser encounters a known low trip count it tries to create a single predicated loop in order to get the benefit of vectorisation and eliminate the scalar tail. However, until now the vectoriser prevented the use of scalable vectors in this case due to concerns in the past about stability. I believe that tail-folded loops using scalable vectors are now sufficiently well tested that we can enable this. For the same reason I've also enabled it when optimising for code size too. Tests added here: Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll Transforms/LoopVectorize/RISCV/low-trip-count.ll Differential Revision: https://reviews.llvm.org/D121595	2022-05-16 09:14:24 +01:00
Florian Hahn	8b7c3d2179	[LV] Set SCEVCheckCond to nullptr whenever it was used. Under some circumstances, SCEVExpander will insert new instructions when expanding a predicate, but the final result of the expansion can be a false constant. In those cases, the expanded instructions may later be used by other expansions, e.g. the trip count. This may trigger an assertion during SCEVExpander cleanup. To avoid this, always mark the result as used. Fixes #55100.	2022-05-15 21:52:07 +01:00
Craig Topper	b3097eb6cd	[SLP] Fix misspelling of 'analyzed'. NFC	2022-05-15 10:30:24 -07:00
Florian Hahn	39552964e1	[VPlan] Improve printing of VPReplicateRecipe with calls. Suggested as part of D124718.	2022-05-15 15:51:26 +01:00
Wende Tan	59afc4038b	[LowerTypeTests][clang] Implement and allow -fsanitize=cfi-icall for RISCV Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D106888	2022-05-14 18:05:06 -07:00
Fangrui Song	60e5fd00cd	[RS4GC] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build after D125000	2022-05-14 10:47:50 -07:00
Chenbing Zheng	acbad5086a	[InstCombine] [NFC] separate a function foldICmpBinOpWithConstant There is a long function foldICmpInstWithConstant, we can separate a function foldICmpBinOpWithConstant from it. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125457	2022-05-14 10:54:15 +08:00
Alexander Shaposhnikov	badd088c57	[GlobalOpt] Enable optimization of constructors with different priorities Adjust `optimizeGlobalCtorsList` to handle the case of different priorities. This addresses the issue https://github.com/llvm/llvm-project/issues/55083. Test plan: ninja check-all Differential revision: https://reviews.llvm.org/D125278	2022-05-13 22:19:29 +00:00
Alexey Bataev	8b8281f354	[SLP]Do not vectorize non-profitable alternate nodes. If alternate node has only 2 instructions and the tree is already big enough, better to skip the vectorization of such nodes, they are not very profitable (the resulting code cotains 3 instructions instead of original 2 scalars). SLP can try to vectorize the buildvector sequence in the next attempt, if it is profitable. Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 72.00 73.00 1.4% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 1186.00 1198.00 1.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 241.00 242.00 0.4% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2131.00 2139.00 0.4% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 6377.00 6384.00 0.1% test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 6377.00 6384.00 0.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 12650.00 12658.00 0.1% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26169.00 26147.00 -0.1% test-suite :: MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des.test 99.00 86.00 -13.1% Gains: 526.blender_r - more vectorized trees. enc-3des - same. Others: 510.parest_r - no changes. miniFE - same 623.xalancbmk_s - some (non-profitable) parts of the trees are not vectorized. 523.xalancbmk_r - same lencod - same timberwolfmc - same miniAMR - same Differential Revision: https://reviews.llvm.org/D125571	2022-05-13 14:28:54 -07:00
Alexey Bataev	85f6b15ee5	[SLP]Do not look for buildvector sequence, if the index is reused. If the insert indes was used already or is not constant, we should stop looking for unique buildvector sequence, it mustbe splitted to 2 different buildvectors.	2022-05-13 13:56:02 -07:00
Nikita Popov	afc21c7e79	[ControlHeightReduction] Simplify addToMergedCondition() (NFC)	2022-05-13 15:30:09 +02:00
David Sherwood	92c645b5c1	[LoopVectorize] Add overflow checks when tail-folding with scalable vectors In InnerLoopVectorizer::getOrCreateVectorTripCount there is an assert that the known minimum value for the VF is a power of 2 when tail-folding is enabled. However, for scalable vectors the value of vscale may not be a power of 2, which means we have to worry about the possibility of overflow. I have solved this problem by adding preheader checks that prevent us from entering the vector body if the canonical IV would overflow, i.e. if ((IntMax - TripCount) < (VF * UF)) ... skip vector loop ... Differential Revision: https://reviews.llvm.org/D125235	2022-05-13 14:09:43 +01:00
Nikita Popov	ed1cb01baf	[IRBuilder] Add IsInBounds parameter to CreateGEP() We commonly want to create either an inbounds or non-inbounds GEP based on a boolean value, e.g. when preserving inbounds from existing GEPs. Directly accept such a boolean in the API, rather than requiring a ternary between CreateGEP and CreateInBoundsGEP. This change is not entirely NFC, because we now preserve an inbounds flag in a constant expression edge-case in InstCombine.	2022-05-13 14:30:55 +02:00
Florian Hahn	8e6d481f3b	[ConstraintElimination] Simplify ssub(A,B) if B s>=b && B s>=0. A first patch to use the reasoning in ConstraintElimination to simplify sub with overflow to a regular sub, if the operation is guaranteed to not overflow. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125264	2022-05-13 13:19:41 +01:00
Nikita Popov	d9ad6a2c8b	[InstCombine] Fix unused variable warning (NFC)	2022-05-13 12:43:21 +02:00
Dmitry Makogon	1da42c9f71	[RS4GC] Cache BDVs and bases alogn with IsKnownBase flag (NFC) This refactors RS4GC to cache results returned findBaseDefiningValue and also gets rid of BaseDefiningValueResult by caching the IsKnownBase flag for BDVs and bases. Differential Revision: https://reviews.llvm.org/D125000	2022-05-13 14:14:17 +07:00
Chenbing Zheng	2a0837aab1	[InstCombine] fix sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z)) This patch fix bug left in D124503. We should do sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Z,Y)) instead of sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Y,Z)). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125352	2022-05-13 09:54:10 +08:00
Sanjay Patel	2fa8fc3d0a	[InstCombine] freeze operand in div+mul fold As discussed in issue #37809, this transform is not safe if the input is an undefined value. This is similar to recent changes for urem and sdiv: `d428f09b2c` `99ef341ce9` There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens. Presumably, in real cases that are similar to the tests where a subsequent transform removes the rem, we will also be able to remove the freeze by seeing that the parameter has 'noundef'.	2022-05-12 13:49:29 -04:00
Quentin Colombet	9766fed9c1	[DeadArgElim] Re-apply: Set unused arguments for internal functions The re-apply includes fixes to clang tests that were missed in the original commit. Original message: Prior to this patch we would only set to undef the unused arguments of the external functions. The rationale was that unused arguments of internal functions wouldn't need to be turned into undef arguments because they should have been simply eliminated by the time we reach that code. This is actually not true because there are plenty of cases where we can't remove unused arguments. For instance, if the internal function is used in an indirect call, it may not be possible to change the function signature. Yet, for statically known call-sites we would still like to mark the unused arguments as undef. This patch enables the "set undef arguments" optimization on internal functions when we encounter cases where internal functions cannot be optimized. I.e., whenever an internal function is marked "live". Differential Revision: https://reviews.llvm.org/D124699	2022-05-12 08:46:16 -07:00
Pavel Samolysov	098afdb0a0	[ArgPromotion] Make a non-byval promotion attempt first It makes sense to make a non-byval promotion attempt first and then fall back to the byval one. The non-byval ('usual') promotion is generally better, for example it does promotion even when a structure has more elements than 'MaxElements' but not all of them are actually used in the function. Differential Revision: https://reviews.llvm.org/D124514	2022-05-12 16:44:52 +02:00
Vasileios Porpodas	0950d4060c	Recommit "[SLP] Make reordering aware of external vectorizable scalar stores." This reverts commit `c2a7904aba`. Original code review: https://reviews.llvm.org/D125111	2022-05-11 16:47:29 -07:00
Arthur Eubanks	c2a7904aba	Revert "[SLP] Make reordering aware of external vectorizable scalar stores." This reverts commit `71bcead98b`. Causes crashes, see comments in D125111.	2022-05-11 15:28:00 -07:00
Sanjay Patel	99ef341ce9	[InstCombine] freeze operand in sdiv expansion As discussed in issue #37809, this transform is not safe if the input is an undefined value. This is similar to a recent change for urem: `d428f09b2c` There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens. Presumably, in real cases that are similar to the tests where a subsequent transform removes the select, we will also be able to remove the freeze by seeing that the parameter has 'noundef'.	2022-05-11 14:01:28 -04:00
Sanjay Patel	d428f09b2c	[InstCombine] freeze operand in urem expansion As discussed in issue #37809, this transform is not safe if the input is an undefined value. There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens.	2022-05-11 12:47:26 -04:00
Nikita Popov	6001bfcedc	[InstCombine] Freeze other uses of frozen value If there is a freeze %x, we currently replace all other uses of %x with freeze %x -- as long as they are dominated by the freeze instruction. This patch extends this behavior to cases where we did not originally dominate the use by moving the freeze instruction directly after the definition of the frozen value. The motivation can be seen in test @combine_and_after_freezing_uses: Canonicalizing everything to freeze %x allows folds that are based on value identity (i.e. same operand occurring in two places) to trigger. This also covers the case from D125248. Differential Revision: https://reviews.llvm.org/D125321	2022-05-11 16:47:12 +02:00
Alexey Bataev	f5d45d70a5	[SLP]Further improvement of the cost model for scalars used in buildvectors. Further improvement of the cost model for the scalars used in buildvectors sequences. The main functionality is outlined into a separate function. The cost is calculated in the following way: 1. If the Base vector is not undef vector, resizing the very first mask to have common VF and perform action for 2 input vectors (including non-undef Base). Other shuffle masks are combined with the resulting after the 1 stage and processed as a shuffle of 2 elements. 2. If the Base is undef vector and have only 1 shuffle mask, perform the action only for 1 vector with the given mask, if it is not the identity mask. 3. If > 2 masks are used, perform serie of shuffle actions for 2 vectors, combing the masks properly between the steps. The original implementation misses the very first analysis for the Base vector, so the cost might too optimistic in some cases. But it improves the cost for the insertelements which are part of the current SLP graph. Part of D107966. Differential Revision: https://reviews.llvm.org/D115750	2022-05-11 06:08:55 -07:00
Florian Hahn	635b752211	[VPlan] VPInterleaveRecipe only uses first lane if op not stored. With opaque pointers, both the stored value and the address can be the same. Only consider the recipe using the first lane only if the address is not stored. Fixes #55375.	2022-05-11 11:24:56 +01:00
Nikita Popov	c1bb4a881e	[SCEVExpander] Deduplicate min/max expansion code (NFC)	2022-05-11 12:11:11 +02:00
Alexander Shaposhnikov	da823382d2	[Transform][Utils][NFC] Clean up CtorUtils.cpp	2022-05-11 01:07:54 +00:00
Nick Desaulniers	c167c0a4dc	[BuildLibCalls] infer inreg param attrs from NumRegisterParameters We're having a hard time booting the ARCH=i386 Linux kernel with clang after removing -ffreestanding because instcombine was dropping inreg from callers during libcall simplification, but not the callees defined in different translation units. This led the callers and callees to have wildly different calling conventions, which (predictably) blew up at runtime. Infer the inreg param attrs on function declarations from the module metadata "NumRegisterParameters." This allows us to boot the ARCH=i386 Linux kernel (w/ -ffreestanding removed). Fixes: https://github.com/llvm/llvm-project/issues/53645 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125285	2022-05-10 16:21:17 -07:00
Vasileios Porpodas	71bcead98b	[SLP] Make reordering aware of external vectorizable scalar stores. The current reordering scheme only checks the ordering of in-tree operands. There are some cases, however, where we need to adjust the ordering based on the ordering of a future SLP-tree who's instructions are not part of the current tree, but are external users. This patch is a simple implementation of this. We keep track of scalar stores that are users of TreeEntries and if they look profitable to vectorize, then we keep track of their ordering. During the reordering step we take this new index order into account. This can remove some shuffles in cases like in the lit test. Differential Revision: https://reviews.llvm.org/D125111	2022-05-10 15:25:35 -07:00
Sanjay Patel	0353c2c996	[InstCombine] fold shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This is similar to a recent transform with fneg ( `b331a7ebc1` ), but this is intentionally the most conservative first step to try to avoid regressions in codegen. There are several restrictions that could be removed as follow-up enhancements. Note that a cast with a unary shuffle is currently canonicalized in the other direction (shuffle after cast - D103038 ). We might want to invert that to be consistent with this patch.	2022-05-10 14:20:43 -04:00
Craig Topper	4b36d9bde7	[CVP] Preserve exact name when converting sext->zext and ashr->lshr. Previously we took the old name and always appended a numberic suffix. Since we're doing a 1:1 replacement, it's clearer to keep the original name exactly. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D125281	2022-05-10 09:13:59 -07:00
Craig Topper	7b362ddda9	[SCCP] Preserve Name when converting SExt->ZExt. This makes the output IR more readable since we're doing a one to one replacement. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D125280	2022-05-10 09:13:59 -07:00
Dawid Jurczak	009f6ce0ef	[GVNSink] Make GVNSink resistant against self referencing instructions (PR36954) Before this change GVNSink pass suffers from stack overflow while processing self referenced instruction in unreachable basic block. According [1] and [2] it's reasonable to make pass resistant against self referencing instructions. To fix issue we skip sinking analysis when we reach instruction coming from unreachable block. [1] https://groups.google.com/g/llvm-dev/c/843Tig9IzwA [2] https://lists.llvm.org/pipermail/llvm-dev/2015-February/082629.html Differential Revision: https://reviews.llvm.org/D113897	2022-05-10 16:06:12 +02:00
Nikita Popov	0eafef1171	[SCEVExpander] Remove handling for mixed int/pointer min/max (NFCI) Mixed int/pointer min/max are no longer possible.	2022-05-10 15:11:39 +02:00
Chuanqi Xu	02d6845234	[NFC] [Coroutines] Remove EnableReuseStorageInFrame option The EnableReuseStorageInFrame option is designed for testing only. But it is better to use *_PASS_WITH_PARAMS macro to keep consistent with other passes.	2022-05-10 17:28:43 +08:00
Nikita Popov	d222bab672	[InstCombine] Handle GEP scalar/vector base mismatch (PR55363) `30a12f3f63` switched the type check to use the GEP result type rather than the GEP operand type. However, the GEP result types may match even if the operand types don't, in case GEPs with scalar/vector base and vector index are compared. Fixes https://github.com/llvm/llvm-project/issues/55363.	2022-05-10 11:26:43 +02:00
Chuanqi Xu	beeed0994e	[Coroutines] Use PassManager instead of Legacy PassManager internally This is a following cleanup for the previous work D123918. I missed serveral places which still use legacy pass managers. This patch tries to remove them.	2022-05-10 13:15:11 +08:00
Hongtao Yu	9641b9be9d	[Inliner] Preserve !prof metadata when converting call to invoke. When a callee function is inlined via an invoke instruction, every function call inside the callee, if not an invoke, will be converted to an invoke after cloned to the caller body. I found that during the conversion the !prof metadata was dropped. This in turned caused a cloned indirect call not properly promoted in subsequent passes. The particular scenario I was investigating was with AutoFDO and thinLTO. In prelink, no ICP was triggered (neither by the sample loader nor PGO ICP), no indirect call was promoted. This is because 1) the particular indirect call did not have inlined samples; and 2) PGO ICP was intentionally disabled. After inlining, the prof metadata was dropped. Then in postlink, PGO ICP jumped in but didn't do anything. Thus the opportunity was missed. I'm making a simple fix to preserve !prof metadata when converting call to invoke. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D125249	2022-05-09 15:08:09 -07:00
Alexey Bataev	4212ef8a0e	Revert "[SLP]Further improvement of the cost model for scalars used in buildvectors." This reverts commit `99f31acfce` and several others to fix detected crashes, reported in https://reviews.llvm.org/D115750	2022-05-09 13:46:06 -07:00
Alexey Bataev	cce80bd8b7	[SLP]Adjust assertion check for scalars in several insertelements. If the same scalar is inserted several times into the same buildvector, the mask index can be used already. In this case need to check, that this scalar is already part of the vectorized buildvector.	2022-05-09 13:07:59 -07:00
Florian Hahn	266ea446ab	Revert "Recommit "[VPlan] Remove uneeded needsVectorIV check."" This reverts commit `8b48223447`. This triggers an assertion on a test case mentioned in D123720. Revert while I investigate.	2022-05-09 20:33:14 +01:00
Alexey Bataev	9dc4ced204	[SLP]Try partial store vectorization if supported by target. We can try to vectorize number of stores less than MinVecRegSize / scalar_value_size, if it is allowed by target. Gives an extra opportunity for the vectorization. Fixes PR54985. Differential Revision: https://reviews.llvm.org/D124284	2022-05-09 09:48:15 -07:00
Alexey Bataev	9c3a75eabf	[SLP]Fix a crash when preparing a mask for external scalars. Need to use actual index instead of the tree entry position, since the insert index may be different than 0. It mean, that we vectorized part of the buildvector starting from not initial insertelement instruction beause of some reason.	2022-05-09 07:59:34 -07:00
Florian Hahn	41e142fdc7	Recommit "[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both." This reverts commit `7211d5ce07`. This version fixes a crash that caused buildbot failures with the first version.	2022-05-09 13:49:12 +01:00
David Green	6f9e1ea0ef	[VectorCombine] Attempt to fold select shuffles from reductions Given a commutative reduction leading from a shuffle, the order of the lanes on the shuffle are not important for the result. This means we can reorder the shuffle to something simpler, which we try shuffling the first vector lanes first. This was D123494. The new shuffle may not be profitable though, and if it is not we can try the folding of select shuffles from D123911. This, with some adjustment as the output lane ordering is now unimportant, can allow the final shuffle to simplify given the inputs to the patterns from D123911. Where as each transformation on their own are not profitable, the combination is. We can only support a single shuffle when called from reductions, but we are able to sort the ReconstructMask, potentially allowing it to simplify to an identity or concat mask. Differential Revision: https://reviews.llvm.org/D125086	2022-05-08 10:32:41 +01:00
Andrew Litteken	e38f014c40	[IROutliner] Accomodate blocks containing PHINodes with one entry outside the region and others inside the region. When a PHINode has an incoming block from outside the region, it must be handled specially when assigning a global value number to each incoming value. A PHINode has multiple predecessors, and we must handle this case rather than only the single predecessor case. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D124777	2022-05-07 17:11:21 -05:00
David Green	802e15c576	[SLP] Cluster ordering for loads Given a load without a better order, this patch partially sorts the elements to form clusters of adjacent elements in memory. These clusters can potentially be loaded in fewer loads, meaning less overall shuffling (for example loading v4i8 clusters of a v16i8 as a single f32 loads, as opposed to multiple independent bytes loads and inserts). Differential Revision: https://reviews.llvm.org/D122145	2022-05-07 14:38:11 +01:00
Sanjay Patel	8650f05c97	[InstCombine] fix miscompile when casting int->FP->int As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. I added tests/comments for all of the signed/unsigned combinations at either side of the boundary width, and tried to confirm with Alive2: https://alive2.llvm.org/ce/z/3p9DSu There are already some TODO items in the test file that suggest possible refinements, so the regression with ui->FP->si is probably ok. It seems unlikely that we'd see these kind of edge cases with non-byte-width integer types in real code. The potential miscompile went undetected for several years. This and `747c6a0c73` fixes #55150. Differential Revision: https://reviews.llvm.org/D124692	2022-05-07 08:46:25 -04:00
Serge Pavlov	eb28da89a6	[InstCombine] Remove side effect of replaced constrained intrinsics If a constrained intrinsic call was replaced by some value, it was not removed in some cases. The dangling instruction resulted in useless instructions executed in runtime. It happened because constrained intrinsics usually have side effect, it is used to model the interaction with floating-point environment. In some cases side effect is actually absent or can be ignored. This change adds specific treatment of constrained intrinsics so that their side effect can be removed if it actually absents. Differential Revision: https://reviews.llvm.org/D118426	2022-05-07 19:04:11 +07:00
Chenbing Zheng	394c683d40	[InstCombine] sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z)) Alive2: https://alive2.llvm.org/ce/z/2UNVbp Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D124503	2022-05-07 17:17:48 +08:00
Chenbing Zheng	8eaa1ef0d8	[InstCombine] add casts from splat-a-bit pattern if necessary Splatting a bit of constant-index across a value: sext (ashr (trunc iN X to iM), M-1) to iN --> ashr (shl X, N-M), N-1 If the dest type is different, use a cast (adjust use check). https://alive2.llvm.org/ce/z/acAan3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124590	2022-05-07 15:34:57 +08:00
Alexander Shaposhnikov	f827ee671f	[Scalar][NFC] Minor cleanups in CallSiteSplitting.cpp	2022-05-06 23:03:49 +00:00
Florian Hahn	7211d5ce07	Revert "[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both." This reverts commit `db7a87ed4f`. This seems to cause a PPC buildbot failure: https://lab.llvm.org/buildbot#builders/93/builds/8787	2022-05-06 22:38:15 +01:00
Sanjay Patel	b331a7ebc1	[InstCombine] canonicalize fneg after shuffle For the unary shuffle pattern, this is opposite to what we try to do with binops, but it seems better to keep it consistent with the motivating binary shuffle pattern. On that, it is clearly better on the usual no-extra uses case. There is a chance that this will pull an fneg away from some other binop and cause a regression in codegen, but that should be invertible in the backend. The transform is birectional: https://alive2.llvm.org/ce/z/kKaKCU https://alive2.llvm.org/ce/z/3Desfw Fixes #45631	2022-05-06 16:30:26 -04:00
Nikita Popov	82190f917a	[InstCombine] Fold icmp of select with implied condition When threading the icmp over the select, check whether the condition can be folded when taking into account the select condition.	2022-05-06 17:13:32 +02:00
Nikita Popov	0863abe3ac	[InstCombine] Fold icmp of select with non-constant operand Try to push an icmp into a select even if the icmp operand isn't constant - perform a generic SimplifyICmpInst instead. This doesn't appear to impact compile-time much, and forming logical and/or is generally profitable, as we have very good support for them.	2022-05-06 16:04:39 +02:00
Max Kazantsev	5a08e81779	[RS4GC] Add support for 'freeze' instruction to findBaseDefiningValue Because this instruction is a noop, we can simply go through it in search of the base.	2022-05-06 20:46:29 +07:00
Max Kazantsev	e6a7afae03	[NFC] Fix typo in assert message	2022-05-06 20:31:34 +07:00
Nikita Popov	b457ac4240	[InstCombine] Extract icmp of select transform (NFC) To make it either to extend to the case where the other operand is not a constant.	2022-05-06 14:46:44 +02:00
Fraser Cormack	bafab9c09f	[InstCombine] Fix scalable-vector bitwise select matching D113035 enhanced the matching of bitwise selects from vector types. This change unfortunately introduced crashes as it tries to cast scalable vector types to integers. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124997	2022-05-06 12:59:39 +01:00
Florian Hahn	db7a87ed4f	[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both. After D97756, collectHomogenousInstGraphLoopInvariants may collect conditions for both logical ANDs and logical ORs in case the root is a select that matches both logical AND & OR. This means the function won't return invariant values of either AND/OR chains, but both. This can result in incorrect transformations. See llvm/test/Transforms/SimpleLoopUnswitch/trivial-unswitch-logical-and-or.ll. Without the patch, Alive2 rejects the modified tests with: Source and target don't have the same return domain. Note that this also applies to the test case added in D97756 (@test_partial_condition_unswitch_or_select). We can't unswitch on %cond6, because the graph leading to it contains and AND and an OR. This only fixes trivial unswitching for now, but a similar problem likely exists with non-trivial unswitching. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124526	2022-05-06 09:50:03 +01:00
Marco Elver	9ae87b5973	[Instrumentation] Share InstrumentationIRBuilder between TSan and SanCov Factor our InstrumentationIRBuilder and share it between ThreadSanitizer and SanitizerCoverage. Simplify its usage at the same time (use function of passed Instruction or BasicBlock). This class may be used in other instrumentation passes in future. NFCI. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D125038	2022-05-06 09:15:17 +02:00
David Green	100cb9a2ba	[VectorCombine] Fold shuffle select pattern This patch adds a combine to attempt to reduce the costs of certain select-shuffle patterns. The form of code it attempts to detect is: %x = shuffle ... %y = shuffle ... %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask A classic select-mask will pick items from each lane of a or b. These do not always have a great lowering on many architectures. This patch attempts to pack a and b into the lower elements, creating a differently ordered shuffle for reconstructing the orignal which may be better than the select mask. This can be better for performance, especially if less elements of a and b need to be computed and the input shuffles are cheaper. Because select-masks are just one form of shuffle, we generalize to any mask. So long as the backend has decent costmodel for the shuffles, this can generally improve things when they come up. For more basic cost models the folds do not appear to be profitable, not getting past the cost checks. Differential Revision: https://reviews.llvm.org/D123911	2022-05-06 08:13:18 +01:00
Chuanqi Xu	2d037873a3	[Coroutines] Don't re-materialize for debug instructions Re-materialize for debug instructions would cause a different code generated if we enabled `-g`. This is bad. So we disable to re-materialize for debug instructions.	2022-05-06 13:52:19 +08:00
Chenbing Zheng	4c8c101b49	[InstCombine] try to narrow more shifted bswap-of-zext Try to narrow more bswap, if the shift amount is less than the zext (bswap (zext X)) >> C --> (zext (bswap X)) << C' https://alive2.llvm.org/ce/z/i7ddjn Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124598	2022-05-06 10:45:10 +08:00
Florian Hahn	f9f7aa30f8	[VPlan] Remove dead code to create VPWidenPHIRecipes (NFCI). After introducing VPWidenPointerInductionRecipe, VPWidenPHIRecipes should not be created at this point. Turn check into an assert.	2022-05-05 19:29:02 +01:00
Serge Pavlov	e1554ac63a	Revert "[InstCombine] Remove side effect of replaced constrained intrinsics" This reverts commit `83914ee96f`. The change caused discussion: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20220502/1034841.html	2022-05-06 01:09:16 +07:00
Marco Elver	47bdea3f7e	[ThreadSanitizer] Add fallback DebugLocation for instrumentation calls When building with debug info enabled, some load/store instructions do not have a DebugLocation attached. When using the default IRBuilder, it attempts to copy the DebugLocation from the insertion-point instruction. When there's no DebugLocation, no attempt is made to add one. This is problematic for inserted calls, where the enclosing function has debug info but the call ends up without a DebugLocation in e.g. LTO builds that verify that both the enclosing function and calls to inlinable functions have debug info attached. This issue was noticed in Linux kernel KCSAN builds with LTO and debug info enabled: \| ... \| inlinable function call in a function with debug info must have a !dbg location \| call void @__tsan_read8(i8* %432) \| ... To fix, ensure that all calls to the runtime have a DebugLocation attached, where the possibility exists that the insertion-point might not have any DebugLocation attached to it. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D124937	2022-05-05 15:21:35 +02:00
Alexey Bataev	99f31acfce	[SLP]Further improvement of the cost model for scalars used in buildvectors. Further improvement of the cost model for the scalars used in buildvectors sequences. The main functionality is outlined into a separate function. The cost is calculated in the following way: 1. If the Base vector is not undef vector, resizing the very first mask to have common VF and perform action for 2 input vectors (including non-undef Base). Other shuffle masks are combined with the resulting after the 1 stage and processed as a shuffle of 2 elements. 2. If the Base is undef vector and have only 1 shuffle mask, perform the action only for 1 vector with the given mask, if it is not the identity mask. 3. If > 2 masks are used, perform serie of shuffle actions for 2 vectors, combing the masks properly between the steps. The original implementation misses the very first analysis for the Base vector, so the cost might too optimistic in some cases. But it improves the cost for the insertelements which are part of the current SLP graph. Part of D107966. Differential Revision: https://reviews.llvm.org/D115750	2022-05-05 06:04:25 -07:00
Florian Hahn	6bd2b70877	[SimpleLoopUnswitch] Add freeze if branch execs for partial unswitching. We cannot skip the freezing the condition if the unswitched branch executes, if the condition is a chain of ANDs/ORs. For example, if if we have an AND %c1, %c2 with %c1 == undef and %c2 == 0, there would be no branch on undef in the original code, but a branch on undef if we unswitch %c1. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124603	2022-05-05 09:44:07 +01:00
Chuanqi Xu	405bf90235	[NFC] [Pipelines] Hoist CoroCleanup as Module Pass This is similar to previous patch https://reviews.llvm.org/D123925. It could also reduce the time we call declaresCoroCleanupIntrinsics. And it is helpful for further changes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124362	2022-05-05 15:15:09 +08:00
Serge Pavlov	83914ee96f	[InstCombine] Remove side effect of replaced constrained intrinsics If a constrained intrinsic call was replaced by some value, it was not removed in some cases. The dangling instruction resulted in useless instructions executed in runtime. It happened because constrained intrinsics usually have side effect, it is used to model the interaction with floating-point environment. In some cases it is correct behavior but often the side effect is actually absent or can be ignored. This change adds specific treatment of constrained intrinsics so that their side effect can be removed if it actually absents. Differential Revision: https://reviews.llvm.org/D118426	2022-05-05 12:02:42 +07:00
Wael Yehia	2407c13aa4	[AIX][PGO] Enable linux style PGO on AIX This patch switches the PGO implementation on AIX from using the runtime registration-based section tracking to the __start_SECNAME/__stop_SECNAME based. In order to enable the recognition of __start_SECNAME/__stop_SECNAME symbols in the AIX linker, the -bdbg:namedsects:ss needs to be used. Reviewed By: jsji, MaskRay, davidxl Differential Revision: https://reviews.llvm.org/D124857	2022-05-05 04:10:39 +00:00
Alexander Shaposhnikov	ec7122f64b	[InstCombine] Fold ((A&B)^C)\|B Fold ((A&B)^C)\|B into C\|B. https://alive2.llvm.org/ce/z/zSGSor This addresses the issue https://github.com/llvm/llvm-project/issues/55169 Test plan: ninja check-all Differential revision: https://reviews.llvm.org/D124710	2022-05-05 00:56:20 +00:00
Sanjay Patel	14f257620c	[InstCombine] add type constraint to intrinsic+shuffle fold This check is in the related fold for binops, but it was missed when the code was adapted for intrinsics in `432c199e84`. The new test would crash when trying to create a new intrinsic with mismatched types.	2022-05-04 13:07:26 -04:00
Sanjay Patel	7e6d318c50	[InstCombine] move shuffle after funnel shift with same-shuffled operands This extends `432c199e84` and `9c4770eaab` with an intrinsic cited directly in issue #46238 Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.	2022-05-04 13:07:26 -04:00
Sanjay Patel	15042f44a2	[InstCombine] propagate FMF when reordering intrinsics and shuffles This was missed when extending the fold to allow fma with `9c4770eaab`	2022-05-04 12:10:38 -04:00
Sanjay Patel	9c4770eaab	[InstCombine] move shuffle after fma with same-shuffled operands https://alive2.llvm.org/ce/z/sD-JVv This extends `432c199e84` with a 3 arg intrinsic to demonstrate that the code works with the extra operand. Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.	2022-05-04 11:50:38 -04:00
Florian Hahn	8b48223447	Recommit "[VPlan] Remove uneeded needsVectorIV check." This reverts commit `f4e1eaa375`. The patch was originally reverted because it uncovered an issue that has now been fixed in `0ef8ca6d88`.	2022-05-04 10:53:42 +01:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fa5a4e1b95` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Hongtao Yu	3113e5bb52	[CSSPGO] Relax size limitation for priority inlining with preinlined profile As a follow-up to D124632, I'm turning on unlimited size caps for inlining with preinlined profile. It should be safe as a preinlined profile has "bounded" inline contexts. No noticeable size or perf delta was seen with two of our internal large services, but I think this is still a good change to be consistent with the other case. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124793	2022-05-03 18:43:07 -07:00
Hongtao Yu	e95ae395aa	[CSSPGO][NFC] Replace SampleProfileLoader::ProfileIsCS with FunctionSamples::ProfileIsCS. The two fields have the same meaning. Their values come from the reader. Therefore I'm removing one. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124788	2022-05-03 18:32:37 -07:00
Sanjay Patel	432c199e84	[InstCombine] move shuffle after min/max with same-shuffled operands This is an intrinsic version of the existing fold for binops. As a first step, I only allowed min/max, but the code is set up to make adding more intrinsics easy (with more or less than 2 arguments). This (and possible follow-ups) are discussed in issue #46238.	2022-05-03 16:23:11 -04:00
Augie Fackler	1deea714b3	BuildLibCalls: simplify switch statement slightly Per feedback on D123086 after submit. Also added a test for vec_malloc et al attribute inference to show it's doing the right thing. The new tests exposed a defect, corrected by adding vec_free to the list of free functions in MemoryBuiltins.cpp, which had been overlooked all the way back in D94710, over a year ago. Differential Revision: https://reviews.llvm.org/D124859	2022-05-03 13:17:33 -04:00
Dawid Jurczak	9c46a9cf61	[NFC][GVNSink] Don't pretend that iteration is over instructions when it's actually over blocks Differential Revision: https://reviews.llvm.org/D124764	2022-05-03 17:19:40 +02:00
Igor Kirillov	4e5e042d9a	[LoopVectorize] Support reductions that store intermediary result Adds ability to vectorize loops containing a store to a loop-invariant address as part of a reduction that isn't converted to SSA form due to lack of aliasing info. Runtime checks are generated to ensure the store does not alias any other accesses in the loop. Ordered fadd reductions are not yet supported. Differential Revision: https://reviews.llvm.org/D110235	2022-05-03 10:12:30 +01:00
David Green	6f81903e89	[LV][SLP] Mark fptosi_sat as vectorizable This adds fptosi_sat and fptoui_sat to the list of trivially vectorizable functions, mainly so that the loop vectorizer can vectorize the instruction. Marking them as trivially vectorizable also allows them to be SLP vectorized, and Scalarized. The signature of a fptosi_sat requires two type overrides (@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first operand of the intrinsic as a overloaded (but not scalar) operand. Differential Revision: https://reviews.llvm.org/D124358	2022-05-03 09:32:34 +01:00
Vitaly Buka	098e807074	Revert "[DeadArgElim] Set unused arguments for internal functions" Breaks bots, see https://reviews.llvm.org/D124699 This reverts commit `e547a333a4`.	2022-05-02 15:10:26 -07:00
Teresa Johnson	084b65f7dc	[memprof] Only insert dynamic shadow load when needed We don't need to insert a load of the dynamic shadow address unless there are interesting memory accesses to profile. Split out of D124703. Differential Revision: https://reviews.llvm.org/D124797	2022-05-02 13:36:00 -07:00
Alexey Bataev	e74a73782f	[SLP][NFC]Minor code changes for better readability, NFC.	2022-05-02 12:58:25 -07:00
Teresa Johnson	a0b5af46a2	[memprof] Don't instrument PGO and other compiler inserted variables Suppress instrumentation of PGO counter accesses, which is unnecessary and costly. Also suppress accesses to other compiler inserted variables starting with "__llvm". This is a slightly expanded variant of what is done for tsan in shouldInstrumentReadWriteFromAddress. Differential Revision: https://reviews.llvm.org/D124703	2022-05-02 12:17:52 -07:00
Alexey Bataev	7ea03f0b4e	[SLP]Improve reductions analysis and emission, part 1. Currently SLP vectorizer walks through the instructions and selects 3 main classes of values: 1) reduction operations - instructions with same reduction opcode (add, mul, min/max, etc.), which build the reduction, 2) reduced values - instructions with the same opcodes, but different from the reduction opcode, 3) extra arguments - all other values, instructions from the different basic block rather than the root node, instructions with to many/less uses. This scheme is not very efficient. It excludes some instructions and all non-instruction values from the reductions (constants, proficient gathers), to many possibly reduced values are marked as extra arguments. Patch improves this process by introducing a bit extended analysis stage. During this stage, we still try to select 3 classes of the values: 1) reduction operations - same as before, 2) possibly reduced values - all instructions from the current block/non-instructions, which may build a vectorization tree, 3) extra arguments - instructions from the different basic blocks. Additionally, an extra sorting of the possibly reduced values occurs to build the scalar sequences which highly likely will bed vectorized, e.g. loads are grouped by the distance between them, constants are grouped together, cmp instructions are sorted by their compare types and predicates, extractelement instructions are sorted by the vector operand, etc. Also, these groups are reordered by their length so the longest group is the first in the list of the possibly reduced values. The vectorization process tries to emit the reductions for all these groups. These reductions, remaining non-vectorized possible reduced values and extra arguments are then combined into the final expression just like it was before. Differential Revision: https://reviews.llvm.org/D114171	2022-05-02 12:03:58 -07:00
Florian Hahn	0ef8ca6d88	[VPlan] Do not create VPWidenCall recipes for scalar vector factors. 'Widen' recipe are only used when actual vector values are generated. Fix tryToWidenCall to do not create VPWidenCallRecipes for scalar vector factors. This was exposed by D123720, because the widened recipes are considered vector users. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D124718	2022-05-02 19:40:33 +01:00
Quentin Colombet	e547a333a4	[DeadArgElim] Set unused arguments for internal functions Prior to this patch we would only set to undef the unused arguments of the external functions. The rationale was that unused arguments of internal functions wouldn't need to be turned into undef arguments because they should have been simply eliminated by the time we reach that code. This is actually not true because there are plenty of cases where we can't remove unused arguments. For instance, if the internal function is used in an indirect call, it may not be possible to change the function signature. Yet, for statically known call-sites we would still like to mark the unused arguments as undef. This patch enables the "set undef arguments" optimization on internal functions when we encounter cases where internal functions cannot be optimized. I.e., whenever an internal function is marked "live". Differential Revision: https://reviews.llvm.org/D124699	2022-05-02 11:16:32 -07:00
Jonas Paulsson	304378fd09	Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls." (was `0f8c626`). This reverts commit `14d9390`. The patch previously failed to recognize cases where user had defined a function alias with an identical name as that of the library function. Module::getFunction() would then return nullptr which is what the sanitizer discovered. In this updated version a new function isLibFuncEmittable() has as well been introduced which is now used instead of TLI->has() anytime a library function is to be emitted . It additionally also makes sure there is e.g. no function alias with the same name in the module. Reviewed By: Eli Friedman Differential Revision: https://reviews.llvm.org/D123198	2022-05-02 19:37:00 +02:00
Arthur Eubanks	b07aab8fc1	[GlobalOpt] Iterate over replaced values deterministically to constprop If there are pre-existing dead instructions, the order we visit replaced values can cause us sometimes to not delete dead instructions. The added test non-deterministically failed without the change.	2022-05-02 09:43:20 -07:00
Nikita Popov	95fedfab6c	[InstCombine] Handle non-canonical GEP index in indexed compare fold (PR55228) Normally the index type will already be canonicalized here, but this is not guaranteed depending on visitation order. The code was already accounting for a potentially needed sext, but a trunc may also be needed. Add a ConstantExpr::getSExtOrTrunc() helper method to make this simpler. This matches the corresponding IRBuilder method in behavior. Fixes https://github.com/llvm/llvm-project/issues/55228.	2022-05-02 17:56:01 +02:00
Augie Fackler	c7ae423e39	BuildLibCalls: add alloc-family attribute to many allocator functions Differential Revision: https://reviews.llvm.org/D123086	2022-05-02 11:12:55 -04:00
Augie Fackler	e940456531	BuildLibCalls: infer allocptr attribute for free and realloc() family functions Differential Revision: https://reviews.llvm.org/D123084	2022-05-02 09:43:21 -04:00
Nikita Popov	aae5f8115a	[Local] Consider atomic loads from constant global as dead Per the guidance in https://llvm.org/docs/Atomics.html#atomics-and-ir-optimization, an atomic load from a constant global can be dropped, as there can be no stores to synchronize with. Any write to the constant global would be UB. IPSCCP will already drop such loads, but the main helper in Local doesn't recognize this currently. This is motivated by D118387. Differential Revision: https://reviews.llvm.org/D124241	2022-05-02 10:52:58 +02:00
Phoebe Wang	7c04454227	[ArgPromotion][Attributor] Update min-legal-vector-width when do promotion X86 codegen uses function attribute `min-legal-vector-width` to select the proper ABI. The intention of the attribute is to reflect user's requirement when they passing or returning vector arguments. So Clang front-end will iterate the vector arguments and set `min-legal-vector-width` to the width of the maximum for both caller and callee. It is assumed any middle end optimizations won't care of the attribute expect inlining and argument promotion. - For inlining, we will propagate the attribute of inlined functions because the inlining functions become the newer caller. - For argument promotion, we check the `min-legal-vector-width` of the caller and callee and refuse to promote when they don't match. The problem comes from the optimizations' combination, as shown by https://godbolt.org/z/zo3hba8xW. The caller `foo` has two callees `bar` and `baz`. When doing argument promotion, both `foo` and `bar` has the same `min-legal-vector-width`. So the argument was promoted to vector. Then the inlining inlines `baz` to `foo` and updates `min-legal-vector-width`, which results in ABI mismatch between `foo` and `bar`. This patch fixes the problem by expanding the concept of `min-legal-vector-width` to indicator of functions arguments. That says, any passes touch functions arguments have to set `min-legal-vector-width` to the value reflects the width of vector arguments. It makes sense to me because any arguments modifications are ABI related and should response for the ABI compatibility. Differential Revision: https://reviews.llvm.org/D123284	2022-05-02 14:13:05 +08:00
Florian Hahn	5387a38c38	[SimpleLoopUnswitch] Freeze individual OR/AND operands. In some cases, it is not enough to freeze the final AND/OR operation when chaining a number of invariant conditions together. After creating a chain of ANDs/ORs, we assume all unswitched operands to be either true or false. But if any of the operands is poison, the rest of the operands could have any value after branching on the frozen condition. To avoid that, freeze individual operands, if needed. In some cases this may lead to unnecessary freezes, but it seems required at least for some cases (see trivial-unswitch-freeze-individual-conditions.ll) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124554	2022-05-01 20:11:05 +01:00
Simon Pilgrim	34f97a3709	[VectorCombine] Merge isa<>/cast<> into dyn_cast<>. NFC. We want to handle the the assert in VectorCombine so avoid the repeated isa/cast code.	2022-05-01 20:09:10 +01:00
Simon Pilgrim	09761ce295	[SLPVectorizer] Remove weird unicode character from comment. NFCI. Whatever it was, Visual Assist really didn't like it....	2022-05-01 16:37:21 +01:00
Florian Hahn	8b022f87b0	[SimpleLoopUnswitch] Freeze trivial conditions if needed. Trivial unswitching can also introduce new branches on undef/poison. Freeze the conditions if needed. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124549	2022-04-30 19:53:36 +01:00
Juneyoung Lee	40a2e35599	[InstCombine] Remove the undef-related workaround code in visitSelectInst This patch removes an old hack in visitSelectInst that was written to avoid miscompilation bugs in loop unswitch. (Added via https://reviews.llvm.org/D35811) The legacy loop unswitch pass will be removed after D124376, and the new simple loop unswitch pass correctly uses freeze to avoid introducing UB after D124252. Since the hack is not necessary anymore, this patch removes it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124426	2022-04-30 20:48:42 +09:00
Hongtao Yu	bdb8c50a1c	[CSSPGO] Turn on priority inlining for probe-only profile We have seen that the prioirty inliner delivered on-par performance with the old inliner for probe-only CSSPGO profile, as long as without a size budget. I'm turning on the priority inliner for probe-only profile by default. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124632	2022-04-29 17:31:56 -07:00
Hongtao Yu	e36786d15f	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602	2022-04-29 17:03:52 -07:00
James Y Knight	02aa795785	Revert "[JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes" This change has caused non-reproducibility of a self-build of Clang when using NewPM and providing profile data. This reverts commit `35f38583d2`.	2022-04-29 21:15:47 +00:00
Alexey Bataev	484fcb9888	[SLP][NFC]Fix a comment.	2022-04-29 09:27:13 -07:00
Florian Hahn	a80081763c	[SimplifyCFG] Avoid shifting by a too large exponent. TI->getBitWidth can be > 64 and in those cases the shift will be UB due to the exponent being too large. To fix this, cap the shift at 63. I think this should work out fine, because TableSize is itself a 64 bit type and the maximum table size must fit in the type. Also, if we would underestimate the size here, at most we get an extra ZExt. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124608	2022-04-29 15:19:06 +01:00
Anna Thomas	205246cb64	[CompileTime] [Passes] Avoid computing unnecessary analyses. NFC Similar to `c515b2f39e`, If there are no loops in the function as seen through LI, we should avoid computing the remaining expensive analyses (such as SCEV, BPI). Reordered the analyses requests and early return if there are no loops. The logic of avoiding expensive analyses is applied to LoopVectorizer, LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate on loops. This is an NFC with compile time improvement. Differential Revision: https://reviews.llvm.org/D124529	2022-04-29 10:00:06 -04:00
Nikita Popov	1881711fbb	[InstCombine] Remove memset of undef value This removes memset with undef char. We already do this for stores of undef value. This comes with the caveat that this optimization is not, strictly speaking, legal for undef values, because we might be overwriting a poison value. However, our entire load/store model currently still operates on undef values, so we need to support undef here as well for internal consistency. Once https://github.com/llvm/llvm-project/issues/52930 is resolved, these and related folds can be limited to poison -- I've added FIXMEs to that effect. Differential Revision: https://reviews.llvm.org/D124173	2022-04-29 14:51:18 +02:00
Ricky Zhou	24a133e16f	[LV] Rename CountRoundDown to VectorTripCount (NFC) The name CountRoundDown is potentially misleading, as the number of iterations can be rounded up when folding the tail. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D119681	2022-04-29 13:50:00 +01:00
Nikita Popov	982cbed819	[InstCombine] Fold logical and/or of range icmps with nowrap flags This is an edge-case where we don't convert to bitwise and/or based on implies poison reasoning, so explicitly try to perform the fold in logical form. The transform itself is poison-safe, as both icmps are based on the same value and any nowrap flags are discarded as part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used example).	2022-04-29 14:42:42 +02:00
Florian Hahn	e66127e69b	[VPlan] Simplify & adjust code as suggested in D123005. Improve code as suggested in D123005. Applied separately, because the comments where made a diff that has not been rebased to current main.	2022-04-29 13:34:54 +01:00
Nikita Popov	57aaeefc18	[InstCombine] Pass ICmpInsts to foldAndOrOfICmpsUsingRanges() (NFC) Pass the whole instruction rather than unpacking it. This makes it easier to reuse the function in another place, as the entire logic is encapsulated.	2022-04-29 12:46:31 +02:00
Nikita Popov	1f53932a95	[InstCombine] Remove foldAndOrOfEqualityCmpsWithConstants() fold This fold handles a special subset of foldAndOrOfICmpsUsingRanges(), use the more generic implementation instead. The result can differ if a representation using a range comparison is possible, in which case that is preferred over masking. There is a canonicalization opportunity here.	2022-04-29 12:23:00 +02:00
Nikita Popov	5515263e44	[InstCombine] Fold and of two ranges differing by mask This is the de Morgan conjugated variant of the existing fold for ors. Implement this by switching the range code to always work on ors and perform invert operands at the start and end. This makes reasoning easier and makes the extension more obviosuly correct.	2022-04-29 12:01:38 +02:00
Florian Hahn	fb4113ef0c	[Passes] Remove legacy LoopUnswitch pass. The legacy LoopUnswitch pass is only used in the legacy pass manager pipeline, which is deprecated. The NewPM replacement is SimpleLoopUnswitch and I think it is time to remove the legacy LoopUnswitch code. Fixes #31000. Reviewed By: aeubanks, Meinersbur, asbirlea Differential Revision: https://reviews.llvm.org/D124376	2022-04-29 10:30:49 +01:00
Nikita Popov	d5ee20fcc9	[InstCombine] Switch an or of icmps fold to use constant ranges We can express this fold more naturally when working on the constant range implementation. This change is not entirely NFC, because the code now also handles cases that don't match the precise pattern this previously looked for, e.g. we can omit an add on one of the ranges.	2022-04-29 11:15:54 +02:00
David Green	7047c47918	[VecCombine] Fix sort comparator logic in foldShuffleFromReductions I think this sort comparator was overly complex, and the windows expensive check bot agreed, failing as it was not giving a strict weak ordering. Change it to use the comparison of the mask values as unsigned integers. This should sort the undef elements to the end whilst keeping X<Y otherwise.	2022-04-29 09:30:02 +01:00
Nikita Popov	884e9a877b	[SimplifyCFG] Replace condition value when threading Replace the condition value with the known constant value on the threaded edge. This happens implicitly with phi threading because we replace with the incoming value, but not for non-phi threading.	2022-04-29 09:50:27 +02:00
Nikita Popov	4e545bdb35	[SimplifyCFG] Thread branches on same condition in more cases (PR54980) SimplifyCFG implements basic jump threading, if a branch is performed on a phi node with constant operands. However, InstCombine canonicalizes such phis to the condition value of a previous branch, if possible. SimplifyCFG does support this as well, but only in the very limited case where the same condition is used in a direct predecessor -- notably, this does not include the common diamond pattern (i.e. two consecutive if/elses on the same condition). This patch extends the code to look back a limited number of blocks to find a branch on the same value, rather than only looking at the direct predecessor. Fixes https://github.com/llvm/llvm-project/issues/54980. Differential Revision: https://reviews.llvm.org/D124159	2022-04-29 09:44:05 +02:00
Florian Hahn	f4e1eaa375	Revert "[VPlan] Remove uneeded needsVectorIV check." This reverts commit `43842b887e` while I investigate a buildbot failure. It also reverts the follow-up commit `2883de0514`.	2022-04-28 20:16:21 +01:00
David Green	ded8187e35	[VectorCombine] Try to reduce shuffle cost for commutative reduction operands Given a shuffle feeding a commutative reduction, the lane ordering of the shuffle will not alter the result. This is also true if there are a number of operations between the reduction and the shuffle, providing they only operate lane-wise. This patch searches for cases like that in Vector Combine, allowing us to check the cost of the shuffle vs an in-order identity shuffle and replace the order if possible. This only handles a single shuffle at the moment to keep things simple, and is able to ignore splats that produce results where every result is the same. This is a more powerful version of a combine that already happens in instrcombine, capable of optimizing more cases by looking through more instructions and being able to cost the shuffle. Differential Revision: https://reviews.llvm.org/D123494	2022-04-28 19:46:12 +01:00
Alexey Bataev	75e1cf4a6a	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-28 10:04:41 -07:00
Pavel Samolysov	9197959e13	[ArgPromotion] Move ArgPart and OffsetAndArgPart to anonymous namespace The structure ArgPart and alias OffsetAndArgPart have been moved into the anonymous namespace. NFC. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124617	2022-04-28 09:51:46 -07:00
Pavel Samolysov	6b825e50f7	[ArgPromotion] Change the condition to check the promotion limit The condition should be 'ArgParts.size() > MaxElements', so that if we have exactly 3 elements in the 'ArgParts' vector, the promotion should be allowed because the 'MaxElement' threshold is not exceeded yet. The default value for 'MaxElement' has been decreased to 2 in order to avoid an actual change in argument promoting behavior. However, this changes byval argument transformation behavior by allowing adding not more than 2 arguments to the function instead of 3 allowed before. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124178	2022-04-28 09:42:58 -07:00
Florian Hahn	2883de0514	[VPlan] Fix comment formatting from `43842b887e`.	2022-04-28 16:31:48 +01:00
Florian Hahn	43842b887e	[VPlan] Remove uneeded needsVectorIV check. Remove one of the last remaining uses of ::needsVectorIV, preparing for its removal. Now that usesScalars is available and based on the information explicit in VPlan, there is no need to use the pre-computed needsVectorIV. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123720	2022-04-28 16:27:34 +01:00
Alexey Bataev	9861ca0c23	Revert "[COST]Improve cost model for shuffles in SLP." This reverts commit `29a470e380` to fix a crash reported in https://reviews.llvm.org/D100486#3479989.	2022-04-28 08:11:56 -07:00
Pavel Samolysov	744a837838	[ArgPromotion] Rename variables according to the code style. NFC Some loop counters ('i', 'e') and variables ('type') were named not in accordance with the code style and clang-tidy issues warnings about the using of such variables. This patch renames the variables and fixes some typos in the comments within the source file. Differential Revision: https://reviews.llvm.org/D123662	2022-04-28 15:32:05 +02:00
Chris Jackson	c792884589	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] Reland `3f2b76ec90` with the test corrected to require x86-registered-target. Differential Revision: https://reviews.llvm.org/D120169	2022-04-28 14:21:56 +01:00
Chris Jackson	cd5f9efc4d	Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]" This reverts commit `3f2b76ec90`.	2022-04-28 14:07:31 +01:00
Nikita Popov	90dba831ae	[InstCombine] Fold or of icmp ne trunc/and This adds the de Morgan conjugated variant for the existing "and eq" style fold. Proof: https://alive2.llvm.org/ce/z/tkNAcG	2022-04-28 15:07:16 +02:00
Chris Jackson	3f2b76ec90	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] Reland commit `74273d575f` following a fix for a memory leak. The DVIRecoveryRecord vectors now use unique_ptr. Differential Revision: https://reviews.llvm.org/D120169	2022-04-28 13:55:49 +01:00
Nikita Popov	b9dc565147	[GVN] Encode GEPs in offset representation When using opaque pointers, convert GEPs into offset representation of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset. This allows us to recognize equivalent address calculations even if the GEPs don't use the same source element type. This fixes an opaque pointer codegen regression seen in rustc. Differential Revision: https://reviews.llvm.org/D124527	2022-04-28 09:32:05 +02:00
Max Kazantsev	35f38583d2	[JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes They can already be available, and even if not, DT/LI can be available. We should not recompute them. Old PM is unchanged because it would require changing dependencies, and we don't care enough about it. Differential Revision: https://reviews.llvm.org/D124439 Reviewed By: nikic, aeubanks	2022-04-28 10:46:08 +07:00
Wenju He	96d3be8443	[InferAddressSpaces] Check if AS are the same in isNoopPtrIntCastPair isNoopAddrSpaceCast is expecting SrcAS is different from DestAS. If the two AS are the same, consider ptrtoint/inttoptr as noop cast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123573	2022-04-28 11:10:55 +08:00
Arthur Eubanks	4e65291837	[OpaquePtr][GlobalOpt] Don't attempt to evaluate global constructors with arguments Previously all entries in global_ctors had to have the void()* type and we'd skip evaluating bitcasted functions. With opaque pointers we may see the function directly. Fixes #55147. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D124553	2022-04-27 19:00:44 -07:00
Fangrui Song	c74a706893	[LegacyPM] Remove ThreadSanitizerLegacyPass Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove ThreadSanitizerLegacyPass. Reviewed By: #sanitizers, vitalybuka Differential Revision: https://reviews.llvm.org/D124209	2022-04-27 16:25:41 -07:00
Kirill Stoimenov	761366e6ae	Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]" This reverts commit `74273d575f`. Buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/22795 Failing with memory leak.	2022-04-27 23:11:48 +00:00
Nicolas Abram Lujan	f8a574bf4d	[InstCombine] C0 >> (X - C1) --> (C0 << C1) >> X With the right pre-conditions, we can fold the offset into the shifted constant: https://alive2.llvm.org/ce/z/drMRBU https://alive2.llvm.org/ce/z/cUQv-_ Fixes #55016 Differential Revision: https://reviews.llvm.org/D124369	2022-04-27 14:18:30 -04:00
Martin Sebor	efa0f12c0b	[InstCombine] Fold strnlen calls in equality to zero. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123818	2022-04-27 12:03:24 -06:00
Alexey Bataev	29a470e380	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-27 10:56:26 -07:00
Wei Wang	26a0d53b15	[CHR] Skip region containing llvm.coro.id When a block containing llvm.coro.id is cloned during CHR, it inserts an invalid PHI node with token type to the beginning of the block containing llvm.coro.begin. To avoid such case, we exclude regions with llvm.coro.id. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D124418	2022-04-27 10:27:25 -07:00
Roman Lebedev	ffafa71f64	[InstCombine] 'round up integer': if bias is just right, just reuse instructions This is only useful if we can't create new instruction because %x.aligned has other uses and already sticks around.	2022-04-27 17:27:02 +03:00
Roman Lebedev	aac0afd1dd	[InstCombine] Fold 'round up integer' pattern (when alignment is a power of two) But don't deal with non-splats. The test coverage is sufficiently exhaustive, and alive is happy about the changes there. Example with constants: https://alive2.llvm.org/ce/z/EUaJ5- / https://alive2.llvm.org/ce/z/Bkng2X General proof: https://alive2.llvm.org/ce/z/3RjJ5A	2022-04-27 17:26:55 +03:00
Shilei Tian	a6b355dd31	[SLP] Fix a typo that causes redundant assertion and potential segment fault Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D124497	2022-04-27 10:07:59 -04:00
Anna Thomas	c515b2f39e	[IRCE] Avoid computing potentially unnecessary analyses. NFC IRCE is a function pass that operates on loops. If there are no loops in the function (as seen through LI), we should avoid computing the remaining expensive analyses (such as BPI). Reordered the analyses requests and early return if there are no loops. This is an NFC with compile time improvement. The same will be done in a follow-up patch for the loop vectorizer. Reviewed-By: nikic Differential Revision: https://reviews.llvm.org/D124478	2022-04-27 09:22:10 -04:00
Chris Jackson	74273d575f	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] This relands commit `8f550368b1`. The test is amended with REQUIRES: x86-registered-target, in line with the other debuginfo-scev-salvage tests. Differential Revision: https://reviews.llvm.org/D120169	2022-04-27 13:10:30 +01:00
Chris Jackson	855752e563	Revert [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics[2/2] This reverts commit `8f550368b1`.	2022-04-27 13:06:03 +01:00
Chris Jackson	8f550368b1	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] Second of two patches to extend SCEV-based salvaging to dbg.value intrinsics that have multiple location ops pre-LSR. This second patch adds the core implementation. Reviewers: @StephenTozer, @djtodoro Differential Revision: https://reviews.llvm.org/D120169	2022-04-27 12:47:35 +01:00
Chris Jackson	c45e4c140f	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [1/2] [NFC] First of two patches that extends SCEV-based salvaging to enable salvaging of dbg.value instrinsics that have multiple locations ops before the Loop Strength Reduction pass. The existing single-op SCEV-based salvaging can generate variadic dbg.value intrinsics in order to salvage a dbg.value that has a single location op. If a dbg.value has multiple location ops before LSR, and LSR optimises away one or more of the location operands, then currently no salvaging will be attempted. Salvaging can now be added, but first this patch cleans up consistency in both the code and comments, and applies some refactoring to make application of the new salvaging implementation more straightforward. - Use SCEVDbgValueBuilder for both types of recovery expressions: IV-offset based and iteration count based. - Combine the functions that write the final DIExpression. - Move some static functions into member functions. Reviewers: @Orlando Differential Revision: https://reviews.llvm.org/D120168	2022-04-27 11:43:05 +01:00
Nikita Popov	c103f5e9da	[InstCombine] Combine opaque pointer GEPs with mismatching element types Currently, two GEPs will only be combined if the result element type of one is the same as the source element type of the other. However, this means we may miss folding opportunities where the second GEP could be rewritten using a different element type. This is especially relevant for opaque pointers, where constant GEPs often use i8 element type. Address this by converting GEP indices to offsets, adding them, and then converting them back to indices. The first (inner) GEP is allowed to have variable indices as well, in which case only the constant suffix is converted into an offset. This should address the regression reported in https://reviews.llvm.org/D123300#3467615. Differential Revision: https://reviews.llvm.org/D124459	2022-04-27 09:33:47 +02:00
Alexandros Lamprineas	a910337b5d	[FuncSpec] Conditional jump or move depends on uninitialised value(s). I found this bug when performing a two-stage build of clang with Function Specialization enabled and tuned aggressively. The crash appears only on release builds. Fixes https://github.com/llvm/llvm-project/issues/55000. Before accessing the contents of the ArgInfo iterator inside SCCPInstVisitor::markArgInFuncSpecialization, we should be checking that the iterator is valid. Differential Revision: https://reviews.llvm.org/D124114	2022-04-27 07:28:25 +01:00
Martin Sebor	ffed0cfcdb	[SimplifyLibCalls] avoid slicing 64-bit integers in an ILP32 build (PR #54739 ) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123472	2022-04-26 17:20:56 -06:00
Martin Sebor	449adafabe	[InstCombine] Fold strnlen of constant strings. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123817	2022-04-26 16:15:28 -06:00
Ricky Zhou	4041c44853	[InstCombine] Update predicate when canonicalizing comparisons in canonicalizeClampLike. canonicalizeClampLike canonicalizes the ule/ugt comparisons to ult/uge, respectively. However, it does not update the variable holding the comparison predicate type after doing this. Later code fails to handle the non-canonical predicate type (specifically, the swap of ThresholdLowIncl and ThresholdHighExcl when Pred0 has been canonicalized from ugt to uge). This leads to the miscompile reported in PR53252. Fix this by updating the comparison predicate after canonicalizing. Fixes #53252 Differential Revision: https://reviews.llvm.org/D119690	2022-04-26 17:35:45 -04:00
Michael Kruse	ff289feeba	[OpenMPIRBuilder] Remove ContinuationBB argument from Body callback. The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems: 1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block. 2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator. 3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again. 4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch. With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback. Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block. Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later. Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D118409	2022-04-26 16:35:01 -05:00
Vasileios Porpodas	fa8a9fea47	Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`" This reverts commit `6a9bbd9f20`. Code review: https://reviews.llvm.org/D124202	2022-04-26 14:02:40 -07:00
Martin Sebor	ce8f42d4af	[InstCombine] Fold memrchr calls with a constant character. Reviewed By: nikic Differential Revision: //reviews.llvm.org/D123629	2022-04-26 14:02:50 -06:00
Martin Sebor	10c99ce67d	[InstCombine] Fold memrchr calls with constant size, bail on excessive. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123626 Differential Revision: https://reviews.llvm.org/D123628	2022-04-26 14:02:50 -06:00
Martin Sebor	25febbd155	[InstCombine] Fold strnlen with a bound of zero and one. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123816	2022-04-26 14:02:50 -06:00
Martin Sebor	2807c420cd	[InstCombine] add a strnlen handler stub. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123815	2022-04-26 14:02:49 -06:00
Sanjay Patel	903aa5e0f8	[InstCombine] try to fold icmp with mismatched extended operands If a value is known to be non-negative and zexted, that's the same thing as sexted. So for the purpose of looking past the casts with an icmp, treat it as if it was a sext: https://alive2.llvm.org/ce/z/_BDsGV This is necessary, but not enough to solve the motivating problem: https://github.com/llvm/llvm-project/issues/55013 Differential Revision: https://reviews.llvm.org/D124419	2022-04-26 14:26:36 -04:00
Vasileios Porpodas	6a9bbd9f20	Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`" This reverts commit `55ce296d6f`.	2022-04-26 11:25:26 -07:00
Sanjay Patel	c8ed784ee6	[InstCombine] fold freeze of partial undef/poison vector constants We can always replace the undef elements in a vector constant with regular constants to get rid of the freeze: https://alive2.llvm.org/ce/z/nfRb4F The select diffs show that we might do better by adjusting the logic for a frozen select condition. We may also want to refine the vector constant replacement to consider forming a splat. Differential Revision: https://reviews.llvm.org/D123962	2022-04-26 14:16:11 -04:00
Vasileios Porpodas	55ce296d6f	[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost` Before this patch `Args` was used to pass a broadcat's arguments by SLP. This patch changes this. `Args` is now used for passing the operands of the shuffle. Differential Revision: https://reviews.llvm.org/D124202	2022-04-26 11:11:29 -07:00
Augie Fackler	a907d36cfe	Attributes: add a new `allocptr` attribute This continues the push away from hard-coded knowledge about functions towards attributes. We'll use this to annotate free(), realloc() and cousins and obviate the hard-coded list of free functions. Differential Revision: https://reviews.llvm.org/D123083	2022-04-26 13:57:11 -04:00
Igor Kudrin	39ce68886b	[LoopPeel][NFCI] Simplify the code to calculate peel count for PGO This reorganizes the code as a preparation for D123865: * Use more descriptive names for variables * Simplify a condition by use an already calculated value for `MaxPeelCount` * Remove a duplicate log entry * Report basic values for loop costs Differential Revision: https://reviews.llvm.org/D124388	2022-04-26 18:44:24 +04:00
Igor Kudrin	c71890e158	[LoopPeel][NFC] Exit early if there is no room for peeling Differential Revision: https://reviews.llvm.org/D123864	2022-04-26 18:43:56 +04:00
Florian Hahn	c59d95f6a4	[ConstraintElimination] Check if const. is small enough before using it Check if the value of a ConstantInt is small enough to be used for solving before calling getSExtValue. Fixes #55085	2022-04-26 13:56:32 +01:00
Liqiang Tao	b9fc18f89a	[llvm][Inline] Remove PriorityInlineOrder in SCC inliner Since the size of most of SCC's is 1, the PriorityInlineOrder would not change the inline order in SCC inliner. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D123608	2022-04-26 20:20:10 +08:00
Florian Hahn	857c612d89	[IPSCCP] Support unfeasible default dests for switch. At the moment, unfeasible default destinations are not handled properly in removeNonFeasibleEdges. So far, only unfeasible cases are removed, but later code expects unreachable blocks to have no predecessors. This is causing the crash reported in PR49573. If the default destination is unfeasible it won't be executed. Create a new unreachable block on demand and use that as default destination. Note that at the moment this only is relevant for cases where resolvedUndefsIn marks the first case as executable. Regular switch handling has a FIXME/TODO to support determining whether the default case is feasible or not. Fixes #48917. Differential Revision: https://reviews.llvm.org/D113497	2022-04-26 12:41:41 +01:00
Dmitry Makogon	d03d2d8aea	[RS4GC] Prune inputs of BDV if they are BDV themselves Don't check whether an input of BDV can be pruned if the input is the BDV itself. BDV is present in the states map, so in case the input is the BDV itself, we'd return false. So explicitly check this case. Differential Revision: https://reviews.llvm.org/D123846	2022-04-26 16:05:00 +07:00
Sanjay Patel	6631907ad2	[InstCombine] use isKnownNonNegative to reduce code duplication; NFC We may be able to make the ValueTracking wrapper smarter in the future (for example, analyze a simple recurrence), so this will automatically benefit if that happens.	2022-04-25 17:13:29 -04:00
Valery N Dmitriev	88b9e46fb5	[SLP] Steer for the best chance in tryToVectorize() when rooting with binary ops. tryToVectorize() method implements one of searching paths for vectorizable tree roots in SLP vectorizer, specifically for binary and comparison operations. Order of making probes for various scalar pairs was defined by its implementation: the instruction operands, then climb over one operand if the instruction is its sole user and then perform same actions for another operand if previous attempts failed. Problem with this approach is that among these options we can have more than a single vectorizable tree candidate and it is not necessarily the one that encountered first. Trying to build vectorizable tree for each possible combination for just evaluation is expensive. But we already have lookahead heuristics mechanism which we use for finding best pick among operands of commutative instructions. It calculates cumulative score for candidates in two consecutive lanes. This patch introduces use of the heuristics for choosing the best pair among several combinations. We only try one that looks as most promising for vectorization. Additional benefit is that we reduce total number of vectorization trees built for probes because we skip those looking non-profitable early. Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo) Differential Revision: https://reviews.llvm.org/D124309	2022-04-25 12:25:33 -07:00
Nathan Lanza	950c95cfdd	[coroutines] Get an IntegerType from the value instead of defaulting to 64 bit This AliasPtr is being created always from an Int64 even for targets where 32 bit is the proper type. e.g. “thumbv7-none-linux-android16”. This causes the assert in the `get` func to fail as we're getting a 32 bit from the APInt. Fix this by simply always just getting the type from the value instead. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D123272	2022-04-25 11:10:46 -07:00
Fangrui Song	39e23bb059	[LegacyPM] Remove HWAsanSanitizerLegacyPass Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove AddressSanitizerLegacyPass... ..., ModuleAddressSanitizerLegacyPass, and ASanGlobalsMetadataWrapperPass. MemorySanitizerLegacyPass was removed in D123894. AddressSanitizerLegacyPass was removed in D124216. Reviewed By: #sanitizers, vitalybuka Differential Revision: https://reviews.llvm.org/D124337	2022-04-25 10:21:26 -07:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Florian Hahn	6a6cc5542b	[SimpleLoopUnswitch] Enable freezing of conditions by default. This fixes a series of mis-compiles by SimpleLoopUnswitch. My measurements showed no performance regression with -O3 on AArch64 in SPEC2006, SPEC2017 and a set of internal benchmarks. Fixes #50387, #50430 Depends on D124251. Reviewed By: nikic, aqjune Differential Revision: https://reviews.llvm.org/D124252	2022-04-25 14:26:41 +01:00
Nikita Popov	e8945110d2	[InstCombine] Remove redundant unsigned underflow fold (NFCI) This is now handled as a combination of two other folds: (A+B) <= A & (A+B) != 0 --> (A+B)-1 < A (A+B)-1 < A --> -B < A	2022-04-25 14:22:43 +02:00
Nikita Popov	ee50925894	[InstCombine] Fold (X != 0) & (Y u>= X) This adds the De Morgan conjugated fold for the existing (X == 0) \| (Y u< X) fold. Proof: https://alive2.llvm.org/ce/z/3Me3JQ	2022-04-25 13:16:47 +02:00
Nikita Popov	2bec8d6d59	[InstCombine] Fold X + Y + C u< X This is a variation on the X + Y u< X fold with an extra constant. Proof: https://alive2.llvm.org/ce/z/VNb8pY	2022-04-25 12:53:39 +02:00
Max Kazantsev	606a000d1a	[LoopInstSimplify] Ignore users in unreachable blocks. PR55072 Logic in this pass assumes that all users of loop instructions are either in the same loop or are LCSSA Phis. In fact, there can also be users in unreachable blocks that currently break assertions. Such users don't need to go to the next round of simplifications. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D124368	2022-04-25 17:35:28 +07:00
Chenbing Zheng	5805cfb901	[InstCombine] Complete folding of fneg-of-fabs This patch add a function foldSelectWithFCmpToFabs, and do more combine for fneg-of-fabs. With 'nsz': fold (X < +/-0.0) ? X : -X or (X <= +/-0.0) ? X : -X to -fabs(x) fold (X > +/-0.0) ? X : -X or (X >= +/-0.0) ? X : -X to -fabs(x) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D123830	2022-04-25 09:53:36 +08:00
Valery N Dmitriev	edf7bed87b	[SLP][NFC] Outline lookahead heuristics into a separate helper class. Minor refactoring to reduce size of functional change D124309: look-ahead scoring routines pulled out of VLOperands and formed new LookAheadHeuristics helper class. Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo) Differential Revision: https://reviews.llvm.org/D124313	2022-04-22 18:59:08 -07:00
Paul Kirth	4683a2effa	[llvm][misexpect] Avoid division by 0 when using sample profiling MisExpect diagnostics should not prevent compilation from succeeding, and the assertion is insufficient to prevent division by zero in release builds. This patch addresses that by replacing the assert with an early return. Additionally, it disables MisExpect diagnostics when using sample profiling, since this is the only known case where this error has manifested. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124302	2022-04-22 22:48:00 +00:00
Florian Hahn	b341c44010	[SimpleLoopUnswitch] Check if freeze is needed for partial unswitching. We only need to insert a Freeze instruction if any of the conditions may be poison. Similar checks are already done in the other places SimpleLoopUnswitch creates Freeze instruction. Reviewed By: aeubanks, efriedma Differential Revision: https://reviews.llvm.org/D124259	2022-04-22 21:24:55 +01:00
Simon Pilgrim	ffe13960b5	[InstCombine] Fold (A & 2^C1) + A => A & (2^C1 - 1) iff bit C1 in A is a sign bit (PR21929) Alive2: https://alive2.llvm.org/ce/z/Ygq26C This is the final missing fold to handle the modulo2 simplification: https://github.com/llvm/llvm-project/issues/22303 Fixes #22303 Differential Revision: https://reviews.llvm.org/D123374	2022-04-22 16:59:02 +01:00
Nikita Popov	369ef9bf60	[InstCombine] Extract code for or of icmp eq zero and icmp fold (NFC) To make it easier to extend this to the congruent and case.	2022-04-22 16:48:59 +02:00
Nikita Popov	ba46ae7bd8	[InstCombine] Merge foldAndOfICmps() and foldOrOfICmps() (NFCI) Folds are supposed to always be added in conjugated pairs for and and or. Merge the two functions to make folds for which this is currently not the case more obvious.	2022-04-22 12:48:03 +02:00
Nikita Popov	3e1d2c352c	[InstCombine] Fix or of commuted foldable predicates `1d90e53044` switch this code to store the predicates and operands in variables, but retained a swapOperands() call here. Thus the commuted cases were no longer folded. Additionally, as the change was not reported, the next InstCombine iteration would not pick it up either.	2022-04-22 12:31:26 +02:00
Nikita Popov	993b166deb	Reapply [SimplifyCFG] Handle branch on same condition in pred more directly Reapplying without changes, after a fix to a dependent patch. ----- Rather than creating a PHI node and then using the PHI threading code, directly handle this case in FoldCondBranchOnValueKnownInPredecessor(). This change is supposed to be NFC-ish, but may cause changes due to different transform order.	2022-04-22 10:27:38 +02:00
Nikita Popov	df18e37541	Reapply [SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension (NFCI) Reapply with SmallMapVector instead of SmallDenseMap, which should address the non-determinism issue. ----- This general threading transform can be performed whenever we know a constant value for the condition in a predecessor, which would currently just be the case of a phi node with constant arguments.	2022-04-22 09:42:11 +02:00
Fangrui Song	16a4d3a85c	[LegacyPM] Remove AddressSanitizerLegacyPass Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove AddressSanitizerLegacyPass, ModuleAddressSanitizerLegacyPass, and ASanGlobalsMetadataWrapperPass. MemorySanitizerLegacyPass was removed in D123894. Reviewed By: #sanitizers, vitalybuka Differential Revision: https://reviews.llvm.org/D124216	2022-04-21 19:25:57 -07:00
Nico Weber	0e0759f441	Revert "[LegacyPM] Remove AddressSanitizerLegacyPass" This reverts commit `e68c589e53`. Breaks check-llvm, see comments on https://reviews.llvm.org/D124216	2022-04-21 22:14:36 -04:00
Fangrui Song	e68c589e53	[LegacyPM] Remove AddressSanitizerLegacyPass Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove AddressSanitizerLegacyPass, ModuleAddressSanitizerLegacyPass, and ASanGlobalsMetadataWrapperPass. MemorySanitizerLegacyPass was removed in D123894. Reviewed By: #sanitizers, vitalybuka Differential Revision: https://reviews.llvm.org/D124216	2022-04-21 18:18:39 -07:00
Vitaly Buka	9be90748f1	Revert "[asan] Emit .size directive for global object size before redzone" Revert "[docs] Fix underline" Breaks a lot of asan tests in google. This reverts commit `365c3e85bc`. This reverts commit `78a784bea4`.	2022-04-21 16:21:17 -07:00
Alex Brachet	78a784bea4	[asan] Emit .size directive for global object size before redzone This emits an `st_size` that represents the actual useable size of an object before the redzone is added. Reviewed By: vitalybuka, MaskRay, hctim Differential Revision: https://reviews.llvm.org/D123010	2022-04-21 20:46:38 +00:00
Sanjay Patel	664ae7bbcc	[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X (2nd try) The first attempt at this missed a check to make sure the offset constant was in range and caused many bot failures. That was missed in the Alive2 proof because on overshift creates poison rather than the assert from APInt. Here's an alternate attempt at a proof using count-trailing-zeros: https://alive2.llvm.org/ce/z/pnXQYR Original commit message: This is similar to an existing pre-shift-of-constant fold: `8a9c70fc01` ...but in this case, we need no-wrap on the shl and a negative offset: https://alive2.llvm.org/ce/z/_RVz99	2022-04-21 16:18:46 -04:00
Fangrui Song	35e350d5ba	Revert "[SimplifyCFG] Handle branch on same condition in pred more directly" and "[SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension" This reverts commit `3df86e799e`. This reverts commit `8988254667`. `[SimplifyCFG] Handle branch on same condition in pred more directly` caused non-determinism when compiling opt with a bootstrapped clang. I have to revert the dependent commit as well.	2022-04-21 12:58:58 -07:00
Fangrui Song	409eb5dc3e	[LegacyPM] Remove GCOVProfilerLegacyPass Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove GCOVProfilerLegacyPass. I have checked many LLVM users and only llvm-hs[1] uses the legacy gcov pass. [1]: https://github.com/llvm-hs/llvm-hs/issues/392 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123829	2022-04-21 10:59:30 -07:00
Fangrui Song	d133538b8b	[LegacyPM] Remove MemorySanitizerLegacyPass Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove MemorySanitizerLegacyPass. Differential Revision: https://reviews.llvm.org/D123894	2022-04-21 10:21:46 -07:00
Vasileios Porpodas	889588ee97	[SLP] Refactoring isLegalBroadcastLoad() to use `ElementCount`. Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`. This helps reduce the diff of a future SLP patch for AArch64.	2022-04-21 10:19:00 -07:00
chenglin.bi	25aba1abb5	Revert "[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1)" This reverts commit `b543d28df7`.	2022-04-22 00:56:20 +08:00
chenglin.bi	b543d28df7	[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1) Follow up D123453, add one-use limitation for (X * C2) << C1 --> X * (C2 << C1) to make consistent with lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124183	2022-04-22 00:32:36 +08:00
Sanjay Patel	8960ba7491	Revert "[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X" This reverts commit `5819f4a422`. This caused bots to fail with a crash/assert during the fold, so some constraint was missed.	2022-04-21 12:15:27 -04:00
Sanjay Patel	5819f4a422	[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X This is similar to an existing pre-shift-of-constant fold: `8a9c70fc01` ...but in this case, we need no-wrap on the shl and a negative offset: https://alive2.llvm.org/ce/z/_RVz99 Fixes #54890	2022-04-21 11:38:27 -04:00
Nikita Popov	46c2b41d02	[InstCombine] Remove dead code (NFC) This was a leftover condition without code.	2022-04-21 15:53:53 +02:00
Nikola Tesic	c5600aef88	[Debugify] Limit number of processed functions for original mode Debugify in OriginalDebugInfo mode, does (DebugInfo) collect-before-pass & check-after-pass for each instruction, which is pretty expensive. When used to analyze DebugInfo losses in large projects (like LLVM), this raises the build time unacceptably. This patch introduces a limit for the number of processed functions per compile unit. By default, the limit is set to UINT_MAX (practically unlimited), and by using the introduced option -debugify-func-limit the limit could be set to any positive integer number. Differential revision: https://reviews.llvm.org/D115714	2022-04-21 13:58:17 +02:00
Nikita Popov	3df86e799e	[SimplifyCFG] Handle branch on same condition in pred more directly Rather than creating a PHI node and then using the PHI threading code, directly handle this case in FoldCondBranchOnValueKnownInPredecessor(). This change is supposed to be NFC-ish, but may cause changes due to different transform order.	2022-04-21 11:22:02 +02:00
Nikita Popov	8988254667	[SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension This general threading transform can be performed whenever we know a constant value for the condition in a predecessor, which would currently just be the case of a phi node with constant arguments.	2022-04-21 10:49:49 +02:00
Chuanqi Xu	7eaa84eac3	[NFC] Code cleanups for coroutine after we remvoed legacy passes	2022-04-21 15:32:46 +08:00
Chuanqi Xu	483efc9ad0	[Pipelines] Remove Legacy Passes in Coroutines The legacy passes are deprecated now and would be removed in near future. This patch tries to remove legacy passes in coroutines. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D123918	2022-04-21 10:59:11 +08:00
Florian Hahn	bea69b232f	[VPlan] Initial modeling of middle block in VPlan. This patch extends the scope of VPlan to also include the exit (aka middle) block. For now, the exit block remains empty, but handling of exit values will subsequently be moved to VPlan, by adding recipes to model exit values in the exit block. As a first step, this will allow fixing #51366. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123457	2022-04-20 19:34:41 +01:00
Craig Topper	e3f6c2d288	[InstCombine] Don't look through bitcast from vector in collectInsertionElements. We're making a recursive call here and everything in the function assumes we're looking at scalars. This would be violated if we looked through a bitcast from vectors. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124015	2022-04-20 09:15:32 -07:00
chenglin.bi	1fae4b492d	[InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor if c is divisible by (1 << ShAmtC), we can fold this pattern: lshr (mul nuw x, c), ShAmtC -> mul nuw x, (c >> ShAmtC) https://alive2.llvm.org/ce/z/ox4wAt Fix https://github.com/llvm/llvm-project/issues/54824 Reviewed By: spatel, lebedev.ri, craig.topper Differential Revision: https://reviews.llvm.org/D123453	2022-04-21 00:13:36 +08:00
Sanjay Patel	bf09a925f2	[InstCombine] remove likely redundant ValueTracking-based folds for shifts This is not expected to have a functional difference as discussed in the post-commit comments for `8a9c70fc01`. All of the motivating tests for the older fold still optimize as expected because other code can infer the 'nuw'.	2022-04-20 11:28:31 -04:00
Nikita Popov	d727505e40	[SimplifyCFG] Remove one-use limitation in FoldCondBranchOnPHI() BlockIsSimpleEnoughToThreadThrough() already checks that the phi (and all other instructions) are not used outside the block, so this one-use check is not necessary for legality. I also don't see any reason why it would be necessary for profitability (in fact, those extra uses will be replaced with constants, which should be generally profitable).	2022-04-20 15:56:20 +02:00
Chuanqi Xu	5b6742a6bd	[NFC] Return correct PreservedAnalysis for CoroEarly This is a fix for previous typo. It makes no sense to return PreservedAnalyses::all() if anything is change. This change simplify codes further.	2022-04-20 16:47:10 +08:00
Fangrui Song	14d9390721	Revert D123198 "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls." test/Transforms/InstCombine/pr39177.ll failed in a -DLLVM_USE_SANITIZER=Undefined build. ``` lib/Transforms/Utils/BuildLibCalls.cpp:1217:17: runtime error: reference binding to null pointer of type 'llvm::Function' ``` `Function &F = *M->getFunction(Name);` This reverts commit `0f8c626723`.	2022-04-19 22:26:10 -07:00
Andrew Browne	204c12eef9	[DFSan] Print an error before calling null extern_weak functions, incase dfsan instrumentation optimized out a null check. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D124051	2022-04-19 17:01:41 -07:00
Ilia Diachkov	6c69427e88	[SPIR-V](3/6) Add MC layer, object file support, and InstPrinter The patch adds SPIRV-specific MC layer implementation, SPIRV object file support and SPIRVInstPrinter. Differential Revision: https://reviews.llvm.org/D116462 Authors: Aleksandr Bezzubikov, Lewis Crawford, Ilia Diachkov, Michal Paszkowski, Andrey Tretyakov, Konrad Trifunovic Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Co-authored-by: Ilia Diachkov <iliya.diyachkov@intel.com> Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com> Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com> Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>	2022-04-20 01:10:25 +02:00
Paul Kirth	bac6cd5bf8	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-04-19 21:23:48 +00:00
Jonas Paulsson	0f8c626723	[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls. A new set of overloaded functions named getOrInsertLibFunc() are now supposed to be used instead of getOrInsertFunction() when building a libcall from within an LLVM optimizer(). The idea is that this new function also makes sure that any mandatory argument attributes are added to the function prototype (after calling getOrInsertFunction()). inferLibFuncAttributes() is renamed to inferNonMandatoryLibFuncAttrs() as it only adds attributes that are not necessary for correctness but merely helping with later optimizations. Generally, the front end is responsible for building a correct function prototype with the needed argument attributes. If the middle end however is the one creating the call, e.g. when replacing one libcall with another, it then must take this responsibility. This continues the work of properly handling argument extension if required by the target ABI when building a lib call. getOrInsertLibFunc() now does this for all libcalls currently built by any LLVM optimizer. It is expected that when in the future a new optimization builds a new libcall with an integer argument it is to be added to getOrInsertLibFunc() with the proper handling. Note that not all targets have it in their ABI to sign/zero extend integer arguments to the full register width, but this will be done selectively as determined by getExtAttrForI32Param(). Review: Eli Friedman, Nikita Popov, Dávid Bolvanský Differential Revision: https://reviews.llvm.org/D123198	2022-04-19 21:22:07 +02:00
Sanjay Patel	8a9c70fc01	[InstCombine] C0 shift (X add nuw C) --> (C0 shift C) shift X With 'nuw' we can convert the increment of the shift amount into a pre-shift (constant fold) of the shifted constant: https://alive2.llvm.org/ce/z/FkTyR2 Fixes issue #41976	2022-04-19 15:21:34 -04:00
Vasileios Porpodas	8d4b5e0833	[NFC][SLP] Improved description of getShallowScore() and getScoreAtLevelRec() Differential Revision: https://reviews.llvm.org/D124027	2022-04-19 12:15:36 -07:00
Florian Hahn	4026b718b8	[VPlan] Remove unused SCEV forward declaration (NFC).	2022-04-19 17:16:17 +02:00
Alexey Bataev	883571928c	Revert "[SLP]Improve reductions analysis and emission, part 1." This reverts commit `0e1f4d4d3c` to fix a crash reported in PR54976	2022-04-19 06:17:03 -07:00
Florian Hahn	a65f2730d2	[VPlan] Expand induction step in VPlan pre-header. This patch moves SCEV expansion of steps used by VPWidenIntOrFpInductionRecipes to the pre-header using VPExpandSCEVRecipe. This ensures that those steps are expanded while the CFG is in a valid state. Previously, SCEV expansion may happen during vector body code-generation, during which the CFG may be invalid, causing issues with SCEV expansion. Depends on D122095. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D122096	2022-04-19 13:06:39 +02:00
Chuanqi Xu	f9bee35689	[Pipelines] Hoist CoroEarly as a module pass This change could reduce the time we call `declaresCoroEarlyIntrinsics`. And it is helpful for future changes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D123925	2022-04-19 11:04:24 +08:00
Michael Kruse	2d92ee97f1	Reapply "[OpenMP] Refactor OMPScheduleType enum." This reverts commit `af0285122f`. The test "libomp::loop_dispatch.c" on builder openmp-gcc-x86_64-linux-debian fails from time-to-time. See #54969. This patch is unrelated.	2022-04-18 21:56:47 -05:00
Vasileios Porpodas	b1333f03d9	Recommit "[SLP] Support internal users of splat loads" Code review: https://reviews.llvm.org/D121940 This reverts commit `359dbb0d3d`.	2022-04-18 15:58:01 -07:00
Michael Kruse	af0285122f	Revert "[OpenMP] Refactor OMPScheduleType enum." This reverts commit `9ec501da76`. It may have caused the openmp-gcc-x86_64-linux-debian buildbot to fail. https://lab.llvm.org/buildbot/#/builders/4/builds/20377	2022-04-18 14:38:31 -05:00
Sanjay Patel	3a27b51b27	[InstCombine] reduce code for freeze of undef The description was ambiguous about the behavior when boths select arms are constant or both arms are not constant. I don't think there's any evidence to support either way, but this matches the code with a more specified description. We can extend this to deal with vector constants with undef/poison elements. Currently, those don't get folded anywhere.	2022-04-18 15:14:02 -04:00
Vasileios Porpodas	359dbb0d3d	Revert "[SLP] Support internal users of splat loads" This reverts commit `f8e1337115`.	2022-04-18 12:12:34 -07:00
Michael Kruse	9ec501da76	[OpenMP] Refactor OMPScheduleType enum. The OMPScheduleType enum stores the constants from libomp's internal sched_type in kmp.h and are used by several kmp API functions. The enum values have an internal structure, namely each scheduling algorithm (e.g.) exists in four variants: unordered, orderend, normerge unordered, and nomerge ordered. This patch (basically a followup to D114940) splits the "ordered" and "nomerge" bits into separate flags, as was already done for the "monotonic" and "nonmonotonic", so we can apply bit flags operations on them. It also now contains all possible combinations according to kmp's sched_type. Deriving of the OMPScheduleType enum from clause parameters has been moved form MLIR's OpenMPToLLVMIRTranslation.cpp to OpenMPIRBuilder to make available for clang as well. Since the primary purpose of the flag is the binary interface to libomp, it has been made more private to LLVMFrontend. The primary interface for generating worksharing-loop using OpenMPIRBuilder code becomes `applyWorkshareLoop` which derives the OMPScheduleType automatically and calls the appropriate emitter function. While this is mostly a NFC refactor, it still applies the following functional changes: * The logic from OpenMPToLLVMIRTranslation to derive the OMPScheduleType also applies to clang. Most notably, it now applies the nonmonotonic flag for non-static schedules by default. * In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was previously not applied if the simd modifier was used. I assume this was a bug, since the effect was due to `loop.schedule_modifier()` returning `mlir::omp::ScheduleModifier::none` instead of `llvm::Optional::None`. * In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was set even if ordered was specified, in breach to what the comment before citing the OpenMP specification says. I assume this was an oversight. The ordered flag with parameter was not considered in this patch. Changes will need to be made (e.g. adding/modifying function parameters) when support for it is added. The lengthy names of the enum values can be discussed, for the moment this is avoiding reusing previously existing enum value names such as `StaticChunked` to avoid confusion. Reviewed By: peixin Differential Revision: https://reviews.llvm.org/D123403	2022-04-18 14:03:17 -05:00
Vasileios Porpodas	f8e1337115	[SLP] Support internal users of splat loads Until now we would only accept a broadcast load pattern if it is only used by a single vector of instructions. This patch relaxes this, and allows for the broadcast to have more than one user vector, as long as all of its uses are internal to the SLP graph and vectorized. Differential Revision: https://reviews.llvm.org/D121940	2022-04-18 11:59:44 -07:00
Arthur Eubanks	2e6ac54cf4	[LegacyPM] Remove ThinLTO/LTO pipelines Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove the (Thin)LTO pipelines. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D123882	2022-04-18 10:09:41 -07:00
Zarko Todorovski	ce87133120	[llvm][IPO] Inclusive language: Rename mergefunc-sanity to mergefunc-verify and remove other instances of sanity in MergeFunctions.cpp This patch renames the mergefunc-sanity to mergefunc-verify and renames the related functions to use more inclusive language Reviewed By: cebowleratibm Differential Revision: https://reviews.llvm.org/D114374	2022-04-18 11:50:08 -04:00
Aaron Ballman	86cdb2929c	Silence a "not all control paths return a value" warning; NFC	2022-04-18 08:54:08 -04:00
Johannes Doerfert	e87f10a771	[Attributor] CGSCC pass should not recompute results outside the SCC (reapply) When we run the CGSCC pass we should only invest time on the SCC. We can initialize AAs with information from the module slice but we should not update those AAs. We make an exception for are call site of the SCC as they are helpful providing information for the SCC. Minor modifications to pointer privatization allow us to perform it even in the CGSCC pass, similar to ArgumentPromotion.	2022-04-17 12:48:49 -05:00
Andrew Litteken	d7c56a076e	[IROutliner] Ensure that phi values that are passed in as arguments are remapped as arguments Issue: https://github.com/llvm/llvm-project/issues/54430 For incoming values of phi nodes added to an outlined function to accommodate different exit paths in the function, when a value is a constant that is passed into the outlined function as an argument, we find the corresponding value in the first extracted function used to fill the overall outlined function. When this value is an argument, the corresponding value used will be the old value, prior to outlining. This patch maintains a mapping from these values to arguments, and uses this mapping to update the added phi node accordingly. Reviewers: paquette Recommit of `d6eb480afb` Differential Revision: https://reviews.llvm.org/D122206	2022-04-16 15:47:52 -05:00
eop Chen	38ec33d6b9	[LSR] Update outdated comment	2022-04-16 12:11:15 -07:00
Joseph Huber	984a0dc386	[OpenMP] Use new offloading binary when embedding offloading images The previous patch introduced the offloading binary format so we can store some metada along with the binary image. This patch introduces using this inside the linker wrapper and Clang instead of the previous method that embedded the metadata in the section name. Differential Revision: https://reviews.llvm.org/D122683	2022-04-15 20:35:26 -04:00
Johannes Doerfert	3be3b40188	[Attributor][NFCI] Introduce AttributorConfig to bundle all options Instead of lengthy constructors we can now set the members of a read-only struct before the Attributor is created. Should make it clearer what is configurable and also help introducing new options in the future. This actually added IsModulePass and avoids deduction through the Function set size. No functional change was intended.	2022-04-15 18:17:19 -05:00
Florian Hahn	73f5d7d0d6	[VPlan] Handle equal address and store ops in onlyFirstLaneDemanded. With opaque pointers, the stored value and address can be the same. Previously the code in VPWidenMemoryInstructionRecipe::onlyFirstLaneDemanded incorrectly considers stores with matching store and pointer operands as only demanding the first lane, causing a crash.	2022-04-15 22:53:33 +02:00
Johannes Doerfert	04f3a224bc	[Attributor][NFC] Introduce a flag to distinguish the scope of a query	2022-04-15 14:56:10 -05:00
Johannes Doerfert	bd72acf4d8	[Attributor][NFC] Code cleanup to minimize follow up changes	2022-04-15 14:56:09 -05:00
Johannes Doerfert	2d8e7834b0	[Attributor][NFC] Rename AAPotentialValues to AAPotentialConstantValues	2022-04-15 14:56:09 -05:00
Johannes Doerfert	1fb415fee9	[AMDGPU][FIX] Proper load-store-vectorizer result with opaque pointers The original code relied on the fact that we needed a bitcast instruction (for non constant base objects). With opaque pointers there might not be a bitcast. Always check if reordering is required instead. Fixes: https://github.com/llvm/llvm-project/issues/54896 Differential Revision: https://reviews.llvm.org/D123694	2022-04-15 13:42:46 -05:00
Fangrui Song	04e094a336	[PGO] Remove legacy PM passes Legacy PM for optimization pipeline was deprecated in 13.0.0 and Clang dropped legacy PM support in D123609. This change removes legacy PM passes for PGO so that downstream projects won't be able to use it. It seems appropriate to start removing such "add-on" features like instrumentations, before we remove more stuff after 15.x is branched. I have checked many LLVM users and only ldc[1] uses the legacy PGO pass. [1]: https://github.com/ldc-developers/ldc/issues/3961 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D123834	2022-04-15 10:26:43 -07:00
Andrew Browne	cddcf2170a	[DFSan] Avoid replacing uses of functions in comparisions. This can cause crashes by accidentally optimizing out checks for extern_weak_func != nullptr, when replaced with a known-not-null wrapper. This solution isn't perfect (only avoids replacement on specific patterns) but should address common cases. Internal reference: b/185245029 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D123701	2022-04-14 14:14:52 -07:00
Sanjay Patel	2c2568f39e	[InstCombine] canonicalize select with signbit test This is part of solving issue #54750 - in that example we have both forms of the compare and do not recognize the equivalence.	2022-04-14 14:28:47 -04:00
Andrew Litteken	6f8eba06c2	Revert "[IROutliner] Ensure that phi values that are passed in as arguments are remapped as arguments" Failing test due to typo This reverts commit `d6eb480afb`.	2022-04-14 12:23:33 -05:00
Andrew Litteken	d6eb480afb	[IROutliner] Ensure that phi values that are passed in as arguments are remapped as arguments Issue: https://github.com/llvm/llvm-project/issues/54430 For incoming values of phi nodes added to an outlined function to accommodate different exit paths in the function, when a value is a constant that is passed into the outlined function as an argument, we find the corresponding value in the first extracted function used to fill the overall outlined function. When this value is an argument, the corresponding value used will be the old value, prior to outlining. This patch maintains a mapping from these values to arguments, and uses this mapping to update the added phi node accordingly. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D122206	2022-04-14 12:16:23 -05:00
Andrew Litteken	a919d3d888	[IROutliner] Ensure that incoming blocks of PHINodes are included in the unique numbering gneration for phi nodes for each exit path Issue: https://github.com/llvm/llvm-project/issues/54431 PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D122207	2022-04-14 12:13:17 -05:00
chenglin.bi	00871e2f4f	[SimplifyCFG] Try to fold switch with single result value and power-of-2 cases to mask+select When switch with 2^n cases go to one result, check if the 2^n cases can be covered by n bit masks. If yes we can use "and condition, ~mask" to simplify the switch case 0 2 4 6 -> and condition, -7 https://alive2.llvm.org/ce/z/jjH_0N case 0 2 8 10 -> and condition, -11 https://alive2.llvm.org/ce/z/K7E-2V case 2 4 8 12 -> and (sub condition, 2), -11 https://alive2.llvm.org/ce/z/CrxbYg Fix one case of https://github.com/llvm/llvm-project/issues/39957 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D122485	2022-04-15 00:10:00 +08:00
Florian Hahn	2c14cdf831	[VPlan] Turn external defs in Value -> VPValue mapping. This addresses an existing TODO by keeping a mapping of external IR Value * definitions wrapped in VPValues for use in a VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123700	2022-04-14 12:03:09 +02:00
Ruiling Song	1e01f95057	LowerSwitch: Avoid inserting NewDefault block The NewDefault was used to simplify the updating of PHI nodes, but it causes some inefficiency for target that will run structurizer later. For example, for a simple two-case switch, the extra NewDefault is causing unstructured CFG like: O / \ O O / \ / \ C1 ND C2 \ \| / \ \| / D The change is to avoid the ND(NewDefault) block, that is we will get a structured CFG for above example like: O / \ / \ O O / \ / \ C1 \ / C2 \-> D <-/ The IR change introduced by this patch should be trivial to other targets, so I am doing this unconditionally. Fall-through among the cases will also cause unstructured CFG, but it need more work and will be addressed in a separate change. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D123607	2022-04-14 13:30:56 +08:00
Sanjay Patel	0ef46dc0f9	[SimplifyCFG] improve readability in switch-to-select; NFC	2022-04-13 17:14:45 -04:00
serge-sans-paille	fa5a4e1b95	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `a96638e50e` detected a few regressions, fixing them.	2022-04-13 20:53:19 +02:00
Florian Hahn	ad95255b92	Revert "[LICM] Only create load in pre-header when promoting load." This reverts commit `4bf3b7dc92`. This might be causing another buildbot failure.	2022-04-13 20:24:28 +02:00
serge-sans-paille	262eba01b3	Revert "[ValueTracking] Make getStringLenth aware of strdup" This reverts commit `e810d55809`. The commit was not taken into account the fact that strduped string could be modified. Checking if such modification happens would make the function very costly, without a test case in mind it's not worth the effort.	2022-04-13 19:17:28 +02:00
Anna Thomas	28f27dd264	Check users of instrinsics instead of traversing entire function.NFC Updated LowerGuardIntrinsic and LowerWidenableCondition to check for users of the respective intrinsic, instead of checking for guards and widenable conditions by traversing the entire function. This is an NFC. Should save some compile time.	2022-04-13 12:28:51 -04:00
Florian Hahn	4bf3b7dc92	Recommit "[LICM] Only create load in pre-header when promoting load." This reverts the revert commit `1ddc719680`. This version of the patch sets the initial available value to poison, which resolves an issue with the SSAUpdater breaking LCSSA form.	2022-04-13 17:20:39 +02:00
Nikita Popov	8c74169990	[SimplifyLibCalls] Don't mark memchr() memory as fully dereferenceable C11 specifies memchr() as follows: > The memchr function locates the first occurrence of c (converted > to an unsigned char) in the initial n characters (each interpreted > as unsigned char) of the object pointed to by s. The implementation > shall behave as if it reads the characters sequentially and stops > as soon as a matching character is found. In particular, it is well-defined to specify a memchr size larger than the underlying object, as long as the character is found before the end of the object. Differential Revision: https://reviews.llvm.org/D123665	2022-04-13 16:46:18 +02:00
Sanjay Patel	cd0d0d633b	[SimplifyCFG] make a debug option for case max when converting switch to select This should be "NFC" as written, but it will make D122485 smaller and give us more flexibility to experiment with optimization level vs. compile-time. Differential Revision: https://reviews.llvm.org/D123625	2022-04-13 06:55:13 -04:00
Vitaly Buka	79fa8be4ae	[NFC][msan] Switch pointer to a reference	2022-04-12 18:45:50 -07:00
Alexey Bataev	0e1f4d4d3c	[SLP]Improve reductions analysis and emission, part 1. Currently SLP vectorizer walks through the instructions and selects 3 main classes of values: 1) reduction operations - instructions with same reduction opcode (add, mul, min/max, etc.), which build the reduction, 2) reduced values - instructions with the same opcodes, but different from the reduction opcode, 3) extra arguments - all other values, instructions from the different basic block rather than the root node, instructions with to many/less uses. This scheme is not very efficient. It excludes some instructions and all non-instruction values from the reductions (constants, proficient gathers), to many possibly reduced values are marked as extra arguments. Patch improves this process by introducing a bit extended analysis stage. During this stage, we still try to select 3 classes of the values: 1) reduction operations - same as before, 2) possibly reduced values - all instructions from the current block/non-instructions, which may build a vectorization tree, 3) extra arguments - instructions from the different basic blocks. Additionally, an extra sorting of the possibly reduced values occurs to build the scalar sequences which highly likely will bed vectorized, e.g. loads are grouped by the distance between them, constants are grouped together, cmp instructions are sorted by their compare types and predicates, extractelement instructions are sorted by the vector operand, etc. Also, these groups are reordered by their length so the longest group is the first in the list of the possibly reduced values. The vectorization process tries to emit the reductions for all these groups. These reductions, remaining non-vectorized possible reduced values and extra arguments are then combined into the final expression just like it was before. Differential Revision: https://reviews.llvm.org/D114171	2022-04-12 17:46:11 -07:00
Muhammad Omair Javaid	42ebfa8269	Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth" This reverts commit `64b6192e81`. This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage: https://lab.llvm.org/buildbot/#/builders/176/builds/1515 llvm-tblgen crashes after applying this patch.	2022-04-13 04:53:07 +05:00
Arthur Eubanks	51561b5e80	[ArgPromo][OpaquePointer] Don't promote mismatched function types Mismatched call/callee function types is considered an indirect call. Fixes crash in https://reviews.llvm.org/D123300#3446023.	2022-04-12 15:17:45 -07:00
Nikita Popov	0adadfa68f	[MSan] Ensure argument shadow initialized on memcpy We need to explicitly query the shadow here, because it is lazily initialized for byval arguments. Without opaque pointers this used to mostly work out, because there would be a bitcast to `i8*` present, and that would query, and copy in case of byval, the argument shadow. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D123602	2022-04-12 14:53:02 -07:00
Vitaly Buka	efdc90baaa	Revert "[MSan] Ensure argument shadow initialized on memcpy" Invalid author. This reverts commit `163a9f4552`.	2022-04-12 14:53:02 -07:00
Vitaly Buka	163a9f4552	[MSan] Ensure argument shadow initialized on memcpy We need to explicitly query the shadow here, because it is lazily initialized for byval arguments. Without opaque pointers this used to mostly work out, because there would be a bitcast to `i8*` present, and that would query, and copy in case of byval, the argument shadow. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D123602	2022-04-12 14:49:52 -07:00
Sanjay Patel	d9211be13d	[SimplifyCFG] cleanup code for converting switch to select (NFC) This renames functions for more general usage (and current capitalization style) before a proposed logic change in D122485. Differential Revision: https://reviews.llvm.org/D123614	2022-04-12 12:17:54 -04:00
serge-sans-paille	e810d55809	[ValueTracking] Make getStringLenth aware of strdup During strlen compile-time evaluation, make it possible to track size of strduped strings. Differential Revision: https://reviews.llvm.org/D123497	2022-04-12 14:47:29 +02:00
Liqin Weng	fa4b4f0fcb	[InstCombine] fold more constant remainder to select-of-constants remainder Reviewed By: xbolva00, spatel, Chenbing.Zheng Differential Revision: https://reviews.llvm.org/D123486	2022-04-12 09:40:56 +08:00
Alexander Shaposhnikov	f6bb156fb1	[InstCombine] Fold icmp(X) ? f(X) : C This diff extends foldSelectInstWithICmp to handle the case icmp(X) ? f(X) : C when f(X) is guaranteed to be equal to C for all X in the exact range of the inverse predicate. This addresses the issue https://github.com/llvm/llvm-project/issues/54089. Differential revision: https://reviews.llvm.org/D123159 Test plan: make check-all	2022-04-12 01:32:55 +00:00
Sanjay Patel	1206a18d41	[InstCombine] guard against splat-mul corner case The test is already simplified, and I'm not sure how to write a test to exercise the new clause. But it protects the 2-bit pattern from miscompiling as noted in D123453. https://alive2.llvm.org/ce/z/QPyVfv (If we managed to fall into the mul transform, it would wrongly create a zero on this pattern.)	2022-04-11 15:50:13 -04:00
Whitney Tsang	80304c5f88	[LoopUnroll] Always respect user unroll pragma IMO when user provide unroll pragma, compiler should always respect it. It is not clear to me why loop unroll pass currently ensure that the unrolled loop size is limited by PragmaUnrollThreshold. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119148	2022-04-11 14:33:24 -04:00
Sanjay Patel	7783db55af	[InstCombine] try to fold low-mask of ashr to lshr With one-use, we handle this via demanded-bits. But We need to handle extra uses to improve issue #54750. https://alive2.llvm.org/ce/z/aDYkPv	2022-04-11 11:56:40 -04:00
Florian Hahn	1ddc719680	Revert "[LICM] Only create load in pre-header when promoting load." This reverts commit `42229b96bf`. This appears to cause crashes on multiple bots.	2022-04-11 17:37:23 +02:00
Nikita Popov	9af8cc8d17	[SimplifyLibCalls] Remove unnecessary inbounds check Even if the GEP is not inbounds, the GEP will have provenance of the global, and accessing past the extent of the global would be undefined behavior.	2022-04-11 16:51:09 +02:00
Florian Hahn	42229b96bf	[LICM] Only create load in pre-header when promoting load. When only a store is sunk, there is no need to create a load in the pre-header, as the result of the load will never get used. The dead load can can introduce UB, if the function is marked as writeonly. Fixes #51248. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123473	2022-04-11 16:45:18 +02:00
Simon Pilgrim	431e93f4f5	[InstCombine] Fold sub(add(x,y),min/max(x,y)) -> max/min(x,y) (PR38280) As discussed on Issue #37628, we can flip a min/max node if we're subtracting from the sum of the node's operands Alive2: https://alive2.llvm.org/ce/z/W_KXfy Differential Revision: https://reviews.llvm.org/D123399	2022-04-11 11:32:56 +01:00
Florian Hahn	5f1eb74850	[VPlan] Place VPExpandSCEVRecipe in pre-header. After D121624 models the pre-header in VPlan, VPExpandSCEVRecipes can be placed there. This ensures SCEV expansion happens before modifying the CFG during VPlan execution, when CFG is incomplete. Depends on D121624. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D122095	2022-04-10 10:26:20 +02:00
Florian Hahn	256c6b0ba1	[VPlan] Model pre-header explicitly. This patch extends the scope of VPlan to also model the pre-header. The pre-header can be used to place recipes that should be code-gen'd outside the loop, like SCEV expansion. Depends on D121623. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121624	2022-04-09 14:19:47 +02:00
Matt Arsenault	9fdd25848a	Transforms: Fix code duplication between LowerAtomic and AtomicExpand	2022-04-08 19:06:36 -04:00
Florian Hahn	467dbcd9f1	[LV] Set debug loc after setting insert point. This fixes the code to actually use the location of the instruction, if available. Previously, SetInsertPoint would overwrite the insert point set from the instruction.	2022-04-08 20:34:40 +02:00
Arthur Eubanks	b22ffc7b98	[CaptureTracking] Ignore ephemeral values in EarliestEscapeInfo And thread DSE's ephemeral values to EarliestEscapeInfo. This allows more precise analysis in DSEState::isReadClobber() via BatchAA. Followup to D123162. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123342	2022-04-08 10:07:26 -07:00
Florian Hahn	29fe998eaa	[VPlan] Preserve debug location when creating branch. Update createEmptyBasicBlock to preserve the debug location of the previous terminator.	2022-04-08 17:22:53 +02:00
Zaara Syeda	07005440ae	[LSR] Optimize unused IVs to final values in the exit block Loop Strength Reduce sometimes optimizes away all uses of an induction variable from a loop but leaves the IV increments. When the only remaining use of the IV is the PHI in the exit block, this patch will call rewriteLoopExitValues to replace the exit block PHI with the final value of the IV to skip the updates in each loop iteration. Differential Revision: https://reviews.llvm.org/D118808	2022-04-08 11:16:37 -04:00
Nikita Popov	c8c6362560	[LICM] Pass MemorySSAUpdater by referene (NFC) Make it clearer that this is a required dependency.	2022-04-08 10:08:57 +02:00
Nikita Popov	5cefe7d9f5	[LoopSink] Require MemorySSA This makes MemorySSA in LoopSink required, and removes the AST-based implementation, as well as the related support code in LICM. Differential Revision: https://reviews.llvm.org/D123288	2022-04-08 09:49:44 +02:00
serge-sans-paille	aa15ea47e2	[builtin_object_size] Basic support for posix_memalign It actually implements support for seeing through loads, using alias analysis to refine the result. This is rather limited, but I didn't want to rely on more than available analysis at that point (to be gentle with compilation time), and it does seem to catch common scenario, as showcased by the included tests. Differential Revision: https://reviews.llvm.org/D122431	2022-04-08 09:31:11 +02:00
Evgeniy Brevnov	da41214d65	Add support for atomic memory copy lowering Currently, the utility supports lowering of non atomic memory transfer routines only. This patch adds support for atomic version of memcopy. This may be useful for targets not supporting atomic memcopy. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D118443	2022-04-08 10:41:31 +07:00
Austin Kerbow	26b14c3ea7	[InferAddressSpaces] Fix assert on invalid bitcast placement Similar to the problem in `0bb25b4603`, bitcasts that are inserted must dominate all uses. When rewriting "values" with "new values" that have the updated address space, we may replace the "new value" with a bitcast if one of the original users is an addresspace cast. This bitcast must be inserted before ALL users, not only before the addresspace cast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122964	2022-04-07 20:07:53 -07:00
Chenbing Zheng	467cbb6249	[InstCombine] fold more constant divisor to select-of-constants divisor By adding a parameter to function FoldOpIntoSelect， we can fold more Ops to Select. For this example, we tend to fold the division instruction, so we no longer care whether SelectInst is one use. This patch slove TODO left in InstCombine/div.ll. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122967	2022-04-08 10:19:24 +08:00
Augie Fackler	f3c702fbd1	InstCombineCalls: fix annotateAnyAllocCallSite to report changes Spotted during review of D123052. Differential Revision: https://reviews.llvm.org/D123232	2022-04-07 13:49:09 -04:00
Arthur Eubanks	17fdaccccf	[CaptureTracking] Ignore ephemeral values when determining pointer escapeness Ephemeral values cannot cause a pointer to escape. No change in compile time: https://llvm-compile-time-tracker.com/compare.php?from=4371710085ba1c376a094948b806ddd3b88319de&to=c5ddbcc4866f38026737762ee8d7b9b00395d4f4&stat=instructions This partially fixes some regressions caused by more calls to `__builtin_assume` (D122397). Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D123162	2022-04-07 10:11:14 -07:00
Augie Fackler	b916414096	BuildLibCalls: also set allocsize() attributes This is part of being able to get rid of two more columns in MemoryBuiltins.cpp's large table. We'll have two more changes before we can finish the job. Differential Revision: https://reviews.llvm.org/D119582	2022-04-07 12:38:44 -04:00
Augie Fackler	f120be6c86	InstCombineCalls: when adding an align attribute, never reduce it Sometimes we can infer an align from an allocalign but the function already promised it'd be more-aligned than the allocalign and there's an existing align that we shouldn't reduce. Make sure we handle that correctly. Differential Revision: https://reviews.llvm.org/D121642	2022-04-07 12:38:44 -04:00
Augie Fackler	ca051a46fb	InstCombineCalls: infer return alignment from allocalign attributes This exposes a couple of lingering bugs, which will be fixed in the next two commits. Differential Revision: https://reviews.llvm.org/D123052	2022-04-07 12:38:44 -04:00
Nikita Popov	e22af03a79	[Sink] Don't sink non-willreturn calls (PR51188) Fixes https://github.com/llvm/llvm-project/issues/51188.	2022-04-07 16:35:05 +02:00
Simon Pilgrim	afa1ae9e0c	[InstCombine] SimplifyDemandedUseBits - allow and(srem(X,Pow2),C) -> and(X,C) to work on vector types Replace m_ConstantInt with m_APInt to match uniform (no-undef) vector remainder amounts.	2022-04-07 15:24:45 +01:00
Simon Pilgrim	5909c67883	[InstCombine] SimplifyDemandedUseBits - add TODO to remove shl node if we only demand known sign bits of the shift source Similar to what we already perform for ashr/lshr	2022-04-07 14:35:11 +01:00
Simon Pilgrim	5e90224839	[InstCombine] SimplifyDemandedUseBits - remove lshr node if we only demand known sign bit This is a lshr equivalent to D122340 - if we don't demand any of the additional sign bits introduced by the ashr, the lshr can be treated as an ashr and we can remove the shift entirely if we only demand already known sign bits. Another step towards PR21929 https://alive2.llvm.org/ce/z/6f3kjq Differential Revision: https://reviews.llvm.org/D123118	2022-04-07 14:33:31 +01:00
Benjamin Kramer	ff485d727f	Transforms: Remove unused include Utils can't depend on Scalar transforms.	2022-04-07 10:40:28 +02:00
Florian Hahn	4388c979da	[VPlan] Use vector.body as header name in VPlan native path. This brings the VPlan block naming in line with the naming of the generated basic blocks.	2022-04-07 10:31:12 +02:00
Nikita Popov	674ee4d353	[LoopSink] Use MemorySSA with legacy pass manager LoopSink with the legacy pass manager still uses AST, because we can't compute MemorySSA conditionally. I think now that the legacy pass manager will be removed soon(TM) we don't need to care about compile-time impact here anymore. Additionally, since MemorySSA is no longer eagerly optimized, the impact is actually not that high anymore (~0.2% geomean regression on CTMark). This just makes legacy PM and new PM behavior line up -- as a followup I'll drop these options entirely and make MemorySSA use mandatory. Differential Revision: https://reviews.llvm.org/D123216	2022-04-07 09:40:29 +02:00
Matt Arsenault	39f1568633	Transforms: Split LowerAtomics into separate Utils and pass This will allow code sharing from AtomicExpandPass. Not entirely sure why these exist as separate passes though.	2022-04-06 20:54:45 -04:00
Alina Sbirlea	08075a7ee8	Revert `f7381a795a` Roll-forward `29fada4a3d`. Issue triggered was due to UB. Differential Revision: https://reviews.llvm.org/D121987	2022-04-06 16:06:14 -07:00

... 8 9 10 11 12 ...

31127 Commits