llvm-project

Commit Graph

Author	SHA1	Message	Date
zhongyunde	3e6ba89055	[InstCombine] Fold a mul with bool value into and Fixes https://github.com/llvm/llvm-project/issues/55599 X * Y --> X & Y, iff X, Y can be only {0, 1}. https://alive2.llvm.org/ce/z/_RsTKF Reviewed By: spatel, nikic Differential Revision: https://reviews.llvm.org/D126040	2022-05-30 21:05:00 +08:00
Nikita Popov	1721ff1dfd	[GVN] Enable enable-split-backedge-in-load-pre option by default This option was added in D89854. It prevents GVN from performing load PRE in a loop, if doing so would require critical edge splitting on the backedge. From the review: > I know that GVN Load PRE negatively impacts peeling, > loop predication, so the passes expecting that latch has > a conditional branch. In the PhaseOrdering test in this patch, splitting the backedge negatively affects vectorization: After critical edge splitting, the loop gets rotated, effectively peeling off the first loop iteration. The effect is that the first element is handled separately, then the bulk of the elements use a vectorized reduction (but using unaligned, off-by-one memory accesses) and then a tail of 15 elements is handled separately again. It's probably worth noting that the loop load PRE from D99926 is not affected by this change (as it does not need backedge splitting). This is about normal load PRE that happens to occur inside a loop. Differential Revision: https://reviews.llvm.org/D126382	2022-05-30 09:55:58 +02:00
Max Kazantsev	503d5771b6	[JumpThreading][NFCI] Reuse existing DT instead of recomputation This whole part with recomputation of BPI and BFI looks redundant, and we tried to get rid of it in D124439. Unfortunately, it causes some hard-to-reproduce failures due to invalid state of analysis. Until this is investigated and fixed, let's try to reuse at least part of available analyses. DT is available at this point, and there is no need to recompute it. Please revert if you see it causing any behavior changes.	2022-05-30 12:48:10 +07:00
Chenbing Zheng	ef256ed58e	[InstCombine] bitcast (extractelement <1 x elt>, dest) -> bitcast(<1 x elt>, dest) Only handle the case where the dest type is a vector, to avoid the inverse transform in visitBitCast. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125951	2022-05-30 10:16:32 +08:00
Florian Hahn	0776c48f9b	Recommit "[LICM] Only create load in ph when promoting load or store doesn't exec." This reverts the revert commit `ad95255b92`. The updated version also creates a load when the store may not execute. In those cases, we still need to introduce a load in a function where there may not have been one before, so this doesn't completely resolve issue #51248. Original message: When only a store is sunk, there is no need to create a load in the pre-header, as the result of the load will never get used. The dead load can introduce UB, if the function is marked as writeonly. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123473	2022-05-29 21:57:14 +01:00
Florian Hahn	6abce17fc2	[VPlan] Use Exiting-block instead of Exit-block terminology (NFC). In LLVM's common loop terminology, an exit block is a block outside a loop with a predecessor inside the loop. An exiting block is a block inside the loop which branches to an exit block outside the loop. This patch updates a few places where VPlan was using ExitBlock for a block exiting a region. Those instances have been updated to use ExitingBlock. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126173	2022-05-28 21:16:05 +01:00
eopXD	6a84579243	[LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D126350	2022-05-27 15:22:23 -07:00
Sanjay Patel	b5b6aa4d53	[InstCombine] fold multiply by signbit-splat to cmp+select (ashr i32 X, 31) * C --> (X < 0) ? -C : 0 https://alive2.llvm.org/ce/z/G8u9SS With a constant operand, this is an improvement in IR and codegen (where it can be converted to a mask op). Without a constant operand, we would have to negate the operand, so that is probably better left to the backend. This is similar but not the same optimization that is requested in #55618.	2022-05-27 11:54:19 -04:00
Sanjay Patel	5a6e085757	[InstCombine] reduce code duplication; NFC	2022-05-27 11:54:19 -04:00
Enna1	52992f136b	Add !nosanitize to FixedMetadataKinds This patch adds !nosanitize metadata to FixedMetadataKinds.def, !nosanitize indicates that LLVM should not insert any sanitizer instrumentation. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D126294	2022-05-27 09:46:13 +08:00
Arthur Eubanks	36096c2b38	[NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter All callers pass true. select-unfold-freeze.ll is now a subset of select.ll so delete it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126501	2022-05-26 16:13:34 -07:00
Sanjay Patel	c4c750058f	[InstCombine] fold mul of signbit directly to X < 0 ? Y : 0 This is effectively NFC (intentionally no test diffs) because we already have the related fold that converts the 'and' pattern to select. So this is just an efficiency improvement.	2022-05-26 16:19:15 -04:00
Sanjay Patel	49f8b05137	[InstCombine] fold icmp equality with sdiv and SMIN This extends the fold from D126410 / `3952c905ef` to allow for the only case where it works with signed division: https://alive2.llvm.org/ce/z/k7_ypu (X s/ Y) == SMIN --> (X == SMIN) && (Y == 1) (X s/ Y) != SMIN --> (X != SMIN) || (Y != 1) This is another improvement based on #55695.	2022-05-26 16:19:15 -04:00
Sanjay Patel	ed5be1523f	[InstCombine] reduce code duplication in icmp+div folds; NFC	2022-05-26 16:19:15 -04:00
Owen Anderson	939a43461b	Revert "Replace the custom linked list in LeaderTableEntry with TinyPtrVector." This reverts commit `1e91149844`. Pending further discussion.	2022-05-26 09:50:36 -07:00
Nikita Popov	c8eb83f2d0	[ControlHeightReduction] Use logical and Use logical instead of bitwise and to combine conditions, to avoid propagating poison from a later condition if an earlier one is already false. This avoids introducing branch on poison. Differential Revision: https://reviews.llvm.org/D125898	2022-05-26 18:03:35 +02:00
Alexey Bataev	7b809c30b9	[SLP]Improve compile time, NFC. Patch improves compile time. For function calls that cannot be vectorized, create a unique group for each such call instead of a subgroup. This prevents them from being grouped by subgroup and avoids attempts to vectorize them. Also, look through cast operands to check their groups/subgroups. Reduces the number of vectorization attempts. No changes in the statistics for SPEC2017/2006/llvm-test-suite. Differential Revision: https://reviews.llvm.org/D126476	2022-05-26 08:40:59 -07:00
Alexey Bataev	120d52b0ef	[SLP]Fix PR55653: emit undefs where required, not poison. Need to handle a corner case correctly, if all elements are Undefs/Poisons, need to emit actual values, not just poisons. Differential Revision: https://reviews.llvm.org/D126298	2022-05-26 08:38:50 -07:00
Alex Zhikhartsev	8b0d763474	[DFAJumpThreading] Relax analysis to handle unpredictable initial values Responding to a feature request from the Rust community: https://github.com/rust-lang/rust/issues/80630 void foo(X) { for (...) switch (X) case A X = B case B X = C } Even though the initial switch value is non-constant, the switch statement can still be threaded: the initial value will hit the switch statement but the rest of the state changes will proceed by jumping unconditionally. The early predictability check is relaxed to allow unpredictable values anywhere, but later, after the paths through the switch statement have been enumerated, no non-constant state values are allowed along the paths. Any state value not along a path will be an initial switch value, which can be safely ignored. Differential Revision: https://reviews.llvm.org/D124394	2022-05-26 11:29:54 -04:00
Simon Pilgrim	14258d6fb5	[SLP] Move canVectorizeLoads implementation to simplify the diff in D105986. NFC.	2022-05-26 15:23:58 +01:00
Alexey Bataev	9139d484d4	[SLP]Fix crash on reordering of ScatterVectorize nodes. ScatterVectorize nodes should be handled the same way as gathers in the reorderBottomToTop function, since we can simply reorder the loads in this node. Because of that, such nodes need to be included in the list of gathered nodes to fix the compiler crash. Differential Revision: https://reviews.llvm.org/D126378	2022-05-26 06:25:58 -07:00
Sanjay Patel	3952c905ef	[InstCombine] fold icmp equality with udiv and large constant With large compare constant: (X u/ Y) == C --> (X == C) && (Y == 1) (X u/ Y) != C --> (X != C) || (Y != 1) https://alive2.llvm.org/ce/z/EhKwh6 There are various potential missing icmp (div) transforms shown here: https://github.com/llvm/llvm-project/issues/55695 This is a generalization for part of the udiv + equality. I didn't check in detail, but some of those may only make sense as codegen transforms. This results in one extra instruction in IR, but it is better for analysis, and looks much better in codegen on all targets that I tried. Differential Revision: https://reviews.llvm.org/D126410	2022-05-26 09:08:47 -04:00
Florian Hahn	f96aa493f0	[SimpleLoopUnswitch] Always skip trivial select and set condition. When updating the branch instruction outside the loop during non-trivial unswitching, always skip trivial selects and update the condition. Otherwise we might create invalid IR, because the trivial select is inside the loop, while the condition is outside the loop. Fixes #55697.	2022-05-26 09:46:24 +01:00
Florian Hahn	390c0ac28d	[LV] Fix indentation in tryToCreateWidenRecipe (NFC).	2022-05-26 08:53:34 +01:00
Owen Anderson	1e91149844	Replace the custom linked list in LeaderTableEntry with TinyPtrVector. The purpose of the custom linked list was to optimize for the case of a single-element list. It turns out that TinyPtrVector handles the same basic scenario even better, reducing the size of LeaderTableEntry by 33%, and requiring only log2(N) allocations as the size of the list grows. The only downside is that we have to store the Value's and BasicBlock's in separate vectors, which is slightly awkward in a few cases. Fortunately that ends up being entirely encapsulated inside helper functions. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D125205	2022-05-25 23:52:44 -07:00
Serguei Katkov	c2eccc67ce	[GuardWidening] Remove nuw/nsw flags for hoisted instructions When we hoist instructions over a guard we must clear these flags, because they might be implied by the guard and so only make sense after it. As an example of the bug caused by the current behavior: L is known to be in a range, say [0, 100): c1 = x u< L; guard(c1); x1 = add x, 1; c2 = x1 u< L; guard(c2). Based on guard(c1) we can say that x1 = add nuw nsw x, 1. After guard widening we get: c1 = x u< L; x1 = add nuw nsw x, 1; c2 = x1 u< L; c = and c1, c2; guard(c). Now, based on the fact that x + 1 < L and x >= 0 (because x + 1 is nuw), we can prove that x + 1 u< L implies x u< L, so we can just remove c1: x1 = add nuw nsw x, 1; c2 = x1 u< L; guard(c2). But that is not correct, because the value x == -1 will now pass the guard. Reviewed By: mkazantsev Subscribers: llvm-commits, nikic Differential Revision: https://reviews.llvm.org/D126354	2022-05-26 13:20:55 +07:00
serge-sans-paille	fb67d683db	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `7030654296` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D126417	2022-05-26 08:12:34 +02:00
Chenbing Zheng	1486a9c9fe	[InstCombine] [NFC] refactor foldXorOfICmps Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126268	2022-05-26 11:07:18 +08:00
Chenbing Zheng	41aab93afc	[InstCombine] bitcast(logic(bitcast(X), bitcast(Y))) -> bitcast'(logic(bitcast'(X), Y)) This patch lifts the foldBitCastBitwiseLogic limitation that the destination must have an integer element type, and eliminates one bitcast by doing the logic op in the type of the input that has an integer element type. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126184	2022-05-26 10:23:44 +08:00
Alexey Bataev	3bf5c2c8ec	[SLP]Do not try to generate ScatterVectorize if it will be scalarized. SLP should build ScatterVectorize nodes only if they actually end up with masked gather rather than with scalarization. In the second scenario better to build a gather node. Differential Revision: https://reviews.llvm.org/D126379	2022-05-25 14:25:07 -07:00
Alexey Bataev	10f41a2147	[SLP]Fix PR55688: Miscompile due to incorrect nuw/nsw handling. Need to use all ReductionOps when propagating flags for the reduction ops, otherwise transformation is not correct. Plus, need to drop nuw/nsw flags. Differential Revision: https://reviews.llvm.org/D126371	2022-05-25 13:59:06 -07:00
David Sherwood	87936c7b13	[LoopVectorize] Fix assertion failure in fixReduction when tail-folding When compiling the attached new test in scalable-reductions-tf.ll we were hitting this assertion in fixReduction: Assertion `isa<PHINode>(U) && "Reduction exit must feed Phi's or select" The loop contains a reduction and an intermediate store of the reduction value. When vectorising with tail-folding the contents of 'U' in the assertion above happened to be a scatter_store. It turns out that we were still creating a widen recipe for the invariant store, despite knowing that we can actually sink it. The simplest fix is to change buildVPlanWithVPRecipes so that we look for invariant stores before attempting to widen it. Differential Revision: https://reviews.llvm.org/D126295	2022-05-25 11:46:32 +01:00
Florian Hahn	c6e45ea074	[VPlan] Exit earlier when trying to widen with scalar VFs. This simplifies the code a bit, suggested in D124718. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D125029	2022-05-25 11:05:23 +01:00
Florian Hahn	1ba42dd04b	[VPlan] Use MapVector for LiveOuts for deterministic iteration. During code-gen, we iterate over the LiveOuts and the differences in iteration order can cause slightly different outputs.	2022-05-25 09:30:02 +01:00
Chenbing Zheng	269e3f7369	[InstCombine] [NFC] Move transforms for truncated shifts into narrowBinOp Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126056	2022-05-25 10:21:39 +08:00
Martin Sebor	46c0ec9df4	[InstCombine] Fold memrchr calls with sequences of identical bytes. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123631	2022-05-24 17:00:11 -06:00
Vasileios Porpodas	9df0568b07	[SLP] Fix crash caused by reorderBottomToTop(). The crash is caused by incorrect order set by reorderBottomToTop(), which happens when it is reordering a TreeEntry which has a user that has already been reordered earlier. Please see the detailed description in the lit test. Differential Revision: https://reviews.llvm.org/D126099	2022-05-24 12:24:19 -07:00
Sanjay Patel	05527b68a0	[InstCombine] fold more shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This extends the transform added with `0353c2c996`. If the shuffle reduces vector length, the transform reduces the width of the cast, so that should be a win for most codegen (if not, it can be inverted).	2022-05-24 15:11:38 -04:00
Nikita Popov	e6e0eb3bc8	[InstCombine] Strip bitcasts in GEP diff fold Bitcasts were stripped in one case, but not the other. Of course, this no longer really matters with opaque pointers, but as I went through the trouble of tracking this down, we may as well remove one typed vs opaque pointer optimization discrepancy.	2022-05-24 16:12:01 +02:00
Nikita Popov	b2a13d3e2d	[InstCombine] Use IRBuilder in freeze pushing transform (PR55619) Use IRBuilder so that the newly created freeze instructions automatically get inserted back into the IC worklist. The changed worklist processing order leads to some cosmetic differences in tests. Fixes https://github.com/llvm/llvm-project/issues/55619.	2022-05-24 15:48:28 +02:00
Alexey Bataev	f9c806ae5c	[SLP][NFC]Make isFirstInsertElement a strict weak ordering comparator. To be used correctly in a sort-like function, the isFirstInsertElement function must follow the strict weak ordering rule, i.e. isFirstInsertElement(IE1, IE1) should return false.	2022-05-24 06:02:42 -07:00
Nikita Popov	a7c079aaa2	[InstCombine] Support logical and in masked icmp fold Most of the folds implemented in this function work fine with logical operations. We only need to be careful for the cases that work on non-constant masks, where the RHS operand shouldn't be poison. This is a conservative implementation that bails out of illegal transforms, but we could also change these to insert freeze instead.	2022-05-24 11:16:33 +02:00
Nikita Popov	5abaabed22	[InstCombine] Use m_APInt() in asymmetric masked icmp fold This is mostly intended as code cleanup, but it does also add support for splat vectors to this fold.	2022-05-24 10:57:28 +02:00
Nikita Popov	c0e06c7448	[InstCombine] Handle logical and/or in recursive and/or of icmps fold The and/or of icmps fold is also applied in reassociated form. However, this currently only happens for bitwise and of bitwise and, but not for bitwise and of logical and (or other combinations, but this is the one being addressed here). We can do this for bitwise+logical combinations as well, but need to be a bit careful about which of the resulting ands are logical: https://alive2.llvm.org/ce/z/WYSjGh https://alive2.llvm.org/ce/z/guxYnz https://alive2.llvm.org/ce/z/S5SYxY https://alive2.llvm.org/ce/z/2rAWeW	2022-05-24 10:13:10 +02:00
Nikita Popov	81c648a3d9	[LoopUnroll] Freeze tripcount rather than condition This is a followup to D125754. We introduce two branches, one before the unrolled loop and one before the epilogue (and similar for the prologue case). The previous patch only froze the condition on the first branch. Rather than independently freezing the second condition, this patch instead freezes TripCount and bases BECount on it. These are the two quantities involved in the conditions, and this ensures that both work on a consistent, non-poisonous trip count. Differential Revision: https://reviews.llvm.org/D125896	2022-05-24 09:42:39 +02:00
Hendrik Greving	4f93d5cc1d	[BasicBlockUtils] Do not move loop metadata if outer loop header. Fixes a bug preventing moving the loop's metadata to an outer loop's header, which happens if the loop's exit is also the header of an outer loop. Adjusts test for above. Fixes #55416. Differential Revision: https://reviews.llvm.org/D125574	2022-05-23 16:39:54 -07:00
Alexey Bataev	319a722f6f	[SLP][NFC]Improve compile time, NFC. Builds the UserIgnore list only once as a SmallDenseSet without rebuilding it between the runs, iterates over gathers instead of the list of reduction ops, and does some checks in buildTree_rec only if the corresponding containers are not empty.	2022-05-23 12:15:27 -07:00
Sanjay Patel	e8c20d995b	[IR] add and use pattern match specialization for sqrt intrinsic; NFC This was included in D126190 originally, but it's independent and a useful change for readability.	2022-05-23 14:16:30 -04:00
Benjamin Kramer	2f2ca30d0a	Fix an unused variable warning in no-asserts build mode	2022-05-23 19:53:40 +02:00
Nikita Popov	f45c1e436e	[InstCombine] Change operand order in recursive and/or of icmps fold The order obviously doesn't matter for bitwise and/or, but would matter for logical and/or, so change it to preserve the original order.	2022-05-23 17:29:33 +02:00
Jingu Kang	bb82f74612	Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"" This reverts commit `42ebfa8269`. The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure. Differential Revision: https://reviews.llvm.org/D118979	2022-05-23 16:15:45 +01:00
Alexey Bataev	2ac5ebedea	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildvector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-23 07:06:45 -07:00
Sanjay Patel	1ebad988b1	[InstCombine] fold icmp of zext bool based on limited range X <u (zext i1 Y) --> (X == 0) && Y https://alive2.llvm.org/ce/z/avQDRY This is a generalization of `4069cccf3b` based on the post-commit suggestion. This also adds the i1 type check and tests that were missing from the earlier attempt; that commit caused several bot fails and was reverted. Differential Revision: https://reviews.llvm.org/D126171	2022-05-23 09:59:21 -04:00
Nikita Popov	45226d04f0	[InstCombine] Reuse icmp of and/or folds for logical and/or Similarly to a change recently done for fcmps, add a flag that indicates whether the and/or is logical to foldAndOrOfICmps, and reuse the function when folding logical and/or. We were already calling some parts of it, but this gives us a clearer indication of which parts may need poison-safe variants, and would also allow to fold combinations of bitwise and logical and/or. This change should be close to NFC, because all folds this enables were either already called previously, or can make use of implied poison reasoning.	2022-05-23 15:37:07 +02:00
Peter Waller	ade47bdc31	[LV] Improve register pressure estimate at high VFs Previously, `getRegUsageForType` was implemented using `getTypeLegalizationCost`. `getRegUsageForType` is used by the loop vectorizer to estimate the register pressure caused by using a vector type. However, `getTypeLegalizationCost` currently only appears to understand splitting and not scalarization, so significantly underestimates the register requirements. Instead, use `getNumRegisters`, which understands when scalarization can occur (via computeRegisterProperties). This was discovered while investigating D118979 (Set maximum VF with shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the loop vectorizer previously ends up costing a v128i1 as 2 v64i* registers where it actually occupies 128 i32 registers. I'm sending this patch early for comment, I'm still doing some sanity checking with LNT. I note that getRegisterClassForType appears to return VectorRC even though the type in question (large vNi1 types) ends up occupying scalar registers. That might be worth fixing too. Differential Revision: https://reviews.llvm.org/D125918	2022-05-23 07:57:45 +00:00
Florian Hahn	145fe57106	[LV] Use exiting block instead of latch in addUsersInExitBlock. The latch may not be the exiting block. Use the exiting block instead when looking up the incoming value of the LCSSA phi node. This fixes a crash with early-exit loops.	2022-05-22 18:27:41 +01:00
Sanjay Patel	cba0ebd576	Revert "[InstCombine] fold icmp with sub and bool" This reverts commit `4069cccf3b`. This causes bot failures, and there's possibly a better way to get this and other patterns.	2022-05-22 12:13:20 -04:00
Sanjay Patel	4069cccf3b	[InstCombine] fold icmp with sub and bool This is the specific pattern seen in #53432, but it can be extended in multiple ways: 1. The 'zext' could be an 'and' 2. The 'sub' could be some other binop with a similar ==0 property (udiv). There might be some way to generalize using knownbits, but that would require checking that the 'bool' value is created with some instruction that can be replaced with new icmp+logic. https://alive2.llvm.org/ce/z/-KCfpa	2022-05-22 11:51:07 -04:00
Florian Hahn	97590baead	[LV] Widen ptr-inductions with scalar uses for scalable VFs. Current codegen only supports scalarization of pointer inductions for scalable VFs if they are uniform. After `3bebec659` we now may enter the scalarization code path in VPWidenPointerInductionRecipe::execute for scalable vectors. Fall back to widening for scalable vectors if necessary. This should fix a build failure when bootstrapping LLVM with SVE, e.g. https://lab.llvm.org/buildbot/#/builders/176/builds/1723	2022-05-22 16:24:13 +01:00
Florian Hahn	aeb19817d6	Revert "[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly." This reverts commit `fc9c59c355`. The patch triggers an assertion when building SPEC on X86. Reduced reproducer shared at D107966. Also reverts follow-up commit `11a09af76d`.	2022-05-21 21:00:01 +01:00
Florian Hahn	3bebec6592	[VPlan] Model first exit values using VPLiveOut. This patch introduces a new VPLiveOut subclass of VPUser to model exit values explicitly. The initial version handles exit values that are neither part of induction or reduction chains nor first order recurrence phis. Fixes #51366, #54867, #55167, #55459 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123537	2022-05-21 16:01:38 +01:00
Nikita Popov	6f0ca6fd23	[JumpThreading] Insert freeze when unfolding select JumpThreading may convert selects into branch instructions, in which case the condition needs to be frozen (as branch on poison is immediate undefined behavior, unlike select on poison). The necessary code for this is already in place, this just enables the option. Differential Revision: https://reviews.llvm.org/D125869	2022-05-21 11:24:27 +02:00
Dmitri Gribenko	11a09af76d	Fix an unused variable warning in no-asserts build mode	2022-05-20 17:11:58 +02:00
Sanjay Patel	f0071d43e4	[InstCombine] add use check to fold of bitwise logic with cast ops This was shown as a potential regression in D126040.	2022-05-20 09:08:53 -04:00
Alexey Bataev	fc9c59c355	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildvector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-20 05:58:09 -07:00
Alexey Bataev	4e271fc495	[SLP][NFC]Use SmallPtrSet to avoid n*m complexity, NFC.	2022-05-20 05:56:43 -07:00
Florian Hahn	cd61d4bd2f	[LV] Do not LoopSimplify/LCSSA after generating main vector loop. At the moment LV runs LoopSimplify and reconstructs LCSSA form after generating the main vector loop and before generating the epilogue vector loop. In practice, this adds a new exit block for the scalar loop because the middle block now also branches to the original exit block of the scalar loop. It also requires adding a new LCSSA phi in the newly created exit block. This complicates things when modeling exit values in VPlan, because we would need to update the VPlan for the epilogue loop to update the newly created LCSSA phi node. But none of that should be necessary, as all analysis requiring loop-simplify form is already done at this point and LCSSA form of the original loop is not broken. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D125810	2022-05-20 09:58:40 +01:00
Chenbing Zheng	cf348f6a2c	[InstCombine] [NFC] Use a pattern matcher for ExtractElementInst Reviewed By: RKSimon, rampitec Differential Revision: https://reviews.llvm.org/D125857	2022-05-20 10:31:40 +08:00
Nicolas Capens	c153c61fad	Handle instrumentation of scalar single-precision (_ss) intrinsics Instrumentation of scalar double-precision intrinsics such as x86_sse41_round_sd was already handled by https://reviews.llvm.org/D82398, but not their single-precision counterparts. https://issuetracker.google.com/172238865 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D124871	2022-05-19 13:56:51 -07:00
Florian Hahn	c90235f0ef	[LV] Drop wrap flags for reductions using VP def-use chain. Update clearReductionWrapFlags to use the VPlan def-use chain from the reduction phi recipe to drop reduction wrap flags. This addresses an existing FIXME and fixes a crash when instructions in the reduction chain are not used and have been removed before VPlan code generation. Fixes #55540.	2022-05-19 20:36:46 +01:00
Nuno Lopes	5fc9449c96	[DeadArgElim] Use poison instead of undef as placeholder for dead arguments It doesn't matter which value we use for dead args, so let's switch to poison, so we can eventually kill undef. Reviewed By: aeubanks, fhahn Differential Revision: https://reviews.llvm.org/D125983	2022-05-19 18:00:24 +01:00
Florian Hahn	32d6ef36d6	[SimpleLoopUnswitch] Skip trivial selects during trivial unswitching. Update the remaining places in unswitchTrivialBranch to properly skip trivial selects. Fixes #55526.	2022-05-19 17:01:13 +01:00
Tiehu Zhang	3ed9f603fd	[LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold The runtime check threshold should also restrict interleave count. Otherwise, too many runtime checks will be generated for some cases. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D122126	2022-05-19 23:29:00 +08:00
Florian Hahn	df56fb44f5	[VPlan] Update VPWidenMemoryInstruction to not inherit from VPValue. VPWidenMemoryInstruction also models stores which may not produce a value. This can trip over analyses. Improve the modeling by only adding VPValues for VPWidenMemoryInstructionRecipes modeling loads.	2022-05-19 16:24:58 +01:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
lizhijin	90ea81fcb2	[LV] Widen freeze instead of scalarizing it This patch changes the strategy for vectorizing the freeze instruction, from replicating multiple times to widening according to the selected VF. Fixes #54992 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D125016	2022-05-19 12:28:01 +08:00
Chenbing Zheng	ffaaf2498b	[InstCombine] (rot X, ?) == 0/-1 --> X == 0/-1 In this patch we add a function foldICmpInstWithConstantAllowUndef to fold integer comparisons with a constant operand: icmp Pred X, C where X is some kind of instruction and C is AllowUndef. We move this fold to the new function, so that it can solve undef elts in a vector. Reviewed By: spatel, RKSimon Differential Revision: https://reviews.llvm.org/D125220	2022-05-19 11:22:26 +08:00
Chenbing Zheng	51df77f36d	[InstCombine] Allow undef vectors when foldSelectToCopysign Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125671	2022-05-19 10:57:49 +08:00
Alexey Bataev	7d8060bc19	[SLP]Improve reductions vectorization. The pattern matching and vectorization for reductions was not very effective. Some of the possible reduction values were marked as external arguments, SLP could not find some reduction patterns because of a too early attempt to vectorize pairs of binop arguments, and the cost of constant reductions was not correct. Patch addresses these issues and improves the analysis/cost estimation and vectorization of the reductions. The most significant changes in SLP.NumVectorInstructions: Metric: SLP.NumVectorInstructions [140/14396]. Columns: Program, results, results0, diff. test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 920.00 3548.00 285.7%; test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test 66.00 122.00 84.8%; test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 100.00 128.00 28.0%; test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 664.00 810.00 22.0%; test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 592.00 687.00 16.0%; test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 402.00 426.00 6.0%; test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1665.00 1745.00 4.8%; test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 135.00 139.00 3.0%; test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 135.00 139.00 3.0%; test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 388.00 397.00 2.3%; test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 895.00 914.00 2.1%; test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 240.00 244.00 1.7%; test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 240.00 244.00 1.7%; test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 820.00 832.00 1.5%; test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 820.00 832.00 1.5%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 14804.00 14914.00 0.7%; test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 8125.00 8183.00 0.7%; test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1330.00 1338.00 0.6%; test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1330.00 1338.00 0.6%; test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9832.00 9880.00 0.5%; test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 5267.00 5291.00 0.5%; test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 4018.00 4024.00 0.1%; test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 4018.00 4024.00 0.1%; test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 426.00 424.00 -0.5%; test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 426.00 424.00 -0.5%; test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 201.00 192.00 -4.5%; test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 201.00 192.00 -4.5%. 644.nab_s and 544.nab_r - reduced number of shuffles but increased number of useful vectorized instructions. 641.leela_s and 541.leela_r - the function `@_ZN9FastBoard25get_pattern3_augment_specEiib` is not inlined anymore but its body gets vectorized successfully. Before, the function was inlined twice and vectorized just after inlining, currently it is not required. The vector code looks pretty similar to what it was before. Differential Revision: https://reviews.llvm.org/D111574	2022-05-18 13:22:18 -07:00
Sanjay Patel	ebbc37391f	[InstCombine] allow variable shift amount in bswap + shift fold When shifting by a byte-multiple: bswap (shl X, Y) --> lshr (bswap X), Y bswap (lshr X, Y) --> shl (bswap X), Y This was limited to constants as a first step in D122010 / `60820e53ec` , but issue #55327 shows a source example (and there's a test based on that here) where a variable shift amount is used in this pattern.	2022-05-18 14:38:16 -04:00
NAKAMURA Takumi	6ca7eb2c6d	[SCEV] Part 1, Serialize function calls in function arguments. Evaluation ordering in function call arguments is implementation-dependent. In fact, gcc evaluates bottom-top and clang does top-bottom. Fixes #55283 partially. Part of https://reviews.llvm.org/D125627	2022-05-18 23:20:08 +09:00
Sanjay Patel	990cc49ca0	[InstCombine] avoid crash on fold of icmp with cast operand We could do better by inserting a bitcast from scalar int to vector int or using an insertelement (the alternate test does not crash because there's an independent fold like that). But this doesn't seem like a likely pattern, so just bail out for now. Fixes issue #55516.	2022-05-18 09:16:30 -04:00
Sanjay Patel	be6d7cc93c	[InstCombine] reduce code duplication for checking types; NFC	2022-05-18 09:16:30 -04:00
Nikita Popov	c9e7049754	[JumpThreading] Look through freeze in getPredicateAt() fold This code is valid for any icmp, so we can safely look through a freeze when trying to find one. A caveat here is that replaceFoldableUses() may not end up replacing any uses in this case. It might make sense to use the freeze as the context instruction (rather than the terminator) if there is a freeze, to ensure that it always gets folded. This would require some changes to how replaceFoldedUses() works though, as it currently assumes that the value is valid at the end of the block.	2022-05-18 12:09:59 +02:00
Sun Ziping	242961f23b	[llvm][fix-irreducible] ensure that loop subtree under child is correctly reconnected to new loop The modified function was incorrectly (not unnecessarily) ignoring grandchild loops, and this change fixes the bug. In particular, this fixes the handling of the loop { inner, body }. The TODO in the same function is talking about the b1 self loop, which may be "unnecessarily" lost, but that is a different issue.	2022-05-18 10:45:52 +01:00
Nikita Popov	18c70a7bd9	[JumpThreading] Simplify getPredicateAt() based folding It's sufficient to just fold the icmp to true/false here, and then let constant terminator folding take care of the rest. It should be noted that while replaceFoldableUses() may not replace all uses of the icmp, at least the use in the terminator we're working on is always replaceable, so terminator constant folding should be reliably enabled as a subsequent step.	2022-05-18 11:24:52 +02:00
Nikita Popov	d4cdf013c7	[JumpThreading] Use common code to skip freeze (NFC) There are multiple places that want to look through freeze, so store condition without freeze in a separate variable.	2022-05-18 10:49:41 +02:00
Florian Hahn	fcfb86483b	[LV] set Header earlier, use variable instead of repeated access (NFC).	2022-05-18 09:29:59 +01:00
Nikita Popov	e9a1c82d69	[SCEVExpander] Expand umin_seq using freeze %x umin_seq %y is currently expanded to %x == 0 ? 0 : umin(%x, %y). This patch changes the expansion to umin(%x, freeze %y) instead (https://alive2.llvm.org/ce/z/wujUhp). The motivation for this change is the test cases affected by D124910, where the freeze expansion ultimately produces better optimization results. This is largely because `(%x umin_seq %y) == %x` is a common expansion pattern, which reliably optimizes in freeze representation, but only sometimes with the zero comparison (in particular, if %x == 0 can fold to something else, we generally won't be able to recover reasonable code from this.) Differential Revision: https://reviews.llvm.org/D125372	2022-05-18 09:53:07 +02:00
Nikita Popov	323514de58	[LoopUnroll] Avoid branch on poison for runtime unroll with multiple exits When performing runtime unrolling with multiple exits, one of the earlier (non-latch) exits may exit the loop on the first iteration, such that we never branch on the latch exit condition. As such, we need to freeze the condition of the new branch that is introduced before the loop, as it now executes unconditionally. Differential Revision: https://reviews.llvm.org/D125754	2022-05-18 09:51:22 +02:00
Juneyoung Lee	3adcf96b4f	[JumpThreading] Let ProcessImpliedCondition look into freeze instructions This patch makes JumpThreading's ProcessImpliedCondition deal with frozen conditions. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84941	2022-05-18 10:41:31 +09:00
Sanjay Patel	dbf3b5f114	[InstCombine] fold more shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This extends the transform added with `0353c2c996`. If the casts are to a larger element type, the transform reduces shuffle bit width, so that should be a win for most codegen (if not, it can be inverted).	2022-05-17 14:25:11 -04:00
Sanjay Patel	f31d39c42c	[InstCombine] remove cast-of-signbit to shift transform The transform was wrong in 3 ways: 1. It created an extra instruction when the source and dest types don't match. 2. It did not account for an extra use of the icmp, so could create 2 extra insts. 3. It favored bit hacks over icmp (icmp generally has better analysis). This fixes #54692 (modeled by the PhaseOrdering tests). This is a minimal step to fix the bug, but we should likely invert this and the sibling transform for the "is negative" pattern too. The backend should be able to invert this back to a shift if that leads to better codegen. This is a reduced try of `3794cc0e99` - that was reverted because it could cause infinite loops by conflicting with the related transforms in this block that create shifts.	2022-05-17 11:10:28 -04:00
Florian Hahn	5b00d13c00	[LV] Fetch vector loop region once and remember it (NFC). This avoids an unnecessary lookup and makes the code slightly more compact.	2022-05-17 15:57:23 +01:00
Alexey Bataev	b0f0313feb	[SLP]Add an extra check for select minmax reduction to avoid crash. Need to check if the reduction is still (not)cmp-select pattern min/max reduction to avoid compiler crash during building list of reduction operations. cmp-sel pattern provides 2 reduction operations, while intrinsics - just one.	2022-05-17 06:05:52 -07:00
Florian Hahn	c1a9d14982	[VPlan] Move usesScalars/onlyFirstLaneUsed to VPUser. Those helpers model properties of a user and they should also be available to non-recipe users. This will be used in D123537 for a new exit value user. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D124936	2022-05-17 11:20:06 +01:00
Nikita Popov	9ba452b08e	[JumpThreading] Don't pass DT to isGuaranteedNotToBeUndefOrPoison() JumpThreading intentionally does not force updating of the DT during optimization, because this may be expensive when many CFG updates and DT calculations are interleaved. We shouldn't be fetching the DT just for the purpose of calling isGuaranteedNotToBeUndefOrPoison(), especially as DT availability doesn't even show benefit in tests.	2022-05-17 11:53:49 +02:00
Dmitry Vassiliev	7759680e2f	[SROA] Avoid postponing rewriting load/store by ignoring lifetime intrinsics in partition's promotability checking This patch fixes a bug that generates unnecessary packing/unpacking structure code because of incorrectly handling lifetime intrinsic. For example, a partition of an alloca may contain many slices: ``` Partition [0, 4): Slice0: [0, 4) used by: load i32 addr; Slice1: [0, 4) used by: store i32 v, addr; Slice2: [0, 16) used by lifetime.start(16, addr); ``` When SROA determines if the partition can be promoted, lifetime.start is currently treated as a whole alloca load/store, so Slice0 and Slice1 cannot be promoted at this attempt, but the packing/unpacking code for Slice0 and Slice1 has been generated. After rewrite lifetime.start/end intrinsic, SROA tries again with Slice0 and Slice1 and finally promotes them, but redundant packing/unpacking code remaining in the IRs. This patch changes promotability checking to ignore lifetime intrinsic (they will be rewritten to correct sizes later), so we can promote the real users (load/store) at the first attempt with optimal code. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124967	2022-05-17 11:25:59 +02:00
Nikita Popov	a694546f7c	[KnownBits] Add operator== Checking whether two KnownBits are the same is somewhat common, mainly in test code. I don't think there is a lot of room for confusion with "determine what the KnownBits for an icmp eq would be", as that has a different result type (this is what the eq() method implements, which returns Optional<bool>). Differential Revision: https://reviews.llvm.org/D125692	2022-05-17 09:38:13 +02:00
Sanjay Patel	07d549bce9	Revert "[InstCombine] invert canonicalization for cast of signbit test" This reverts commit `3794cc0e99`. This change is suspected of causing bots to hang at stage 2 compiles, so reverting to confirm and investigate.	2022-05-16 17:47:02 -04:00
Ellis Hoag	9a90ea1fdc	[InstrProf] Fix promoter when using counter relocations When using counter relocations, two instructions are emitted to compute the address of the counter variable. ``` %BiasAdd = add i64 ptrtoint <__profc_>, <__llvm_profile_counter_bias> %Addr = inttoptr i64 %BiasAdd to i64* ``` When promoting a counter, these instructions might not be available in the block, so we need to copy these instructions. This fixes https://github.com/llvm/llvm-project/issues/55125 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D125710	2022-05-16 14:32:39 -07:00
Sanjay Patel	3794cc0e99	[InstCombine] invert canonicalization for cast of signbit test The existing transform was wrong in 3 ways: 1. It created an extra instruction when the source and dest types don't match. 2. It did not account for an extra use of the icmp, so could create 2 extra insts. 3. It favored bit hacks over icmp (icmp generally has better analysis). This fixes #54692 (modeled by the PhaseOrdering tests). This is a minimal step to fix the bug, but we should likely invert the sibling transform for the "is negative" pattern too. The backend should be able to invert this back to a shift if that leads to better codegen.	2022-05-16 12:55:52 -04:00
Ellis Hoag	6e23cd2bf0	[InstrProf][NFC] Save profile bias to function map Add a map from functions to load instructions that compute the profile bias. Previously we assumed that if the first instruction in the function was a load instruction, then it must be computing the bias. This was likely to work out because functions usually start with the `llvm.instrprof.increment` instruction, but optimizations could change this. For example, inlining into a non-profiled function. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D114319	2022-05-16 08:32:31 -07:00
Sanjay Patel	be7f09f7b2	[IR] create and use helper functions that test the signbit; NFCI	2022-05-16 11:26:23 -04:00
Alexey Bataev	152072801e	[SLP]Check if the root of the buildvector has one use only. The root of the buildvector can have only one use, otherwise it can be treated only as a final element of the previous buildvector sequence.	2022-05-16 07:30:36 -07:00
Florian Hahn	b7315ffc3c	[LAA,LV] Add initial support for pointer-diff memory checks. This patch adds initial support for a pointer diff based runtime check scheme for vectorization. This scheme requires fewer computations and checks than the existing full overlap checking, if it is applicable. The main idea is to only check if source and sink of a dependency are far enough apart so the accesses won't overlap in the vector loop. To do so, it is sufficient to compute the difference and compare it to `VF * UF * AccessSize`. It is sufficient to check `(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards dependence in the vector loop with the given VF and UF. If Src >=u Sink, there is no dependence preventing vectorization, hence the overflow should not matter and using the ULT should be sufficient. Note that the initial version is restricted in multiple ways: 1. Pointers must only either be read or written, by a single instruction (this allows re-constructing source/sink for dependences with the available information) 2. Source and sink pointers must be add-recs, with matching steps 3. The step must be a constant. 4. abs(step) == AccessSize. Most of those restrictions can be relaxed in the future. See https://github.com/llvm/llvm-project/issues/53590. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D119078	2022-05-16 15:27:22 +01:00
Biplob Mishra	ec4adf1f6c	[InstCombine] Combine instructions of type or/and where AND masks can be combined. The patch simplifies some of the patterns as below (A | (B & C0)) | (B & C1) -> A | (B & C0|C1) ((B & C0) | A) | (B & C1) -> (B & C0|C1) | A In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns. Differential Revision: https://reviews.llvm.org/D124119	2022-05-16 12:43:33 +01:00
Nikita Popov	7ba484660b	[ControlHeightReduction] Freeze condition when converting select to branch While select conditions can be poison, branch on poison is immediate UB. As such, we need to freeze the condition when converting a select into a branch. Differential Revision: https://reviews.llvm.org/D125398	2022-05-16 10:37:26 +02:00
David Sherwood	befc952045	[LoopVectorize] Permit tail-folding for low trip counts using scalable vectors When the loop vectoriser encounters a known low trip count it tries to create a single predicated loop in order to get the benefit of vectorisation and eliminate the scalar tail. However, until now the vectoriser prevented the use of scalable vectors in this case due to concerns in the past about stability. I believe that tail-folded loops using scalable vectors are now sufficiently well tested that we can enable this. For the same reason I've also enabled it when optimising for code size too. Tests added here: Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll Transforms/LoopVectorize/RISCV/low-trip-count.ll Differential Revision: https://reviews.llvm.org/D121595	2022-05-16 09:14:24 +01:00
Florian Hahn	8b7c3d2179	[LV] Set SCEVCheckCond to nullptr whenever it was used. Under some circumstances, SCEVExpander will insert new instructions when expanding a predicate, but the final result of the expansion can be a false constant. In those cases, the expanded instructions may later be used by other expansions, e.g. the trip count. This may trigger an assertion during SCEVExpander cleanup. To avoid this, always mark the result as used. Fixes #55100.	2022-05-15 21:52:07 +01:00
Craig Topper	b3097eb6cd	[SLP] Fix misspelling of 'analyzed'. NFC	2022-05-15 10:30:24 -07:00
Florian Hahn	39552964e1	[VPlan] Improve printing of VPReplicateRecipe with calls. Suggested as part of D124718.	2022-05-15 15:51:26 +01:00
Wende Tan	59afc4038b	[LowerTypeTests][clang] Implement and allow -fsanitize=cfi-icall for RISCV Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D106888	2022-05-14 18:05:06 -07:00
Fangrui Song	60e5fd00cd	[RS4GC] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build after D125000	2022-05-14 10:47:50 -07:00
Chenbing Zheng	acbad5086a	[InstCombine] [NFC] separate a function foldICmpBinOpWithConstant There is a long function foldICmpInstWithConstant, we can separate a function foldICmpBinOpWithConstant from it. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125457	2022-05-14 10:54:15 +08:00
Alexander Shaposhnikov	badd088c57	[GlobalOpt] Enable optimization of constructors with different priorities Adjust `optimizeGlobalCtorsList` to handle the case of different priorities. This addresses the issue https://github.com/llvm/llvm-project/issues/55083. Test plan: ninja check-all Differential revision: https://reviews.llvm.org/D125278	2022-05-13 22:19:29 +00:00
Alexey Bataev	8b8281f354	[SLP]Do not vectorize non-profitable alternate nodes. If alternate node has only 2 instructions and the tree is already big enough, better to skip the vectorization of such nodes, they are not very profitable (the resulting code contains 3 instructions instead of original 2 scalars). SLP can try to vectorize the buildvector sequence in the next attempt, if it is profitable. Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 72.00 73.00 1.4% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 1186.00 1198.00 1.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 241.00 242.00 0.4% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2131.00 2139.00 0.4% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 6377.00 6384.00 0.1% test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 6377.00 6384.00 0.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 12650.00 12658.00 0.1% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26169.00 26147.00 -0.1% test-suite :: MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des.test 99.00 86.00 -13.1% Gains: 526.blender_r - more vectorized trees. enc-3des - same. Others: 510.parest_r - no changes. miniFE - same 623.xalancbmk_s - some (non-profitable) parts of the trees are not vectorized. 523.xalancbmk_r - same lencod - same timberwolfmc - same miniAMR - same Differential Revision: https://reviews.llvm.org/D125571	2022-05-13 14:28:54 -07:00
Alexey Bataev	85f6b15ee5	[SLP]Do not look for buildvector sequence, if the index is reused. If the insert index was used already or is not constant, we should stop looking for a unique buildvector sequence; it must be split into 2 different buildvectors.	2022-05-13 13:56:02 -07:00
Nikita Popov	afc21c7e79	[ControlHeightReduction] Simplify addToMergedCondition() (NFC)	2022-05-13 15:30:09 +02:00
David Sherwood	92c645b5c1	[LoopVectorize] Add overflow checks when tail-folding with scalable vectors In InnerLoopVectorizer::getOrCreateVectorTripCount there is an assert that the known minimum value for the VF is a power of 2 when tail-folding is enabled. However, for scalable vectors the value of vscale may not be a power of 2, which means we have to worry about the possibility of overflow. I have solved this problem by adding preheader checks that prevent us from entering the vector body if the canonical IV would overflow, i.e. if ((IntMax - TripCount) < (VF * UF)) ... skip vector loop ... Differential Revision: https://reviews.llvm.org/D125235	2022-05-13 14:09:43 +01:00
Nikita Popov	ed1cb01baf	[IRBuilder] Add IsInBounds parameter to CreateGEP() We commonly want to create either an inbounds or non-inbounds GEP based on a boolean value, e.g. when preserving inbounds from existing GEPs. Directly accept such a boolean in the API, rather than requiring a ternary between CreateGEP and CreateInBoundsGEP. This change is not entirely NFC, because we now preserve an inbounds flag in a constant expression edge-case in InstCombine.	2022-05-13 14:30:55 +02:00
Florian Hahn	8e6d481f3b	[ConstraintElimination] Simplify ssub(A,B) if B s>=b && B s>=0. A first patch to use the reasoning in ConstraintElimination to simplify sub with overflow to a regular sub, if the operation is guaranteed to not overflow. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125264	2022-05-13 13:19:41 +01:00
Nikita Popov	d9ad6a2c8b	[InstCombine] Fix unused variable warning (NFC)	2022-05-13 12:43:21 +02:00
Dmitry Makogon	1da42c9f71	[RS4GC] Cache BDVs and bases along with IsKnownBase flag (NFC) This refactors RS4GC to cache results returned by findBaseDefiningValue and also gets rid of BaseDefiningValueResult by caching the IsKnownBase flag for BDVs and bases. Differential Revision: https://reviews.llvm.org/D125000	2022-05-13 14:14:17 +07:00
Chenbing Zheng	2a0837aab1	[InstCombine] fix sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z)) This patch fixes a bug left in D124503. We should do sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Z,Y)) instead of sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Y,Z)). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125352	2022-05-13 09:54:10 +08:00
Sanjay Patel	2fa8fc3d0a	[InstCombine] freeze operand in div+mul fold As discussed in issue #37809, this transform is not safe if the input is an undefined value. This is similar to recent changes for urem and sdiv: `d428f09b2c` `99ef341ce9` There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens. Presumably, in real cases that are similar to the tests where a subsequent transform removes the rem, we will also be able to remove the freeze by seeing that the parameter has 'noundef'.	2022-05-12 13:49:29 -04:00
Quentin Colombet	9766fed9c1	[DeadArgElim] Re-apply: Set unused arguments for internal functions The re-apply includes fixes to clang tests that were missed in the original commit. Original message: Prior to this patch we would only set to undef the unused arguments of the external functions. The rationale was that unused arguments of internal functions wouldn't need to be turned into undef arguments because they should have been simply eliminated by the time we reach that code. This is actually not true because there are plenty of cases where we can't remove unused arguments. For instance, if the internal function is used in an indirect call, it may not be possible to change the function signature. Yet, for statically known call-sites we would still like to mark the unused arguments as undef. This patch enables the "set undef arguments" optimization on internal functions when we encounter cases where internal functions cannot be optimized. I.e., whenever an internal function is marked "live". Differential Revision: https://reviews.llvm.org/D124699	2022-05-12 08:46:16 -07:00
Pavel Samolysov	098afdb0a0	[ArgPromotion] Make a non-byval promotion attempt first It makes sense to make a non-byval promotion attempt first and then fall back to the byval one. The non-byval ('usual') promotion is generally better, for example it does promotion even when a structure has more elements than 'MaxElements' but not all of them are actually used in the function. Differential Revision: https://reviews.llvm.org/D124514	2022-05-12 16:44:52 +02:00
Vasileios Porpodas	0950d4060c	Recommit "[SLP] Make reordering aware of external vectorizable scalar stores." This reverts commit `c2a7904aba`. Original code review: https://reviews.llvm.org/D125111	2022-05-11 16:47:29 -07:00
Arthur Eubanks	c2a7904aba	Revert "[SLP] Make reordering aware of external vectorizable scalar stores." This reverts commit `71bcead98b`. Causes crashes, see comments in D125111.	2022-05-11 15:28:00 -07:00
Sanjay Patel	99ef341ce9	[InstCombine] freeze operand in sdiv expansion As discussed in issue #37809, this transform is not safe if the input is an undefined value. This is similar to a recent change for urem: `d428f09b2c` There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens. Presumably, in real cases that are similar to the tests where a subsequent transform removes the select, we will also be able to remove the freeze by seeing that the parameter has 'noundef'.	2022-05-11 14:01:28 -04:00
Sanjay Patel	d428f09b2c	[InstCombine] freeze operand in urem expansion As discussed in issue #37809, this transform is not safe if the input is an undefined value. There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens.	2022-05-11 12:47:26 -04:00
Nikita Popov	6001bfcedc	[InstCombine] Freeze other uses of frozen value If there is a freeze %x, we currently replace all other uses of %x with freeze %x -- as long as they are dominated by the freeze instruction. This patch extends this behavior to cases where we did not originally dominate the use by moving the freeze instruction directly after the definition of the frozen value. The motivation can be seen in test @combine_and_after_freezing_uses: Canonicalizing everything to freeze %x allows folds that are based on value identity (i.e. same operand occurring in two places) to trigger. This also covers the case from D125248. Differential Revision: https://reviews.llvm.org/D125321	2022-05-11 16:47:12 +02:00
Alexey Bataev	f5d45d70a5	[SLP]Further improvement of the cost model for scalars used in buildvectors. Further improvement of the cost model for the scalars used in buildvectors sequences. The main functionality is outlined into a separate function. The cost is calculated in the following way: 1. If the Base vector is not an undef vector, resize the very first mask to have a common VF and perform the action for 2 input vectors (including the non-undef Base). Other shuffle masks are combined with the result of the first stage and processed as a shuffle of 2 elements. 2. If the Base is an undef vector and there is only 1 shuffle mask, perform the action only for 1 vector with the given mask, if it is not the identity mask. 3. If > 2 masks are used, perform a series of shuffle actions for 2 vectors, combining the masks properly between the steps. The original implementation misses the very first analysis for the Base vector, so the cost might be too optimistic in some cases. But it improves the cost for the insertelements which are part of the current SLP graph. Part of D107966. Differential Revision: https://reviews.llvm.org/D115750	2022-05-11 06:08:55 -07:00
Florian Hahn	635b752211	[VPlan] VPInterleaveRecipe only uses first lane if op not stored. With opaque pointers, both the stored value and the address can be the same. Only consider the recipe using the first lane only if the address is not stored. Fixes #55375.	2022-05-11 11:24:56 +01:00
Nikita Popov	c1bb4a881e	[SCEVExpander] Deduplicate min/max expansion code (NFC)	2022-05-11 12:11:11 +02:00
Alexander Shaposhnikov	da823382d2	[Transform][Utils][NFC] Clean up CtorUtils.cpp	2022-05-11 01:07:54 +00:00
Nick Desaulniers	c167c0a4dc	[BuildLibCalls] infer inreg param attrs from NumRegisterParameters We're having a hard time booting the ARCH=i386 Linux kernel with clang after removing -ffreestanding because instcombine was dropping inreg from callers during libcall simplification, but not the callees defined in different translation units. This led the callers and callees to have wildly different calling conventions, which (predictably) blew up at runtime. Infer the inreg param attrs on function declarations from the module metadata "NumRegisterParameters." This allows us to boot the ARCH=i386 Linux kernel (w/ -ffreestanding removed). Fixes: https://github.com/llvm/llvm-project/issues/53645 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125285	2022-05-10 16:21:17 -07:00
Vasileios Porpodas	71bcead98b	[SLP] Make reordering aware of external vectorizable scalar stores. The current reordering scheme only checks the ordering of in-tree operands. There are some cases, however, where we need to adjust the ordering based on the ordering of a future SLP-tree whose instructions are not part of the current tree, but are external users. This patch is a simple implementation of this. We keep track of scalar stores that are users of TreeEntries and if they look profitable to vectorize, then we keep track of their ordering. During the reordering step we take this new index order into account. This can remove some shuffles in cases like in the lit test. Differential Revision: https://reviews.llvm.org/D125111	2022-05-10 15:25:35 -07:00
Sanjay Patel	0353c2c996	[InstCombine] fold shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This is similar to a recent transform with fneg ( `b331a7ebc1` ), but this is intentionally the most conservative first step to try to avoid regressions in codegen. There are several restrictions that could be removed as follow-up enhancements. Note that a cast with a unary shuffle is currently canonicalized in the other direction (shuffle after cast - D103038 ). We might want to invert that to be consistent with this patch.	2022-05-10 14:20:43 -04:00
Craig Topper	4b36d9bde7	[CVP] Preserve exact name when converting sext->zext and ashr->lshr. Previously we took the old name and always appended a numeric suffix. Since we're doing a 1:1 replacement, it's clearer to keep the original name exactly. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D125281	2022-05-10 09:13:59 -07:00
Craig Topper	7b362ddda9	[SCCP] Preserve Name when converting SExt->ZExt. This makes the output IR more readable since we're doing a one to one replacement. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D125280	2022-05-10 09:13:59 -07:00
Dawid Jurczak	009f6ce0ef	[GVNSink] Make GVNSink resistant against self referencing instructions (PR36954) Before this change, the GVNSink pass suffered a stack overflow while processing a self-referencing instruction in an unreachable basic block. According to [1] and [2], it's reasonable to make the pass resistant to self-referencing instructions. To fix the issue, we skip the sinking analysis when we reach an instruction coming from an unreachable block. [1] https://groups.google.com/g/llvm-dev/c/843Tig9IzwA [2] https://lists.llvm.org/pipermail/llvm-dev/2015-February/082629.html Differential Revision: https://reviews.llvm.org/D113897	2022-05-10 16:06:12 +02:00
Nikita Popov	0eafef1171	[SCEVExpander] Remove handling for mixed int/pointer min/max (NFCI) Mixed int/pointer min/max are no longer possible.	2022-05-10 15:11:39 +02:00
Chuanqi Xu	02d6845234	[NFC] [Coroutines] Remove EnableReuseStorageInFrame option The EnableReuseStorageInFrame option is designed for testing only. But it is better to use *_PASS_WITH_PARAMS macro to keep consistent with other passes.	2022-05-10 17:28:43 +08:00
Nikita Popov	d222bab672	[InstCombine] Handle GEP scalar/vector base mismatch (PR55363) `30a12f3f63` switched the type check to use the GEP result type rather than the GEP operand type. However, the GEP result types may match even if the operand types don't, in case GEPs with scalar/vector base and vector index are compared. Fixes https://github.com/llvm/llvm-project/issues/55363.	2022-05-10 11:26:43 +02:00
Chuanqi Xu	beeed0994e	[Coroutines] Use PassManager instead of Legacy PassManager internally This is a follow-up cleanup for the previous work in D123918. I missed several places that still use legacy pass managers. This patch tries to remove them.	2022-05-10 13:15:11 +08:00
Hongtao Yu	9641b9be9d	[Inliner] Preserve !prof metadata when converting call to invoke. When a callee function is inlined via an invoke instruction, every function call inside the callee, if not an invoke, will be converted to an invoke after being cloned to the caller body. I found that during the conversion the !prof metadata was dropped. This in turn caused a cloned indirect call not to be properly promoted in subsequent passes. The particular scenario I was investigating was with AutoFDO and thinLTO. In prelink, no ICP was triggered (neither by the sample loader nor PGO ICP), so no indirect call was promoted. This is because 1) the particular indirect call did not have inlined samples; and 2) PGO ICP was intentionally disabled. After inlining, the prof metadata was dropped. Then in postlink, PGO ICP jumped in but didn't do anything. Thus the opportunity was missed. I'm making a simple fix to preserve !prof metadata when converting call to invoke. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D125249	2022-05-09 15:08:09 -07:00
Alexey Bataev	4212ef8a0e	Revert "[SLP]Further improvement of the cost model for scalars used in buildvectors." This reverts commit `99f31acfce` and several others to fix detected crashes, reported in https://reviews.llvm.org/D115750	2022-05-09 13:46:06 -07:00
Alexey Bataev	cce80bd8b7	[SLP]Adjust assertion check for scalars in several insertelements. If the same scalar is inserted several times into the same buildvector, the mask index can be used already. In this case need to check, that this scalar is already part of the vectorized buildvector.	2022-05-09 13:07:59 -07:00
Florian Hahn	266ea446ab	Revert "Recommit "[VPlan] Remove uneeded needsVectorIV check."" This reverts commit `8b48223447`. This triggers an assertion on a test case mentioned in D123720. Revert while I investigate.	2022-05-09 20:33:14 +01:00
Alexey Bataev	9dc4ced204	[SLP]Try partial store vectorization if supported by target. We can try to vectorize number of stores less than MinVecRegSize / scalar_value_size, if it is allowed by target. Gives an extra opportunity for the vectorization. Fixes PR54985. Differential Revision: https://reviews.llvm.org/D124284	2022-05-09 09:48:15 -07:00
Alexey Bataev	9c3a75eabf	[SLP]Fix a crash when preparing a mask for external scalars. Need to use the actual index instead of the tree entry position, since the insert index may be different from 0. It means that we vectorized part of the buildvector starting from a non-initial insertelement instruction for some reason.	2022-05-09 07:59:34 -07:00
Florian Hahn	41e142fdc7	Recommit "[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both." This reverts commit `7211d5ce07`. This version fixes a crash that caused buildbot failures with the first version.	2022-05-09 13:49:12 +01:00
David Green	6f9e1ea0ef	[VectorCombine] Attempt to fold select shuffles from reductions Given a commutative reduction leading from a shuffle, the order of the lanes on the shuffle is not important for the result. This means we can reorder the shuffle to something simpler, in which we try shuffling the first vector lanes first. This was D123494. The new shuffle may not be profitable though, and if it is not we can try the folding of select shuffles from D123911. This, with some adjustment as the output lane ordering is now unimportant, can allow the final shuffle to simplify given the inputs to the patterns from D123911. Whereas each transformation on its own is not profitable, the combination is. We can only support a single shuffle when called from reductions, but we are able to sort the ReconstructMask, potentially allowing it to simplify to an identity or concat mask. Differential Revision: https://reviews.llvm.org/D125086	2022-05-08 10:32:41 +01:00
Andrew Litteken	e38f014c40	[IROutliner] Accomodate blocks containing PHINodes with one entry outside the region and others inside the region. When a PHINode has an incoming block from outside the region, it must be handled specially when assigning a global value number to each incoming value. A PHINode has multiple predecessors, and we must handle this case rather than only the single predecessor case. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D124777	2022-05-07 17:11:21 -05:00
David Green	802e15c576	[SLP] Cluster ordering for loads Given a load without a better order, this patch partially sorts the elements to form clusters of adjacent elements in memory. These clusters can potentially be loaded in fewer loads, meaning less overall shuffling (for example loading v4i8 clusters of a v16i8 as single f32 loads, as opposed to multiple independent byte loads and inserts). Differential Revision: https://reviews.llvm.org/D122145	2022-05-07 14:38:11 +01:00
Sanjay Patel	8650f05c97	[InstCombine] fix miscompile when casting int->FP->int As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. I added tests/comments for all of the signed/unsigned combinations at either side of the boundary width, and tried to confirm with Alive2: https://alive2.llvm.org/ce/z/3p9DSu There are already some TODO items in the test file that suggest possible refinements, so the regression with ui->FP->si is probably ok. It seems unlikely that we'd see these kind of edge cases with non-byte-width integer types in real code. The potential miscompile went undetected for several years. This and `747c6a0c73` fixes #55150. Differential Revision: https://reviews.llvm.org/D124692	2022-05-07 08:46:25 -04:00
Serge Pavlov	eb28da89a6	[InstCombine] Remove side effect of replaced constrained intrinsics If a constrained intrinsic call was replaced by some value, it was not removed in some cases. The dangling instruction resulted in useless instructions executed at runtime. It happened because constrained intrinsics usually have a side effect, which is used to model the interaction with the floating-point environment. In some cases the side effect is actually absent or can be ignored. This change adds specific treatment of constrained intrinsics so that their side effect can be removed if it is actually absent. Differential Revision: https://reviews.llvm.org/D118426	2022-05-07 19:04:11 +07:00
Chenbing Zheng	394c683d40	[InstCombine] sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z)) Alive2: https://alive2.llvm.org/ce/z/2UNVbp Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D124503	2022-05-07 17:17:48 +08:00
Chenbing Zheng	8eaa1ef0d8	[InstCombine] add casts from splat-a-bit pattern if necessary Splatting a bit of constant-index across a value: sext (ashr (trunc iN X to iM), M-1) to iN --> ashr (shl X, N-M), N-1 If the dest type is different, use a cast (adjust use check). https://alive2.llvm.org/ce/z/acAan3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124590	2022-05-07 15:34:57 +08:00
Alexander Shaposhnikov	f827ee671f	[Scalar][NFC] Minor cleanups in CallSiteSplitting.cpp	2022-05-06 23:03:49 +00:00
Florian Hahn	7211d5ce07	Revert "[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both." This reverts commit `db7a87ed4f`. This seems to cause a PPC buildbot failure: https://lab.llvm.org/buildbot#builders/93/builds/8787	2022-05-06 22:38:15 +01:00
Sanjay Patel	b331a7ebc1	[InstCombine] canonicalize fneg after shuffle For the unary shuffle pattern, this is opposite to what we try to do with binops, but it seems better to keep it consistent with the motivating binary shuffle pattern. On that, it is clearly better on the usual no-extra uses case. There is a chance that this will pull an fneg away from some other binop and cause a regression in codegen, but that should be invertible in the backend. The transform is bidirectional: https://alive2.llvm.org/ce/z/kKaKCU https://alive2.llvm.org/ce/z/3Desfw Fixes #45631	2022-05-06 16:30:26 -04:00
Nikita Popov	82190f917a	[InstCombine] Fold icmp of select with implied condition When threading the icmp over the select, check whether the condition can be folded when taking into account the select condition.	2022-05-06 17:13:32 +02:00
Nikita Popov	0863abe3ac	[InstCombine] Fold icmp of select with non-constant operand Try to push an icmp into a select even if the icmp operand isn't constant - perform a generic SimplifyICmpInst instead. This doesn't appear to impact compile-time much, and forming logical and/or is generally profitable, as we have very good support for them.	2022-05-06 16:04:39 +02:00
Max Kazantsev	5a08e81779	[RS4GC] Add support for 'freeze' instruction to findBaseDefiningValue Because this instruction is a noop, we can simply go through it in search of the base.	2022-05-06 20:46:29 +07:00
Max Kazantsev	e6a7afae03	[NFC] Fix typo in assert message	2022-05-06 20:31:34 +07:00
Nikita Popov	b457ac4240	[InstCombine] Extract icmp of select transform (NFC) To make it easier to extend to the case where the other operand is not a constant.	2022-05-06 14:46:44 +02:00
Fraser Cormack	bafab9c09f	[InstCombine] Fix scalable-vector bitwise select matching D113035 enhanced the matching of bitwise selects from vector types. This change unfortunately introduced crashes as it tries to cast scalable vector types to integers. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124997	2022-05-06 12:59:39 +01:00
Florian Hahn	db7a87ed4f	[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both. After D97756, collectHomogenousInstGraphLoopInvariants may collect conditions for both logical ANDs and logical ORs in case the root is a select that matches both logical AND & OR. This means the function won't return invariant values of either AND/OR chains, but both. This can result in incorrect transformations. See llvm/test/Transforms/SimpleLoopUnswitch/trivial-unswitch-logical-and-or.ll. Without the patch, Alive2 rejects the modified tests with: Source and target don't have the same return domain. Note that this also applies to the test case added in D97756 (@test_partial_condition_unswitch_or_select). We can't unswitch on %cond6, because the graph leading to it contains an AND and an OR. This only fixes trivial unswitching for now, but a similar problem likely exists with non-trivial unswitching. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124526	2022-05-06 09:50:03 +01:00
Marco Elver	9ae87b5973	[Instrumentation] Share InstrumentationIRBuilder between TSan and SanCov Factor out InstrumentationIRBuilder and share it between ThreadSanitizer and SanitizerCoverage. Simplify its usage at the same time (use function of passed Instruction or BasicBlock). This class may be used in other instrumentation passes in future. NFCI. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D125038	2022-05-06 09:15:17 +02:00
David Green	100cb9a2ba	[VectorCombine] Fold shuffle select pattern This patch adds a combine to attempt to reduce the costs of certain select-shuffle patterns. The form of code it attempts to detect is: %x = shuffle ... %y = shuffle ... %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask A classic select-mask will pick items from each lane of a or b. These do not always have a great lowering on many architectures. This patch attempts to pack a and b into the lower elements, creating a differently ordered shuffle for reconstructing the original, which may be better than the select mask. This can be better for performance, especially if fewer elements of a and b need to be computed and the input shuffles are cheaper. Because select-masks are just one form of shuffle, we generalize to any mask. So long as the backend has a decent cost model for the shuffles, this can generally improve things when they come up. For more basic cost models the folds do not appear to be profitable, not getting past the cost checks. Differential Revision: https://reviews.llvm.org/D123911	2022-05-06 08:13:18 +01:00
Chuanqi Xu	2d037873a3	[Coroutines] Don't re-materialize for debug instructions Re-materializing for debug instructions would cause different code to be generated if we enabled `-g`. This is bad, so we disable re-materialization for debug instructions.	2022-05-06 13:52:19 +08:00
Chenbing Zheng	4c8c101b49	[InstCombine] try to narrow more shifted bswap-of-zext Try to narrow more bswap, if the shift amount is less than the zext (bswap (zext X)) >> C --> (zext (bswap X)) << C' https://alive2.llvm.org/ce/z/i7ddjn Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124598	2022-05-06 10:45:10 +08:00
Florian Hahn	f9f7aa30f8	[VPlan] Remove dead code to create VPWidenPHIRecipes (NFCI). After introducing VPWidenPointerInductionRecipe, VPWidenPHIRecipes should not be created at this point. Turn check into an assert.	2022-05-05 19:29:02 +01:00
Serge Pavlov	e1554ac63a	Revert "[InstCombine] Remove side effect of replaced constrained intrinsics" This reverts commit `83914ee96f`. The change caused discussion: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20220502/1034841.html	2022-05-06 01:09:16 +07:00
Marco Elver	47bdea3f7e	[ThreadSanitizer] Add fallback DebugLocation for instrumentation calls When building with debug info enabled, some load/store instructions do not have a DebugLocation attached. When using the default IRBuilder, it attempts to copy the DebugLocation from the insertion-point instruction. When there's no DebugLocation, no attempt is made to add one. This is problematic for inserted calls, where the enclosing function has debug info but the call ends up without a DebugLocation in e.g. LTO builds that verify that both the enclosing function and calls to inlinable functions have debug info attached. This issue was noticed in Linux kernel KCSAN builds with LTO and debug info enabled: | ... | inlinable function call in a function with debug info must have a !dbg location | call void @__tsan_read8(i8* %432) | ... To fix, ensure that all calls to the runtime have a DebugLocation attached, where the possibility exists that the insertion-point might not have any DebugLocation attached to it. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D124937	2022-05-05 15:21:35 +02:00
Alexey Bataev	99f31acfce	[SLP]Further improvement of the cost model for scalars used in buildvectors. Further improvement of the cost model for the scalars used in buildvectors sequences. The main functionality is outlined into a separate function. The cost is calculated in the following way: 1. If the Base vector is not undef vector, resizing the very first mask to have common VF and perform action for 2 input vectors (including non-undef Base). Other shuffle masks are combined with the resulting after the 1 stage and processed as a shuffle of 2 elements. 2. If the Base is undef vector and have only 1 shuffle mask, perform the action only for 1 vector with the given mask, if it is not the identity mask. 3. If > 2 masks are used, perform a series of shuffle actions for 2 vectors, combining the masks properly between the steps. The original implementation misses the very first analysis for the Base vector, so the cost might be too optimistic in some cases. But it improves the cost for the insertelements which are part of the current SLP graph. Part of D107966. Differential Revision: https://reviews.llvm.org/D115750	2022-05-05 06:04:25 -07:00
Florian Hahn	6bd2b70877	[SimpleLoopUnswitch] Add freeze if branch execs for partial unswitching. We cannot skip freezing the condition if the unswitched branch executes, if the condition is a chain of ANDs/ORs. For example, if we have an AND %c1, %c2 with %c1 == undef and %c2 == 0, there would be no branch on undef in the original code, but a branch on undef if we unswitch %c1. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124603	2022-05-05 09:44:07 +01:00
Chuanqi Xu	405bf90235	[NFC] [Pipelines] Hoist CoroCleanup as Module Pass This is similar to previous patch https://reviews.llvm.org/D123925. It could also reduce the time we call declaresCoroCleanupIntrinsics. And it is helpful for further changes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124362	2022-05-05 15:15:09 +08:00
Serge Pavlov	83914ee96f	[InstCombine] Remove side effect of replaced constrained intrinsics If a constrained intrinsic call was replaced by some value, it was not removed in some cases. The dangling instruction resulted in useless instructions executed at runtime. It happened because constrained intrinsics usually have a side effect, which is used to model the interaction with the floating-point environment. In some cases it is correct behavior but often the side effect is actually absent or can be ignored. This change adds specific treatment of constrained intrinsics so that their side effect can be removed if it is actually absent. Differential Revision: https://reviews.llvm.org/D118426	2022-05-05 12:02:42 +07:00
Wael Yehia	2407c13aa4	[AIX][PGO] Enable linux style PGO on AIX This patch switches the PGO implementation on AIX from using the runtime registration-based section tracking to the __start_SECNAME/__stop_SECNAME based. In order to enable the recognition of __start_SECNAME/__stop_SECNAME symbols in the AIX linker, the -bdbg:namedsects:ss needs to be used. Reviewed By: jsji, MaskRay, davidxl Differential Revision: https://reviews.llvm.org/D124857	2022-05-05 04:10:39 +00:00
Alexander Shaposhnikov	ec7122f64b	[InstCombine] Fold ((A&B)^C)|B Fold ((A&B)^C)|B into C|B. https://alive2.llvm.org/ce/z/zSGSor This addresses the issue https://github.com/llvm/llvm-project/issues/55169 Test plan: ninja check-all Differential revision: https://reviews.llvm.org/D124710	2022-05-05 00:56:20 +00:00
Sanjay Patel	14f257620c	[InstCombine] add type constraint to intrinsic+shuffle fold This check is in the related fold for binops, but it was missed when the code was adapted for intrinsics in `432c199e84`. The new test would crash when trying to create a new intrinsic with mismatched types.	2022-05-04 13:07:26 -04:00
Sanjay Patel	7e6d318c50	[InstCombine] move shuffle after funnel shift with same-shuffled operands This extends `432c199e84` and `9c4770eaab` with an intrinsic cited directly in issue #46238 Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.	2022-05-04 13:07:26 -04:00
Sanjay Patel	15042f44a2	[InstCombine] propagate FMF when reordering intrinsics and shuffles This was missed when extending the fold to allow fma with `9c4770eaab`	2022-05-04 12:10:38 -04:00
Sanjay Patel	9c4770eaab	[InstCombine] move shuffle after fma with same-shuffled operands https://alive2.llvm.org/ce/z/sD-JVv This extends `432c199e84` with a 3 arg intrinsic to demonstrate that the code works with the extra operand. Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.	2022-05-04 11:50:38 -04:00
Florian Hahn	8b48223447	Recommit "[VPlan] Remove uneeded needsVectorIV check." This reverts commit `f4e1eaa375`. The patch was originally reverted because it uncovered an issue that has now been fixed in `0ef8ca6d88`.	2022-05-04 10:53:42 +01:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fa5a4e1b95` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Hongtao Yu	3113e5bb52	[CSSPGO] Relax size limitation for priority inlining with preinlined profile As a follow-up to D124632, I'm turning on unlimited size caps for inlining with preinlined profile. It should be safe as a preinlined profile has "bounded" inline contexts. No noticeable size or perf delta was seen with two of our internal large services, but I think this is still a good change to be consistent with the other case. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124793	2022-05-03 18:43:07 -07:00
Hongtao Yu	e95ae395aa	[CSSPGO][NFC] Replace SampleProfileLoader::ProfileIsCS with FunctionSamples::ProfileIsCS. The two fields have the same meaning. Their values come from the reader. Therefore I'm removing one. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124788	2022-05-03 18:32:37 -07:00
Sanjay Patel	432c199e84	[InstCombine] move shuffle after min/max with same-shuffled operands This is an intrinsic version of the existing fold for binops. As a first step, I only allowed min/max, but the code is set up to make adding more intrinsics easy (with more or less than 2 arguments). This (and possible follow-ups) are discussed in issue #46238.	2022-05-03 16:23:11 -04:00
Augie Fackler	1deea714b3	BuildLibCalls: simplify switch statement slightly Per feedback on D123086 after submit. Also added a test for vec_malloc et al attribute inference to show it's doing the right thing. The new tests exposed a defect, corrected by adding vec_free to the list of free functions in MemoryBuiltins.cpp, which had been overlooked all the way back in D94710, over a year ago. Differential Revision: https://reviews.llvm.org/D124859	2022-05-03 13:17:33 -04:00
Dawid Jurczak	9c46a9cf61	[NFC][GVNSink] Don't pretend that iteration is over instructions when it's actually over blocks Differential Revision: https://reviews.llvm.org/D124764	2022-05-03 17:19:40 +02:00
Igor Kirillov	4e5e042d9a	[LoopVectorize] Support reductions that store intermediary result Adds ability to vectorize loops containing a store to a loop-invariant address as part of a reduction that isn't converted to SSA form due to lack of aliasing info. Runtime checks are generated to ensure the store does not alias any other accesses in the loop. Ordered fadd reductions are not yet supported. Differential Revision: https://reviews.llvm.org/D110235	2022-05-03 10:12:30 +01:00
David Green	6f81903e89	[LV][SLP] Mark fptosi_sat as vectorizable This adds fptosi_sat and fptoui_sat to the list of trivially vectorizable functions, mainly so that the loop vectorizer can vectorize the instruction. Marking them as trivially vectorizable also allows them to be SLP vectorized, and Scalarized. The signature of a fptosi_sat requires two type overrides (@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only take a single one. This patch alters hasVectorInstrinsicOverloadedScalarOpd to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first operand of the intrinsic as an overloaded (but not scalar) operand. Differential Revision: https://reviews.llvm.org/D124358	2022-05-03 09:32:34 +01:00
Vitaly Buka	098e807074	Revert "[DeadArgElim] Set unused arguments for internal functions" Breaks bots, see https://reviews.llvm.org/D124699 This reverts commit `e547a333a4`.	2022-05-02 15:10:26 -07:00
Teresa Johnson	084b65f7dc	[memprof] Only insert dynamic shadow load when needed We don't need to insert a load of the dynamic shadow address unless there are interesting memory accesses to profile. Split out of D124703. Differential Revision: https://reviews.llvm.org/D124797	2022-05-02 13:36:00 -07:00
Alexey Bataev	e74a73782f	[SLP][NFC]Minor code changes for better readability, NFC.	2022-05-02 12:58:25 -07:00
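
Two of the algebraic InstCombine folds listed above, `sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z))` (394c683d40) and `((A&B)^C)|B --> C|B` (ec7122f64b), can be sanity-checked by brute force at a small bit width. The following is a minimal Python sketch, not LLVM code: the helper names (`umin`, `usub_sat`, `check_folds`) and the 4-bit width are assumptions chosen for illustration, and the Alive2 links in the commit messages remain the authoritative proofs.

```python
from itertools import product

BITS = 4                 # assumed small width so exhaustive checking is cheap
MASK = (1 << BITS) - 1   # models wrap-around of iN arithmetic


def umin(a, b):
    """Unsigned minimum (operands are already non-negative)."""
    return min(a, b)


def usub_sat(a, b):
    """Unsigned saturating subtraction: clamps at 0 instead of wrapping."""
    return a - b if a > b else 0


def check_folds():
    """Return True if both folds hold for every triple of BITS-wide values."""
    for x, y, z in product(range(MASK + 1), repeat=3):
        # sub(add(X,Y), umin(Y,Z)) --> add(X, usub.sat(Y,Z))
        before = (((x + y) & MASK) - umin(y, z)) & MASK
        after = (x + usub_sat(y, z)) & MASK
        if before != after:
            return False
        # ((A&B)^C)|B --> C|B, using x, y, z as A, B, C
        if (((x & y) ^ z) | y) != (z | y):
            return False
    return True
```

Calling `check_folds()` enumerates all 4096 operand triples. An exhaustive check at one width only illustrates the identities and does not replace a proof over arbitrary widths.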

30753 Commits