Commit Graph

Philip Reames 8906a0fe64 [SCEVExpander] Drop poison generating flags when reusing instructions
The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.
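
As a hypothetical illustration (hand-written; the values and names are made up, not taken from the patch's tests), two instructions can map to the same SCEV while carrying different flags:
```
; both map to the SCEV (%x + %y), but only %a1 claims no-signed-wrap
%a1 = add nsw i32 %x, %y
%a2 = add i32 %y, %x
```
Reusing %a1 in a context that does not already imply no-overflow requires dropping the nsw flag, just as CSE has to drop flags when merging two such instructions.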

This patch is almost that trivial patch, except that we preserve flags on the reused instruction when existing users would already imply UB on overflow. Adding new users can, at most, refine the program to one which doesn't execute UB, which is valid.

In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled.

On the test changes, most are pretty straightforward. We lose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul; previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.

The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.

Differential Revision: https://reviews.llvm.org/D112734
2021-11-29 15:23:34 -08:00
Philip Reames f50207c015 [unroll] Use early return in shouldPartialUnroll [nfc] 2021-11-29 14:37:18 -08:00
Philip Reames a655e0f991 [unroll] Reduce scope of UnrollFactor variable in computeUnrollCount [NFC]
Suggested in review of D114453, done as a separate change to get all uses at once.
2021-11-29 14:33:14 -08:00
Philip Reames 829b62adf5 [unroll] Split full exact and full bound unroll costing [NFC]
This change should be NFC. It's posted for review mostly to make sure others are happy with the names I'm introducing for "exact full unroll" and "bounded full unroll". The motivation here is that our cost model for bounded unrolling is too aggressive - it gives benefits for exits we aren't going to prune - but I also just think the new version of the code is a lot easier to follow.

Differential Revision: https://reviews.llvm.org/D114453
2021-11-29 14:18:15 -08:00
Sanjay Patel 99f8b795cc [InstCombine] try to fold 'or' into 'mul' operand
or (mul X, Y), X --> mul X, (add Y, 1) (when the multiply has no common bits with X)

We already have this fold if the pattern ends in 'add', but we can miss it if the
'add' becomes 'or' via another no-common-bits transform.

This is part of fixing:
http://llvm.org/PR49055
...but it won't make a difference on that example yet.

https://alive2.llvm.org/ce/z/Vrmoeb
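
A hand-written illustration of the pattern (the masking exists only to make the no-common-bits precondition obvious; it is not taken from the patch's tests):
```
define i8 @src(i8 %p, i8 %q) {
  %x = and i8 %p, 1        ; %x has at most bit 0 set
  %y = shl i8 %q, 1        ; %y is even, so %m below has bit 0 clear
  %m = mul i8 %x, %y
  %r = or i8 %m, %x        ; no common bits => 'or' acts as 'add'
  ret i8 %r
}
=>
define i8 @tgt(i8 %p, i8 %q) {
  %x = and i8 %p, 1
  %y = shl i8 %q, 1
  %y1 = add i8 %y, 1
  %r = mul i8 %x, %y1
  ret i8 %r
}
```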

Differential Revision: https://reviews.llvm.org/D114729
2021-11-29 17:03:08 -05:00
Stanislav Mekhanoshin 5c6b9e1622 [InstCombine] (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))
```
----------------------------------------
define i3 @src(i3 %a, i3 %b, i3 %c) {
%0:
  %or1 = or i3 %b, %c
  %not1 = xor i3 %or1, 7
  %and1 = and i3 %a, %not1
  %xor1 = xor i3 %b, %c
  %or2 = or i3 %xor1, %a
  %not2 = xor i3 %or2, 7
  %or3 = or i3 %and1, %not2
  ret i3 %or3
}
=>
define i3 @tgt(i3 %a, i3 %b, i3 %c) {
%0:
  %obc = or i3 %b, %c
  %xbc = xor i3 %b, %c
  %o = or i3 %a, %xbc
  %and = and i3 %obc, %o
  %r = xor i3 %and, 7
  ret i3 %r
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112955
2021-11-29 11:20:34 -08:00
Mehrnoosh Heidarpour c572eb1ad9 [InstCombine] Fold (~A | B) ^ A --> ~(A & B)
https://alive2.llvm.org/ce/z/gLrYPk
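
For reference, a hand-written IR rendering of the fold (same shape as the Alive2 proof above, not the patch's own test):
```
define i8 @src(i8 %a, i8 %b) {
  %nota = xor i8 %a, -1
  %or   = or i8 %nota, %b
  %r    = xor i8 %or, %a
  ret i8 %r
}
=>
define i8 @tgt(i8 %a, i8 %b) {
  %and = and i8 %a, %b
  %r   = xor i8 %and, -1
  ret i8 %r
}
```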

Fixes:
https://llvm.org/PR52518

Reviewed by: spatel

Differential revision: https://reviews.llvm.org/D114339
2021-11-29 11:29:21 -05:00
Bjorn Pettersson 297fb66484 Use a deterministic order when updating the DominatorTree
This solves a problem with non-deterministic output from opt due
to not performing dominator tree updates in a deterministic order.

The problem that was analysed indicated that JumpThreading was using
the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When
preparing the list of updates to send to DomTreeUpdater::applyUpdates
we iterated over a SmallPtrSet, which didn't give a well-defined
order of updates to perform.

The added domtree-updates.ll test case is an example that would
result in non-deterministic printouts of the domtree. Semantically
those domtrees are equivalent, but it shows that when we use the
domtree iterator, the order in which nodes are visited depends on the
order in which dominator tree updates are performed.

Since some passes (at least EarlyCSE) iterate over nodes in the
dominator tree in a similar fashion as the domtree printer, the order
in which transforms are applied by such passes also depends,
transitively, on the order in which dominator tree updates are
performed. Taking EarlyCSE as an example, the end result could be
different depending on the order in which the transforms are applied.

Reviewed By: nikic, kuhar

Differential Revision: https://reviews.llvm.org/D110292
2021-11-29 13:14:50 +01:00
Florian Hahn fd71159f64 [LV] Move code from widenInstruction to VPWidenRecipe. (NFC)
The code in widenInstruction has already been transitioned to
only rely on information provided by VPWidenRecipe directly.

Moving the code directly to VPWidenRecipe::execute completes
the transition for the recipe.

It provides the following advantages:

1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.

2) in particular ensures that no dependencies on
fields in ILV for vector code generation are re-introduced.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D114322
2021-11-29 09:09:00 +00:00
Kazu Hirata fd7d40640d [llvm] Use range-based for loops (NFC) 2021-11-28 18:14:49 -08:00
Nikita Popov 3608e18a94 [DSE] Use MapVector for IOLs
I'm not sure whether this can cause any actual non-determinism,
but at least it makes the DSE debug log non-deterministic, which
makes it harder to debug other non-determinism issues.
2021-11-28 21:54:29 +01:00
Florian Hahn 3495090b9b [LV] Move code from widenGEP to VPWidenGEPRecipe (NFC).
The code in widenGEP has already been transitioned to only rely on
information provided by VPWidenGEPRecipe directly.

Moving the code directly to VPWidenGEPRecipe::execute completes
the transition for the recipe.

It provides the following advantages:

1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.

2) in particular ensures that no dependencies on
fields in ILV for GEP code generation are re-introduced.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D114321
2021-11-28 18:29:18 +00:00
Sanjay Patel f55d1eb374 [InstCombine] use decomposeBitTestICmp to make icmp (trunc X), C more consistent
This is a follow-on suggested in D112634.
Two folds that were added with that patch are subsumed in the call to
decomposeBitTestICmp, and two other folds are potentially inverted.

The deleted folds were very specialized by instcombine standards
because they were restricted to legal integer types based on the data
layout. This generalizes the canonical form independent of target/types.

This change has a reasonable chance of exposing regressions either in
IR or codegen, but I don't have any evidence for either of those yet.
A spot check of asm across several in-tree targets shows variations
that I expect are mostly neutral.

We have one improvement in an existing IR test that I noted with a
comment. Using mask ops might also make more code match with D114272.

Differential Revision: https://reviews.llvm.org/D114386
2021-11-28 09:59:37 -05:00
Sanjay Patel 97755ab1c6 [InstCombine] reduce code duplication; NFC 2021-11-28 09:27:20 -05:00
Sander de Smalen 28a4deab92 [LV] Fix incorrectly marking a pointer indvar as 'scalar'.
collectLoopScalars should only add non-uniform nodes to the list if they
are used by a load/store instruction that is marked as CM_Scalarize.

Before this patch, the LV incorrectly marked pointer induction variables
as 'scalar' when they were required to be widened by something else,
such as a compare instruction, and weren't used by a node marked as
'CM_Scalarize'. This case is covered by sve-widen-phi.ll.

This change also allows removing some code where the LV tried to
widen the PHI nodes with a stepvector, even though it was marked as
'scalarAfterVectorization'. Now that this code is more careful about
marking instructions that need widening as 'scalar', this code has
become redundant.

Differential Revision: https://reviews.llvm.org/D114373
2021-11-28 09:49:28 +00:00
Florian Hahn 25dad1064b [DSE] Optimize defining access of defs while walking upwards.
This patch extends the code that walks memory defs upwards to find
clobbering accesses to also try to optimize the clobbering defining
access.

We should be able to set the optimized access of our starting def
(KillingDef), if the following holds:

 1. It is the first call of getDomMemoryDef for KillingDef (so Current
    == KillingDef->getDefiningAccess()).
 2. No potentially aliasing defs are skipped.

Then if a (partly) aliasing def is encountered, it can be used as
optimized access for KillingDef. No further optimizations can be
applied to KillingDef.

I'd appreciate a careful look, as the existing documentation is not too
clear on what is expected for optimized accesses.

The motivation for this patch is to use the optimized accesses to cover
more cases of redundant stores as follow-up to D111727.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112313
2021-11-27 13:04:28 +00:00
Alexey Bataev fc0aacf324 [SLP]Improve analysis/emission of vector operands for alternate nodes.
The compiler has an analysis for perfect diamond matching but it does not
support nodes with main/alternate opcodes. The problem is that the
scalars themselves are different and might not match directly with other
nodes, but the operands and main/alternate opcodes might match, and the
compiler might reuse some previously emitted vector instructions. We need
to include this analysis in the cost model and in the actual vector
instruction emission process.

Differential Revision: https://reviews.llvm.org/D114101
2021-11-26 06:38:02 -08:00
David Sherwood e20391fc5d [LoopVectorize] When tail-folding, don't always predicate uniform loads
In VPRecipeBuilder::handleReplication if we believe the instruction
is predicated we then proceed to create new VP region blocks even
when the load is uniform and only predicated due to tail-folding.

I have updated isPredicatedInst to avoid treating a uniform load as
predicated when tail-folding, which means we can do a single scalar
load and a vector splat of the value.
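
A rough sketch of the resulting shape (illustrative only; the VF of 4 and the value names are made up):
```
define <4 x i32> @splat_of_uniform_load(i32* %uniform_ptr) {
  %v     = load i32, i32* %uniform_ptr, align 4
  %ins   = insertelement <4 x i32> poison, i32 %v, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> zeroinitializer
  ret <4 x i32> %splat
}
```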

Tests added here:

  Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll

Differential Revision: https://reviews.llvm.org/D112552
2021-11-26 11:30:54 +00:00
Alexey Bataev 4675a1654c Revert "[SLP]Improve analysis/emission of vector operands for alternate nodes."
This reverts commit 496254cf80 to fix
compiler crashes reported in D114101#3152982.
2021-11-25 05:19:49 -08:00
Zarko Todorovski 95875d246a [LLVM][NFC]Inclusive language: remove occurrences of sanity check/test from llvm
Part of work to use more inclusive language in clang/llvm. Rewording
some comments and changing function and variable names.
2021-11-24 17:29:55 -05:00
Artur Pilipenko aa60d169ea [CVP] Add a cl::opt for canonicalization of signed relational comparisons
This canonicalization breaks the ability to discard checks in some cases.
Add a command line option to disable it. This option is on by default,
so the change is NFC.

See for details:
https://reviews.llvm.org/D112895#3149487
2021-11-24 13:52:38 -08:00
Alexey Bataev 496254cf80 [SLP]Improve analysis/emission of vector operands for alternate nodes.
The compiler has an analysis for perfect diamond matching but it does not
support nodes with main/alternate opcodes. The problem is that the
scalars themselves are different and might not match directly with other
nodes, but the operands and main/alternate opcodes might match, and the
compiler might reuse some previously emitted vector instructions. We need
to include this analysis in the cost model and in the actual vector
instruction emission process.

Differential Revision: https://reviews.llvm.org/D114101
2021-11-24 12:55:24 -08:00
Florian Hahn 2897b67665 [LV] Use OrigLoop instead of induction to get function. (NFC)
Upcoming changes will result in Induction not being set/used in some
cases. Use OrigLoop to get the function instead.
2021-11-24 20:17:44 +00:00
Stanislav Mekhanoshin 9300b133c8 Revert "[InstCombine] (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))"
This reverts commit c407769f5e.
2021-11-24 11:14:52 -08:00
Florian Hahn 8b86752c60 [VPlan] Remove unused VPInstruction constructor. (NFC)
VPInstruction inherits from VPValue, so the constructor taking
ArrayRef<VPValue*> covers all cases that would be covered by the removed
constructor.
2021-11-24 14:06:50 +00:00
Rosie Sumpter df32a39dd0 [LoopVectorize][CostModel] Update cost model for fmuladd intrinsic
This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.

Differential Revision: https://reviews.llvm.org/D111630
2021-11-24 08:50:05 +00:00
Rosie Sumpter 2d33327f9d [LoopVectorize] Print fast-math flags for VPReductionRecipe 2021-11-24 08:50:05 +00:00
Rosie Sumpter 991074012a [LoopVectorize] Propagate fast-math flags for VPInstruction
In-loop vector reductions which use the llvm.fmuladd intrinsic involve
the creation of two recipes; a VPReductionRecipe for the fadd and a
VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags
these should be propagated through to the fmul instruction, so an
interface setFastMathFlags has been added to the VPInstruction class to
enable this.

Differential Revision: https://reviews.llvm.org/D113125
2021-11-24 08:50:04 +00:00
Rosie Sumpter c2441b6b89 [LoopVectorize] Add vector reduction support for fmuladd intrinsic
Enables LoopVectorize to handle reduction patterns involving the
llvm.fmuladd intrinsic.
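
A minimal example of the kind of scalar loop this targets (hand-written, not the patch's own test; whether it is vectorized as an in-loop/ordered reduction depends on the target and fast-math flags):
```
declare float @llvm.fmuladd.f32(float, float, float)

define float @dot(float* %a, float* %b, i64 %n) {
entry:                                  ; assumes %n > 0
  br label %loop
loop:
  %i   = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi float [ 0.000000e+00, %entry ], [ %acc.next, %loop ]
  %pa  = getelementptr inbounds float, float* %a, i64 %i
  %pb  = getelementptr inbounds float, float* %b, i64 %i
  %va  = load float, float* %pa, align 4
  %vb  = load float, float* %pb, align 4
  %acc.next = call float @llvm.fmuladd.f32(float %va, float %vb, float %acc)
  %i.next = add nuw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret float %acc.next
}
```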

Differential Revision: https://reviews.llvm.org/D111555
2021-11-24 08:50:04 +00:00
Jun Ma 07333810ca Revert "Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""""
This reverts commit c93f93b2e3.
2021-11-24 10:26:37 +08:00
Mehdi Amini 1392b654ff Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit 884b6dd311.
The windows build is broken with a linker error.
2021-11-23 20:10:36 +00:00
spupyrev 884b6dd311 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However, careful engineering and extensive evaluation show that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 11:02:40 -08:00
Zarko Todorovski 0d3add216f [llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts
Reworded some comments and asserts to avoid usage of `sanity check/test`

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114372
2021-11-23 13:22:55 -05:00
Philip Reames 03d8bc184a [indvars] Fix lftr crash when preheader is terminated by switch
This was found by oss-fuzz.  The switch will get canonicalized to a branch, but if it hasn't been when we run LFTR, we crashed on an unneeded assert.
2021-11-23 09:58:46 -08:00
Philip Reames 065f777d27 Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit b00fc19822.  This change fails to build (link) on ubuntu x86.
2021-11-23 09:18:28 -08:00
Philip Reames 18086186ab [unroll] Remove two dead variable assignments [nfc]
These variables are not out-params, and we immediately return after assigning them.  Thus, the assignments are dead and just confusing.

I believe these used to be out-params, but they're not any more.
2021-11-23 09:12:20 -08:00
spupyrev b00fc19822 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However, careful engineering and extensive evaluation show that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 09:08:30 -08:00
Philip Reames 5c77aa2b91 [unroll] Use early return in shouldFullUnroll [nfc] 2021-11-23 09:01:36 -08:00
Sanjay Patel 430ad9697d [InstCombine] enhance bitwise select matching
I noticed that adding a seemingly unrelated fold for xor caused
regressions on similar patterns, and this is one of the
underlying causes.

This could also be a variation for code as seen in:
https://llvm.org/PR34047
...although that exact example should be fixed after:
D113035 / c36b7e21bd

The vector test shows that we are actually missing a potential
canonicalization for bitcast-of-sext-of-not or the inverse.
The scalar test shows that even if we had that canonicalization,
it would still be possible to see this pattern due to extra uses.

https://alive2.llvm.org/ce/z/y2BAgi
2021-11-23 09:57:44 -05:00
Evgeniy Brevnov 47e2644c89 [DSE][NFC] Introduce "doesn't overwrite" return code for isOverwrite
Add OR_None code to indicate that there is no overwrite. This has no effect on current uses but will be used in one of the next patches, which builds support for PHI translation.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D105098
2021-11-23 17:11:15 +07:00
Huihui Zhang 9cd7c534e2 [InstCombine] Enable fold select into operand for FAdd, FMul, FSub and FDiv.
For FAdd, FMul, FSub and FDiv, fold select into one of the operands to enable
further optimizations, i.e., floating-point reduction detection.

Turn code:
  %C = fadd %A, %B
  %D = select %cond, %C, %A

into:
  %C = select %cond, %B, -0.000000e+00
  %D = fadd %A, %C

Alive2 verification (with --disable-undef-input), timed out otherwise.
FAdd - https://alive2.llvm.org/ce/z/eUxN4Y
FMul - https://alive2.llvm.org/ce/z/5SWZz4
FSub - https://alive2.llvm.org/ce/z/Dhj8dU
FDiv - https://alive2.llvm.org/ce/z/Yj_NA2

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113442
2021-11-22 15:10:10 -08:00
Stanislav Mekhanoshin c407769f5e [InstCombine] (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))
Transform
```
(~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))
```
And swapped case:
```
(a | ~(b & c)) & ~(a & (b ^ c)) --> ~(a | b) | (a ^ b ^ c)
```

```
----------------------------------------
define i3 @src(i3 %a, i3 %b, i3 %c) {
%0:
  %or1 = or i3 %b, %c
  %not1 = xor i3 %or1, 7
  %and1 = and i3 %a, %not1
  %xor1 = xor i3 %b, %c
  %or2 = or i3 %xor1, %a
  %not2 = xor i3 %or2, 7
  %or3 = or i3 %and1, %not2
  ret i3 %or3
}
=>
define i3 @tgt(i3 %a, i3 %b, i3 %c) {
%0:
  %obc = or i3 %b, %c
  %xbc = xor i3 %b, %c
  %o = or i3 %a, %xbc
  %and = and i3 %obc, %o
  %r = xor i3 %and, 7
  ret i3 %r
}
Transformation seems to be correct!
```
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %c
  %not1 = xor i4 %and1, 15
  %or1 = or i4 %not1, %a
  %xor1 = xor i4 %b, %c
  %and2 = and i4 %xor1, %a
  %not2 = xor i4 %and2, 15
  %and3 = and i4 %or1, %not2
  ret i4 %and3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor1 = xor i4 %b, %c
  %xor2 = xor i4 %xor1, %a
  %or1 = or i4 %a, %b
  %not1 = xor i4 %or1, 15
  %or2 = or i4 %xor2, %not1
  ret i4 %or2
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112955
2021-11-22 10:49:21 -08:00
Diego Caballero 4348cd42c3 [LV] Drop integer poison-generating flags from instructions that need predication
This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact`
and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are
guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop
is linearized, these flags may lead to generating a poison value that is effectively used as the base address
of the widen load/store. The fix drops all the integer poison-generating flags from instructions that
contribute to the address computation of a widen load/store whose original instruction was in a basic block
that needed predication and is not predicated after vectorization.

Reviewed By: fhahn, spatel, nlopes

Differential Revision: https://reviews.llvm.org/D111846
2021-11-22 10:57:29 +00:00
Roland McGrath b72b56016a NFC: clang-format lib/Transforms/Instrumentation/InstrProfiling.cpp
Differential Revision: https://reviews.llvm.org/D114343
2021-11-21 18:16:02 -08:00
Nikita Popov aeba28bc62 [DSE] Drop hasAnalyzableMemoryWrite() (NFCI)
The functionality of hasAnalyzableMemoryWrite() is effectively
subsumed by getLocForWriteEx(), which will return None if the
instruction is not analyzable. The implementations don't match
exactly (e.g. getLocForWriteEx() does not limit non-calls to
stores), but in conjunction with the isRemovable() check, it ends
up being the same.
2021-11-20 23:20:12 +01:00
Florian Hahn cf8efbd30e [VPlan] Wrap vector loop blocks in region.
A first step towards modeling preheader and exit blocks in VPlan as well.
Keeping the vector loop in a region allows for changing the VF as we
traverse region boundaries.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D113182
2021-11-20 17:59:48 +00:00
Sanjay Patel 337948ac6e [InstCombine] add folds for binop with sexted bool and constant operands
This is a generalization/extension of the existing and/or
folds noted with TODO comments. Those have a one-use
constraint that is not necessary.
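
One concrete instance of the shape being handled (illustrative sketch; the constant and the exact select output form are made up, not taken from the patch):
```
define i32 @src(i1 %b) {
  %s = sext i1 %b to i32      ; 0 or -1
  %r = mul i32 %s, 42
  ret i32 %r
}
=>
define i32 @tgt(i1 %b) {
  %r = select i1 %b, i32 -42, i32 0
  ret i32 %r
}
```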

Potential follow-ups are noted by the TODO comments in
the new function. We can also call this function from
other binop visit* functions, but we need to add tests
first.

This solves:
https://llvm.org/PR52543

https://alive2.llvm.org/ce/z/NWuCR5
2021-11-20 12:33:00 -05:00
Kazu Hirata d1abf481da [llvm] Use range-based for loops (NFC) 2021-11-19 21:12:13 -08:00
ksyx 97b9e8438e [GVN][NFC] Remove redundant check
The if-check above the deleted part guarantees that StoreOffset <= LoadOffset
and that StoreOffset + StoreSize >= LoadOffset + LoadSize. Given that
LoadOffset + LoadSize > LoadOffset when LoadSize > 0, it follows that
StoreOffset + StoreSize > LoadOffset whenever LoadSize > 0, and a type with
a nonpositive size would be meaningless, so the check can be removed. The
values are converted to signed types to avoid unsigned operations with
negative offsets.

Part of revision D100179
Reapply commit c35e8185d8 with a fix for the problem
reported by mstorsjo
2021-11-19 20:24:36 -05:00
Ellis Hoag de11de308b [InstrProf] Use i32 for GEP index from lowering llvm.instrprof.increment
The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114268
2021-11-19 15:45:14 -08:00
Fabian Wolff 7eec832def [DSE] Improve handling of `strncpy` in Dead Store Elimination
Fixes PR#52062 and one of the remaining cases of PR#47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D114035
2021-11-19 17:46:29 +00:00
Florian Hahn 76effb001d [LV] Remove obsolete comment about creating a dummy block (NFC)
No dummy pre-entry block is created since a6c4969f5f. The comment is
stale now and can be removed.

Mentioned by @Ayal in D113182.
2021-11-19 17:17:04 +00:00
Alexey Bataev d1fdf867b1 [SLP][NFC]Introduce TreeEntry::getVectorFactor member function, NFC.
Added TreeEntry::getVectorFactor to get the final vectorization factor
to simplify the code.

Differential Revision: https://reviews.llvm.org/D114190
2021-11-19 06:32:19 -08:00
Senran Zhang 0425ea4621 [NFC][OpaquePtr][Evaluator] Remove call to PointerType::getElementType
There are still two more uses of PointerType::getElementType in
Evaluator when evaluating BitCasts on pointers. BitCasts on pointers
should be removed when opaque ptr is ready, so I just keep them as is.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D114131
2021-11-19 10:32:55 +08:00
Nikita Popov 46c26991ae [DSE] Remove getLocForWrite() (NFCI)
This implements nearly the same logic as getLocForWriteEx(), and
is only used in one place. In that context, we should also know
that getLocForWriteEx() returns a non-None result. As such,
consolidate everything to use one function.
2021-11-18 21:19:18 +01:00
Nikita Popov f1295563f1 [DSE] Move removePartiallyOverlappedStores() into DSEState (NFC)
So it can use getLocForWriteEx().
2021-11-18 21:19:18 +01:00
Arnold Schwaighofer 7d11c5dac2 Coro: Remove coro_end and coro_suspend_retcon in private unprocessed functions
We might emit functions that are private and never called. The coro
split pass only processes functions that might be called. Remove
intrinsics that we can't generate code for.

rdar://84619859

Differential Revision: https://reviews.llvm.org/D114021
2021-11-18 07:48:24 -08:00
Stanislav Mekhanoshin 6d3db28088 [InstCombine] Generalize complex OR patterns to AND
For every pattern with only NOT, OR, and AND operations there is
always a symmetrical pattern with AND and OR swapped.

This adds 2 transformations: https://reviews.llvm.org/D113526

```
(~(a & b) | c) & (~(a & c) | b) --> ~((b ^ c) & a)
(~(a & b) | c) & ~(a & c) --> ~((b | c) & a)
```

```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %a
  %not1 = xor i4 %and1, 15
  %and2 = and i4 %a, %c
  %not2 = xor i4 %and2, 15
  %or = or i4 %not2, %b
  %r = and i4 %or, %not1
  ret i4 %r
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %or = or i4 %b, %c
  %and = and i4 %or, %a
  %r = xor i4 %and, 15
  ret i4 %r
}
Transformation seems to be correct!

----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %a, %b
  %not1 = xor i4 %and1, 15
  %or1 = or i4 %not1, %c
  %and2 = and i4 %a, %c
  %not2 = xor i4 %and2, 15
  %or2 = or i4 %not2, %b
  %and3 = and i4 %or1, %or2
  ret i4 %and3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %and = and i4 %xor, %a
  %not = xor i4 %and, 15
  ret i4 %not
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D113526
2021-11-17 10:47:36 -08:00
Arthur Eubanks e3e25b5112 [NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.

We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.

The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.

Heavily inspired by D98103.

Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113947
2021-11-17 09:06:46 -08:00
Dmitry Vyukov a7c57c4ec8 tsan: don't consider debug calls as calls
The tsan pass does 2 optimizations based on the presence of calls:
1. Don't emit function entry/exit callbacks if there are no calls
and no memory accesses.
2. Combine read/write of the same variable if there are no
intervening calls.
However, all debug info is represented as CallInst as well
and thus effectively disables these optimizations.
Don't consider debug info calls as calls.

Reviewed By: glider, melver

Differential Revision: https://reviews.llvm.org/D114079
2021-11-17 14:42:16 +01:00
David Sherwood 670dd40244 [Analysis] Fix getNumberOfParts to return 0 when the answer is unknown
When asking how many parts are required for a scalable vector type
there are occasions when it cannot be computed. For example, <vscale x 1 x i3>
is one such vector for AArch64+SVE because at the moment no matter how we
promote the i3 type we never end up with a legal vector. This means
that getTypeConversion returns TypeScalarizeScalableVector as the
LegalizeKind, and then getTypeLegalizationCost returns an invalid cost.
This then causes BasicTTImpl::getNumberOfParts to dereference an invalid
cost, which triggers an assert. This patch changes getNumberOfParts to
return 0 for such cases, since the definition of getNumberOfParts in
TargetTransformInfo.h states that we can use a return value of 0 to represent
an unknown answer.

Currently, LoopVectorize.cpp is the only place where we need to check for
0 as a return value, because all other instances will not currently
ask for the number of parts for <vscale x 1 x iX> types.

In addition, I have changed the target-independent interface for
getNumberOfParts to return 1 and assume there is a single register
that can fit the type. The loop vectoriser has lots of tests that are
target-independent and they relied upon the 0 value to mean the
answer is known and that we are not scalarising the vector.

I have added tests here that show we correctly return an invalid cost
for VF=vscale x 1 when the loop contains unusual types such as i7:

  Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113772
2021-11-17 12:07:09 +00:00
Stanislav Mekhanoshin c74f2e5b27 [InstCombine] Use SpecificBinaryOp_match in two more places
Differential Revision: https://reviews.llvm.org/D114038
2021-11-17 01:16:06 -08:00
Hongtao Yu 042cefd2b5 [CSSPGO] Fix a hash code truncating issue in ContextTrieNode.
std::hash returns a 64-bit hash code, while previously we were using only the lower 32 bits, which caused hash collisions for large workloads.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D113688
2021-11-16 11:01:52 -08:00
Sanjay Patel 8fce94f916 [InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2
If C is a high-bit mask:
(trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?)

If C is low-bit mask:
(trunc X) u> C --> (X & ~C) != 0 (are any masked-high-bits set?)

If C is not-of-power-of-2 (one clear bit):
(trunc X) u> C --> (X & (C+1)) == C+1 (are all masked-high-bits set?)
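
For example, the high-bit-mask case in IR (hand-written sketch; the constant is illustrative and the exact output form is not taken from the patch's tests):
```
define i1 @src(i32 %x) {
  %t = trunc i32 %x to i8
  %r = icmp ult i8 %t, -16     ; C = 0xF0, a high-bit mask
  ret i1 %r
}
=>
define i1 @tgt(i32 %x) {
  %m = and i32 %x, 240
  %r = icmp ne i32 %m, 240     ; any of the masked high bits clear?
  ret i1 %r
}
```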

This extends the fold added with:
acabad9ff6 (https://alive2.llvm.org/ce/z/aFr7qV)

Using decomposeBitTestICmp() to generalize this is a planned follow-up, but that requires removing an inverse fold.

Here are Alive2 generalizations for these folds:
https://alive2.llvm.org/ce/z/u-ZpC_ (ult, the previous patch)
https://alive2.llvm.org/ce/z/YsuAu2 (ult, this patch)
https://alive2.llvm.org/ce/z/ekktQP (ugt, low bitmask)
https://alive2.llvm.org/ce/z/pJY9wR (ugt, one clear bit)

Differential Revision: https://reviews.llvm.org/D112634
2021-11-16 09:27:30 -05:00
Alexey Bataev 900cc1a226 [SLP]Improve cost of the gather nodes.
No need to count the final shuffle cost for the constants; gathering of
the constants is just a constant vector + extra inserts, if required.

Differential Revision: https://reviews.llvm.org/D113770
2021-11-16 06:25:07 -08:00
Alexey Bataev cdf8a53c1d [SLP]Fix windows build, NFC.
Need to put `IndexIdx` var to the list of captures.
2021-11-16 06:09:51 -08:00
Alexey Bataev aa9bbb64be [SLP]Adjust GEP indices types when trying to build entries.
Need to adjust the types of GEP indices when building the tree
entries/operands. Otherwise some of the nodes might differ and the
vectorizer is unable to correctly find them and count their cost.

Differential Revision: https://reviews.llvm.org/D113792
2021-11-16 05:44:33 -08:00
Sander.DeSmalen@arm.com 305816ff1e [IndVarSimplify] Reduce nondeterministic behaviour in visitIVCast.
rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed
an issue with IndVarSimplification for a loop where an i32 phi node is
no longer replaced by a widened (i64) phi node, because the SCEVs of a
sign-extend no longer folded the same way. I'm unsure how to properly
explain this because it's all rather complicated, but in short: SCEVs
don't fold as nicely as they used to and this caused a difference.

While investigating this, I found that IndVarSimplify can actually
optimise the case in the way we want to if it chooses the widened IV to
be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough,
there is some level of indeterminism in the way the algorithm works,
it just picks the sign of the 'first' zext/sext user, where the order of
the users-iterator is not guaranteed to be the same on each invocation
of the pass (e.g. shown by first running loop-rotate, which puts the
users in a different order).

While I think the fix is valid in the sense that consistently picking
_any_ order is better than having a nondeterministic order, I can
use a bit of advice from people more familiar in this area of the
code-base.

For example, I'm not sure if this fix is hiding another issue where the
IndVarSimplify pass could actually draw the same conclusions (i.e. that
it only needs an i64 phi node) if it does a bit more work, regardless
of whether it chooses the induction variable to be signed or unsigned.

I'm also not sure if choosing signed is better than unsigned, or whether
that just happens to be beneficial only in this individual case.

Any feedback would be much appreciated!

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D112573
2021-11-16 12:41:04 +00:00
Arthur Eubanks 19867de9e7 [NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.

So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.

This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.

However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.

Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).

Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss

D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation.  D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113304
2021-11-15 14:44:53 -08:00
Philip Reames 8f95e915cd [unroll-runtime] Relax two profitability limitations on multi-exit unrolling
This change is mostly about getting rid of some "uninteresting" cases in a follow-on deeper heuristic change.  If anyone sees actually interesting code differences out of this, please let me know.  I'm not expecting this to have much impact at all.

Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block.  We can only hit this with a poorly documented debug flag.

Case 2 - Why should we treat epilog cases differently from prolog cases?  Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled?

Sorry for the lack of tests here.  These are both *exceedingly* narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags.  Note that the legality cases for each are already exercised with force flags.
2021-11-15 13:00:14 -08:00
Philip Reames 423da61835 [runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC]
All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller.  The assert use has little value with the restructured code and is simply dropped.
2021-11-15 11:39:07 -08:00
Stanislav Mekhanoshin e785f4ab6a [PatternMatch] Add m_BinOp/m_c_BinOp with specific opcode
Differential Revision: https://reviews.llvm.org/D113508
2021-11-15 11:24:27 -08:00
Philip Reames e99902a872 [runtime-unroll] Restructure if-clause to improve readability [NFC] 2021-11-15 11:13:27 -08:00
Alexey Bataev 224e46d355 [SLP][DOT][NFCI]Output all scalars for the splats, not only the first one. 2021-11-15 10:54:26 -08:00
Mehrnoosh Heidarpour 7daa95c8fa [InstCombine] Fold (A^B)|~A-->~(A&B)
https://alive2.llvm.org/ce/z/2v6rhF
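
In IR form (hand-written, mirroring the Alive2 link above rather than the patch's own test):
```
define i8 @src(i8 %a, i8 %b) {
  %xor = xor i8 %a, %b
  %na  = xor i8 %a, -1
  %r   = or i8 %xor, %na
  ret i8 %r
}
=>
define i8 @tgt(i8 %a, i8 %b) {
  %and = and i8 %a, %b
  %r   = xor i8 %and, -1
  ret i8 %r
}
```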

Fixes:
https://llvm.org/PR52478

Differential Revision: https://reviews.llvm.org/D113783
2021-11-15 12:29:37 -05:00
Alexey Bataev 036207d5f2 [SLP]Improve splat detection.
A bunch of scalars can be treated as a splat not only if all elements
are the same but also if some of them are undef values.

Differential Revision: https://reviews.llvm.org/D113774
2021-11-15 07:50:34 -08:00
Alexey Bataev b85152f8b1 [SLP][NFC]Use `isa_and_nonnull` and fix comment, NFC. 2021-11-15 06:49:33 -08:00
ksyx 72b5138d37 Revert "[GVN][NFC] Remove redundant check"
This reverts commit c35e8185d8.

mstorsjo reported in the revision thread that one VNCoercion assertion
is violated and seemly in relate to this commit. As per "If a test case
that demonstrates a problem is reported in the commit thread, please
revert and investigate offline", this commit is reverted.
2021-11-15 09:14:13 -05:00
Alexey Bataev 6fb5bed7d1 [SLP]Do not create unused gather nodes for scalar arguments of vector intrinsics.
If the vector intrinsic has a scalar argument, we currently still create
a tree entry for this argument. This entry is not used; it just consumes
resources and increases the cost of the tree.

Differential Revision: https://reviews.llvm.org/D113806
2021-11-15 06:11:19 -08:00
Sander de Smalen f835fe8ef7 [LV] Rename blockNeedsPredication to blockNeedsPredicationForAnyReason.
The interface is a convenience function to ask if a block requires
predication when widening, but it's important that there are two
separate concepts to consider:
(A) The block was predicated in the original loop.
(B) The block was unpredicated in the original loop, but requires
    predication because of tail folding.

In the case of (B) we know that at least one lane of the vector will
be executed, which means we can implement a load from a uniform address
with a scalar load + splat (D112552). In the case of predication because
of (A), we cannot do this, because the scalar load itself requires
predication.

The name 'blockNeedsPredication' does not make the distinction between
(A) and (B), hence the reason to rename it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D113392
2021-11-15 08:04:20 +00:00
Kazu Hirata feb40a3a47 [llvm] Use range-based for loops with instructions (NFC) 2021-11-14 19:40:48 -08:00
Kazu Hirata d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Mircea Trofin a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Kazu Hirata 7379736774 [llvm] Use range-based for loops with User::operands (NFC) 2021-11-14 09:32:38 -08:00
Kazu Hirata 098e935174 [llvm] Use range-based for loops with CallBase::args (NFC) 2021-11-14 09:32:36 -08:00
Mircea Trofin 0662a3612c [NFC][InlineFunction] Renamed some vars to conform to coding style 2021-11-14 07:26:44 -08:00
Kazu Hirata 7505b7045f [llvm] Use GetElementPtrInst::indices (NFC) 2021-11-13 21:43:28 -08:00
ksyx c35e8185d8 [GVN][NFC] Remove redundant check
The if-check above the deleted part guarantees that StoreOffset <= LoadOffset
and that StoreOffset + StoreSize >= LoadOffset + LoadSize. Given that
LoadOffset + LoadSize > LoadOffset when LoadSize > 0, it follows that
StoreOffset + StoreSize > LoadOffset whenever LoadSize > 0, and a type with
a nonpositive size would be meaningless, so the check can be removed.

Part of revision D100179
Reviewed By: nikic
2021-11-13 15:59:43 -05:00
Philip Reames 37ead201e6 [runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp the iteration count of the loops* from decrementing to incrementing.

Why does this matter?  A couple of reasons:
* SCEV doesn't have a native subtract node.  Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such.  As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones.  (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language.  We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced.  (You can see this looking at nearby phis in the test cases.)

Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.

* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value.  We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
2021-11-12 15:44:58 -08:00
Philip Reames de2fed6152 [unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function.  The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.

With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting.  As such, placing Count-1 copies out of line is a fairly poor code placement choice.  We'd much rather fall through into the hot (non-exiting) path.  For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.

However, the real motivation for this change isn't performance.  Its readability and human understanding.  Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
2021-11-12 11:40:50 -08:00
Joel E. Denny c9dfe322ee [OpenMP] Fix main thread barrier for Pascal and amdgpu
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602
2021-11-12 11:18:45 -05:00
Alexey Bataev 352c46e707 [SLP]Improve vectorization of split loads.
Need to fix the cost estimation for split loads: since we already look at the
subregs, there is no need to permute them; we only need to estimate the
subregister insert, if it is smaller than the real register. Also, with
split loads it might already be profitable to vectorize smaller trees
with gathering of the loads.

Differential Revision: https://reviews.llvm.org/D107188
2021-11-12 06:13:22 -08:00
Nikita Popov 986416251b [InstCombine] Drop redundant fold for and/or of icmp eq/ne (NFCI)
This handles a special case of foldAndOrOfICmpsUsingRanges()
with two equality predicates.
2021-11-11 20:25:40 +01:00
Nikita Popov 84e273cced [InstCombine] Handle undefs in and of icmp eq zero fold
For the scalar/splat case, this fold is subsumed by
foldLogOpOfMaskedICmps(). However, the conjugated fold for "or"
also supports splats with undef. Make both code paths consistent
by using m_ZeroInt() for the "and" implementation as well.

https://alive2.llvm.org/ce/z/tN63cu
https://alive2.llvm.org/ce/z/ufB_Ue
2021-11-11 19:07:07 +01:00
Nikita Popov 0242a6adf7 [InstCombine] Support splat vectors in some or of icmp folds
Replace m_ConstantInt() with m_APInt() in order to support splat
constants in addition to scalar integers.
2021-11-10 22:59:09 +01:00
Nikita Popov 861adaf2ad [InstCombine] Support splat vectors in some and of icmp folds
Replace m_ConstantInt() with m_APInt() to support splat vectors
in addition to scalar integers.
2021-11-10 22:37:54 +01:00
Nikita Popov 58ebc79a64 [InstCombine] Strip offset when folding and/or of icmps
When folding and/or of icmps, look through add of a constant and
adjust the icmp range instead. Effectively, this decomposes
X + C1 < C2 style range checks back into a normal range. This allows
us to fold comparisons involving two range checks or one range check
and some other condition. We had a fold for a really specific case
of this (or of a range check and eq, and only on one side!), while
this handles it in full generality.
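
A hand-written example of the kind of fold this enables (the constants are illustrative, not from the patch's tests): two adjacent range checks merge into one after stripping the offset.
```
define i1 @src(i8 %x) {
  %c1  = icmp ult i8 %x, 8
  %off = add i8 %x, -8
  %c2  = icmp ult i8 %off, 8   ; range check written as (x - 8) u< 8
  %r   = or i1 %c1, %c2
  ret i1 %r
}
=>
define i1 @tgt(i8 %x) {
  %r = icmp ult i8 %x, 16
  ret i1 %r
}
```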

Differential Revision: https://reviews.llvm.org/D113510
2021-11-10 22:01:52 +01:00
Stanislav Mekhanoshin 5731381594 [InstCombine] Relax and reorganize one use checks in the ~(a | b) & c
Since there is just a single check for the LHS in the ~(A | B) & C | ...
transforms, and multiple RHS checks inside with more coming, I am
removing the m_OneUse checks for the LHS and adding new checks for the RHS.
This is non-essential as long as there is a total benefit.

In addition (~(A | B) & C) | (~(A | C) & B) --> (B ^ C) & ~A
checks were overly restrictive, it should be good without any
additional checks.

Differential Revision: https://reviews.llvm.org/D113141
2021-11-10 10:14:12 -08:00
Sanjay Patel 67299aa84f [InstCombine] add check for integer source type from cast to prevent crash
A problem was noted in the post-commit review for
c36b7e21bd / D113035 :

If the source type is not an integer or integer vector,
then we could crash when trying to ComputeNumSignBits().
2021-11-10 09:44:55 -05:00
Florian Hahn 93931d78cf [LV] Do not rely on InductionDescriptor::getCastInsts. (NFC)
Now that CastDef is passed as VPValue, there is no need to access
ID.getCastInsts, as CastDef can instead be checked.
2021-11-10 13:03:44 +00:00