llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	c3d1504d63	[InstCombine] fix crash on type mismatch with fcmp fold The existing predicate doesn't work for a single-element vector, so make sure we are not crossing scalar/vector types. Test (was crashing) based on the post-commit example for: `4827771234`	2022-09-01 08:57:55 -04:00
Sanjay Patel	addbdac5d5	[InstCombine] fold power-of-2 ctlz/cttz with inverted result When X is a power-of-two or zero and zero input is poison: ctlz(i32 X) ^ 31 --> cttz(X) cttz(i32 X) ^ 31 --> ctlz(X) https://alive2.llvm.org/ce/z/Cs7sFE	2022-09-01 08:57:55 -04:00
Nikita Popov	3f8b1d0f15	[LICM] Add some debug output to scalar promotion (NFC)	2022-09-01 14:46:30 +02:00
Alexey Bataev	982d9ef1c1	[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison. Need to either follow the original order of the operands for bool logical ops, or emit a freeze instruction to avoid poison propagation. Differential Revision: https://reviews.llvm.org/D126877	2022-09-01 05:34:45 -07:00
Yuanbo Li	ebd0249fcf	[DebugInfo] Missing debug location after replacement in processSRem function This patch fixes an issue in which CorrelatedValuePropagation::processSRem would create new instructions to represent the SRem instruction, but would not correctly copy any existing debug location metadata to the new instruction. Differential Revision: https://reviews.llvm.org/D132218	2022-09-01 13:18:17 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Nuno Lopes	fa154a9170	Revert "Expand Div/Rem: consider the case where the dividend is zero" This reverts commit `4aed09868b`.	2022-09-01 12:11:22 +01:00
Nuno Lopes	4aed09868b	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 12:00:03 +01:00
Pavel Samolysov	527b9a9d90	[DeadArgElim] Use structured bindings in foreach loops. NFC Differential Revision: https://reviews.llvm.org/D133026	2022-09-01 13:48:46 +03:00
Nikita Popov	43e7d9af1d	[InstCombine] Fold extractvalue of phi Just as we do for most other operations, we should push extractvalue instructions through phis, if this does not increase unfolded instruction count.	2022-09-01 10:51:54 +02:00
Arthur Eubanks	04f3c20989	[NFC][LICM] Stop passing around unused BFI Uses of this were removed in `1a25d0bfbb`.	2022-08-31 19:15:34 -07:00
Vitaly Buka	53d1ae88f8	[nfc][msan] Prepare the code for check sorting	2022-08-31 15:36:49 -07:00
Nikita Popov	ab6876a40d	reland: [Local] Allow creating callbr with duplicate successors Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs. This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in `8201e3ef5c`. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D129997 Originally landed as commit `08860f525a` ("[Local] Allow creating callbr with duplicate successors") Reverted in commit `1cf6b93df1` ("Revert "[Local] Allow creating callbr with duplicate successors"")	2022-08-31 13:23:00 -07:00
Alexey Bataev	588115c117	[SLP][NFC]Add a check for SelectInst to match description, NFC.	2022-08-31 13:04:21 -07:00
Alexey Bataev	d8d9ee10bb	[SLP][NFC]Fix comment and make function following naming standard, NFC.	2022-08-31 12:37:55 -07:00
Philip Reames	8524622bdc	[SLP] Simplify getOperandInfo implementation and be consistent This is NOT nfc. Specifically, the following behavior changes: * Pointers are now allowed. Both uniform, and constants. * FP uniform non-constants can now be recognized. * FP undefs are no longer considered constant. This matches int behavior which we had tests for. FP behavior was untested. It's not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.	2022-08-31 12:24:05 -07:00
Nikita Popov	ad66bc42b0	[InstCombine] Use getInsertionPointAfterDef() in freeze fold This simplifies the code and fixes handling of catchswitch, in which case we have no insertion point for the freeze. Originally part of D129660.	2022-08-31 11:32:57 +02:00
Nikita Popov	8f3fd26b74	[Reassociate] Use getInsertionPointAfterDef() This simplifies the code and fixes handling for the callbr case, where the instruction needs to be inserted in the normal destination, rather than after the terminator. Originally part of D129660.	2022-08-31 11:10:24 +02:00
Nikita Popov	972840aa3b	[IR] Add Instruction::getInsertionPointAfterDef() Transforms occasionally want to insert an instruction directly after the definition point of a value. This involves quite a few different edge cases, e.g. for phi nodes the next insertion point is not the next instruction, and for invokes and callbrs it's not even in the same block. Additionally, the insertion point may not exist at all if catchswitch is involved. This adds a general Instruction::getInsertionPointAfterDef() API to implement the necessary logic. For now it is used in two places where this should be mostly NFC. I will follow up with additional uses where this fixes specific bugs in the existing implementations. Differential Revision: https://reviews.llvm.org/D129660	2022-08-31 10:50:10 +02:00
Fangrui Song	13f0795425	[SLPVectorizer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off build	2022-08-30 23:01:22 -07:00
Chenbing Zheng	35a3048c25	[InstCombine] add support for multi-use Y of (X op Y) op Z --> (Y op Z) op X For (X op Y) op Z --> (Y op Z) op X we can still do the transform when Y is multi-use. D131356 limited it to one-use; this patch removes that limit. This is still not a complete solution; I added a TODO test to show it. In that case, X and Y are both multi-use, and we can't decide how to convert based on this. But at least we don't make the code worse, and it can solve half the scenarios.	2022-08-31 10:55:05 +08:00
Alexey Bataev	ec06df9459	[SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed. The pointer operands for the ScatterVectorize node may contain non-instruction values and they are not checked for "already being vectorized". Need to check that such pointers are already vectorized and gather them instead of trying to build vectorize node to avoid compiler crash. Differential Revision: https://reviews.llvm.org/D132949	2022-08-30 12:30:14 -07:00
Sanjay Patel	8a19842c0e	[InstCombine] delete redundant folds; NFC InstSimplify does this via isKnownNonEqual(), so it's already using knownbits on these patterns and trying other folds.	2022-08-30 14:21:29 -04:00
Alexey Bataev	afbf5466ba	[SLP]Improve operands kind analysis for constants. Removed the EnableFP parameter of the getOperandInfo function since it is not needed; the operand kinds are also controlled by the operation code, which allows removing the extra check for the type of the operands. Also, added analysis for uniform constant float values. This change currently does not trigger any changes in the code since TTI does not do analysis for constant floats, so it can be considered NFC. Tested with llvm-test-suite + SPEC2017, no changes. Differential Revision: https://reviews.llvm.org/D132886	2022-08-30 06:35:39 -07:00
zhongyunde	23a5de4294	[InstCombine] Distributive or+mul with const operand We already support the transform: `(X+C1)*CI -> X*CI+C1*CI` Here the case is a little special as the form of `(X+C1)*CI` is transformed into `(X\|C1)*CI`, so we should also support the transform: `(X\|C1)*CI -> X*CI+C1*CI` Fixes https://github.com/llvm/llvm-project/issues/57278 Reviewed By: bcl5980, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D132658	2022-08-30 20:36:52 +08:00
Florian Hahn	b5e208fcba	[DSE] Support looking through memory phis at end of function. Update isWriteAtEndOfFunction to look through MemoryPhis. The reason MemoryPhis were skipped so far was the known AliasAnalysis issue with it missing loop-carried dependences. This problem is already addressed in other parts of the code by skipping MemoryDefs that may be in difference loops. I think the same logic can be applied here. This can have a substantial impact on the number of stores removed in some cases. For MultiSource/SPEC2006/SPEC2017 with -O3: ``` Metric: dse.NumFastStores Program dse.NumFastStores base patch diff External/S...CINT2017rate/557.xz_r/557.xz_r 14.00 45.00 221.4% External/S...te/538.imagick_r/538.imagick_r 439.00 1267.00 188.6% MultiSourc...e/Applications/SIBsim4/SIBsim4 6.00 15.00 150.0% MultiSourc...Prolangs-C/simulator/simulator 3.00 7.00 133.3% MultiSource/Applications/siod/siod 3.00 7.00 133.3% MultiSourc...arks/FreeBench/distray/distray 6.00 9.00 50.0% MultiSourc...e/Applications/obsequi/Obsequi 22.00 30.00 36.4% MultiSource/Benchmarks/Ptrdist/bc/bc 23.00 28.00 21.7% External/S...NT2017rate/502.gcc_r/502.gcc_r 1258.00 1512.00 20.2% External/S...te/520.omnetpp_r/520.omnetpp_r 954.00 1143.00 19.8% External/S...rate/510.parest_r/510.parest_r 5961.00 7122.00 19.5% External/S...C/CINT2006/445.gobmk/445.gobmk 47.00 56.00 19.1% External/S...00.perlbench_r/500.perlbench_r 241.00 286.00 18.7% External/S...NT2006/471.omnetpp/471.omnetpp 36.00 42.00 16.7% External/S...06/400.perlbench/400.perlbench 183.00 210.00 14.8% MultiSource/Applications/SPASS/SPASS 72.00 81.00 12.5% External/S...17rate/541.leela_r/541.leela_r 72.00 80.00 11.1% External/SPEC/CINT2006/403.gcc/403.gcc 585.00 642.00 9.7% MultiSourc...e/Applications/sqlite3/sqlite3 120.00 131.00 9.2% MultiSourc...Applications/hexxagon/hexxagon 11.00 12.00 9.1% External/S.../CFP2006/453.povray/453.povray 566.00 615.00 8.7% External/S...rate/511.povray_r/511.povray_r 578.00 627.00 8.5% 
External/S...FP2006/482.sphinx3/482.sphinx3 12.00 13.00 8.3% MultiSource/Applications/oggenc/oggenc 130.00 140.00 7.7% MultiSourc...e/Applications/ClamAV/clamscan 250.00 268.00 7.2% MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 19.00 20.00 5.3% MultiSourc...ch/consumer-jpeg/consumer-jpeg 19.00 20.00 5.3% External/S...te/526.blender_r/526.blender_r 3747.00 3928.00 4.8% MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 104.00 108.00 3.8% MultiSourc...ch/consumer-lame/consumer-lame 54.00 56.00 3.7% MultiSource/Benchmarks/Bullet/bullet 1222.00 1264.00 3.4% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 973.00 1005.00 3.3% External/S.../CFP2006/447.dealII/447.dealII 2699.00 2780.00 3.0% External/S...06/483.xalancbmk/483.xalancbmk 788.00 810.00 2.8% External/S.../CFP2006/450.soplex/450.soplex 180.00 185.00 2.8% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 338.00 345.00 2.1% MultiSourc...Benchmarks/7zip/7zip-benchmark 685.00 699.00 2.0% External/S...FP2017rate/544.nab_r/544.nab_r 158.00 160.00 1.3% MultiSourc...sumer-typeset/consumer-typeset 772.00 781.00 1.2% External/S...2017rate/525.x264_r/525.x264_r 410.00 414.00 1.0% External/S...23.xalancbmk_r/523.xalancbmk_r 998.00 1002.00 0.4% ``` Compile-time is almost neutral: https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions NewPM-O3: +0.03% NewPM-ReleaseThinLTO: -0.01% NewPM-ReleaseLTO-g: +0.03% Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132365	2022-08-30 13:27:51 +01:00
OCHyams	84a71d5259	[DebugInfo] Fix line number attribution in mldst-motion Taking the example from the test included in this patch: $ cat test.cpp -n 1 void fun(int *a, int cond) { 2 if (cond) 3 a[1] = 1; 4 else 5 a[1] = 2; 6 } mldst-motion will merge and sink the stores in if.then and if.else into if.end. The resultant PHI, gep and store should be attributed line zero with the innermost common scope rather than picking a debug location from one of the original stores. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D132741	2022-08-30 10:03:53 +01:00
jacquesguan	df525c7705	[InstCombine] fold fake floating point vector extract to shift+trunc. This patch supports the FP part of D111082. Differential Revision: https://reviews.llvm.org/D125750	2022-08-30 10:12:16 +08:00
Rong Xu	d7ef0c3970	[llvm-profdata] Improve profile supplementation The current implementation promotes a non-cold function in the SampleFDO profile into a hot function in the FDO profile. This is too aggressive. This patch promotes a hot function in the SampleFDO profile into a hot function, and a warm function in SampleFDO into a warm function in FDO. Differential Revision: https://reviews.llvm.org/D132601	2022-08-29 16:50:42 -07:00
Philip Reames	8936d86469	[LV] Add debug output for force scalar tracing [nfc] I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.	2022-08-29 15:17:51 -07:00
Valery N Dmitriev	329b972d41	[SLP] Try to match reductions before trying to vectorize a vector build sequence. This patch changes the order of searching for reductions vs. other vectorization possibilities. The idea is that failing to match a reduction does not harm further attempts to find vectorizable operations on vector build sequences, whereas in the opposite order we have a good chance of ruining the opportunity to match a reduction later. We also don't want to try vectorizing binary operations too early, as 2-way vectorization may effectively prohibit wider ones, producing less effective code. Differential Revision: https://reviews.llvm.org/D132590	2022-08-29 13:32:14 -07:00
Philip Reames	033a97a8f3	[LV] Minor code restructure of isUniformAfterVectorization [nfc] Mostly just to make a future patch easier to review.	2022-08-29 12:48:27 -07:00
Philip Reames	c37b1a5f76	[RLEV] Pick a correct insert point when incoming instruction is itself a phi node This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue. Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG. Differential Revision: https://reviews.llvm.org/D132571	2022-08-29 11:44:33 -07:00
Alexey Bataev	beacf9bd9e	[SLP]Fix PR57322: vectorize constant float stores. Stores for constant floats must be vectorized, improve analysis in SLP vectorizer for stores. Differential Revision: https://reviews.llvm.org/D132750	2022-08-29 11:02:53 -07:00
Alexey Bataev	e6345bf644	[SLP]Improve lookup of the buildvector top insertelement instruction. When estimating the cost of the in-tree vectorized scalars in buildvector sequences, we need to take into account the vectorized insertelement instruction. The top of the buildvector sequences is the topmost vectorized insertelement instruction, because it will have more than 1 use after the vectorization. For the affected test case this improves throughput from 21 to 16 (per llvm-mca). Differential Revision: https://reviews.llvm.org/D132740	2022-08-29 08:19:52 -07:00
Sanjay Patel	6c39a3aae1	[InstCombine] fold not-shift of signbit to icmp+zext https://alive2.llvm.org/ce/z/j_8Wz9 The arithmetic shift was converted to logical shift with: `246078604c` That does not seem to uncover any other missing/conflicting folds, so convert directly to signbit test + cast. We still need to fold the pattern with logical shift to test + cast. This allows reducing patterns where the output type is not the same as the input value: https://alive2.llvm.org/ce/z/nydwFV Fixes #57394	2022-08-29 10:06:31 -04:00
Sanjay Patel	246078604c	[InstCombine] fold inc-of-signbit-splat to not+lshr (iN X s>> (N - 1)) + 1 --> (~X) u>> (N - 1) https://alive2.llvm.org/ce/z/wzS474	2022-08-29 08:48:22 -04:00
Florian Hahn	c78696813f	[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC). Suggested as independent fix during the review of D132585.	2022-08-29 10:19:47 +01:00
Kazu Hirata	2ad7fd3ac7	[Instrumentation] Use std::clamp (NFC) The use of std::clamp should be safe here. MinRZ is at most 32, while kMaxRZ is 1 << 18, so we have MinRZ <= kMaxRZ, avoiding the undefined behavior of std::clamp.	2022-08-28 23:28:57 -07:00
Kazu Hirata	c63f823875	[llvm] Use range-based for loops (NFC)	2022-08-28 17:35:04 -07:00
Sanjay Patel	ab6892967c	[InstCombine] allow sext in fold of mask using signbit, part 2 https://alive2.llvm.org/ce/z/rcbZmx Sibling transform to `275aa24c0a` This pattern is seen in the examples in issue #57381.	2022-08-28 11:50:52 -04:00
zhongyunde	84d6966e4d	[InstCombine] Propagate the nuw for combine of add+mul As the commit of D132658, make the 'nuw' change separately. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132777	2022-08-28 23:01:11 +08:00
Florian Hahn	af98b875e8	[VPlan] Use range check in VPHeaderPHIRecipe::classof (NFC). This addresses a suggestion to simplify the check from D131989. This also makes it easier to ensure that VPHeaderPHIRecipe::classof checks for all header phi ids.	2022-08-28 15:54:12 +01:00
Sanjay Patel	275aa24c0a	[InstCombine] allow sext in fold of mask using signbit ~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y -- with optional sext https://alive2.llvm.org/ce/z/wFFnZT	2022-08-28 09:01:30 -04:00
Kazu Hirata	b18ff9c461	[Transform] Use range-based for loops (NFC)	2022-08-27 23:54:32 -07:00
Kazu Hirata	d0166c617d	[Utils] Remove redundant declarations (NFC) Identified with readability-redundant-declaration.	2022-08-27 23:54:31 -07:00
Kazu Hirata	d1688e9ddf	[llvm] Use std::gcd (NFC) This patch replaces calls to greatestCommonDivisor with std::gcd where both arguments are known to be of unsigned types. This means that std::common_type_t of the two argument types should just be the wider one of the two.	2022-08-27 23:54:29 -07:00
Kazu Hirata	56ea4f9bd3	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-27 21:21:02 -07:00
Kazu Hirata	7a617fdf39	Use std::gcd (NFC) This patch replaces calls to GreatestCommonDivisor64 with std::gcd where both arguments are known to be of unsigned types no larger than 64 bits in size.	2022-08-27 21:20:59 -07:00
Florian Hahn	7743badafa	[VPlan] Verify that header only contains header phi recipes. Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also adds missing checks for VPPointerInductionRecipe to VPHeaderPHIRecipe::classof. Split off from D119661. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D131989	2022-08-27 22:06:12 +01:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Kazu Hirata	a33ef8f2b7	Use llvm::all_equal (NFC)	2022-08-27 09:53:10 -07:00
Sanjay Patel	7abf233f44	[InstCombine] allow poison (undef) element in vector signbit transforms If the shift constant has undefined lanes, we can assume those are the same as the defined lanes in these transforms: https://alive2.llvm.org/ce/z/t6TTJ2 Replace undef with poison in the test while here to support the transition away from undef.	2022-08-27 11:57:05 -04:00
Sanjay Patel	c6e56024c6	[InstCombine] fold signbit splat pattern that uses negate 0 - (zext (i8 X u>> 7) to iN) --> sext (i8 X s>> 7) to iN https://alive2.llvm.org/ce/z/jzv4Ud This is part of solving issue #57381.	2022-08-27 08:04:35 -04:00
Vitaly Buka	0d59969abb	[msan] Enable msan-check-constant-shadow by default Depends on D132761. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D132765	2022-08-26 16:34:47 -07:00
Vitaly Buka	134986a720	[msan] Fix handling of constant shadow If constant shadow was enabled, we had false reports because !isZeroValue() does not guarantee that the value is actually not zero. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D132761	2022-08-26 15:51:02 -07:00
Vitaly Buka	072a2fd738	[NFC][msan] Clang-format the file	2022-08-26 15:11:12 -07:00
Eric Gullufsen	eb1e2b3997	[InstCombine] Canonicalize "and, add", "or, add", "xor, add" Canonicalize ``` ((x + C1) & C2) --> ((x & C2) + C1) ((x + C1) ^ C2) --> ((x ^ C2) + C1) ((x + C1) \| C2) --> ((x \| C2) + C1) ``` for suitable constants `C1` and `C2`. Alive2 proofs: [[ https://alive2.llvm.org/ce/z/BqMDVZ \| add, or --> or, add ]] [[ https://alive2.llvm.org/ce/z/BhAeCl \| add, xor --> xor, add ]] [[ https://alive2.llvm.org/ce/z/jYRHEt \| add, and --> and, add ]] Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131142	2022-08-26 17:23:29 -04:00
Paul Kirth	3155e3070c	[llvm][misexpect] Re-enable MisExpect for SampleProfiling MisExpect was occasionally crashing under SampleProfiling, due to a division by zero. We worked around that in D124302 by changing the assert to an early return. This patch is intended to add a test case for the crashing scenario and re-enable MisExpect for SampleProfiling. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124481	2022-08-26 20:24:10 +00:00
Philip Reames	c58791c286	Revert "[InstCombine] Canonicalize "and, add", "or, add", "xor, add"" This reverts commit `d2f110c693`. test/Transforms/InstCombine/freeze.ll fails on ninja check-llvm on x86_64.	2022-08-26 11:18:31 -07:00
Philip Reames	3dcec5e29f	[LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc] I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them. For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPlan) is explicitly handled above the uniform check. On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism. It would be correct to consider values defined outside of VPlan uniform here.	2022-08-26 11:09:17 -07:00
Eric Gullufsen	d2f110c693	[InstCombine] Canonicalize "and, add", "or, add", "xor, add" Canonicalize ``` ((x + C1) & C2) --> ((x & C2) + C1) ((x + C1) ^ C2) --> ((x ^ C2) + C1) ((x + C1) \| C2) --> ((x \| C2) + C1) ``` for suitable constants `C1` and `C2`. Alive2 proofs: [[ https://alive2.llvm.org/ce/z/BqMDVZ \| add, or --> or, add ]] [[ https://alive2.llvm.org/ce/z/BhAeCl \| add, xor --> xor, add ]] [[ https://alive2.llvm.org/ce/z/jYRHEt \| add, and --> and, add ]] Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131142	2022-08-26 14:07:43 -04:00
Sanjay Patel	4827771234	[InstCombine] fold test of equality to 0.0 with bitcast operand fcmp oeq/une (bitcast X), 0.0 --> (and X, SignMaskC) ==/!= 0 https://alive2.llvm.org/ce/z/ZKATGN	2022-08-26 13:46:11 -04:00
Florian Hahn	4e5c44964a	[VPlan] Move isUniformAfterVectorization from VPlan to vputils (NFC). This allows re-using the utility without a VPlan object. The helper also doesn't access any data from VPlan.	2022-08-26 18:26:33 +01:00
Philip Reames	2d5f025779	[LV] Extract utility for checking if VPValue is uniform [nfc]	2022-08-26 09:56:13 -07:00
Florian Hahn	ec37ecbc62	[LCSSA] Skip updating users in unreachable blocks. Don't waste time trying to update users in unreachable blocks.	2022-08-26 15:09:46 +01:00
Daniil Fukalov	9c710ebbdb	[TTI] NFC: Reduce InstructionCost::getValue() usage... in order to propagate `InstructionCost` value upper. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103406	2022-08-26 16:37:32 +03:00
Adrian Vogelsgesang	5af06ba7dc	[Coro][Debuginfo] Add debug info to `__NoopCoro_ResumeDestroy` function With this commit, we now attach a `DISubprogram` to the LLVM-generated `__NoopCoro_ResumeDestroy` function. Thereby, lldb can show a `std::coroutine_handle` to a `std::noop_coroutine` as ``` continuation = coro frame = 0x555555560d98 { resume = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy) destroy = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy) } ``` instead of ``` continuation = coro frame = 0x555555560d98 { resume = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211) destroy = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211) } ``` I renamed the function from `NoopCoro.ResumeDestroy` to `__NoopCoro_ResumeDestroy` because: * the leading `__` makes sure this is a reserved name and should not clash with any user-provided names * the `.` was replaced by a `_`, so the name is now a valid identifier in C, which allows me to type its name in the debugger Differential Revision: https://reviews.llvm.org/D132580	2022-08-26 05:49:52 -07:00
Matthias Gehre	3e39b27101	[llvm/CodeGen] Add ExpandLargeDivRem pass Adds a pass ExpandLargeDivRem to expand div/rem instructions with more than 128 bits into a loop computing that value. As discussed on https://reviews.llvm.org/D120327, this approach has the advantage that it is independent of the runtime library. This also helps the clang driver, which otherwise would need to understand enough about the runtime library to know whether to allow _BitInts with more than 128 bits. Targets are still free to disable this pass and instead provide a faster implementation in a runtime library. Fixes https://github.com/llvm/llvm-project/issues/44994 Differential Revision: https://reviews.llvm.org/D126644	2022-08-26 11:55:15 +01:00
Dmitry Makogon	9142f67ef2	[SimplifyCFG] Don't widen cond br if false branch has successors Fixes https://github.com/llvm/llvm-project/issues/57221. This limits the tryWidenCondBranchToCondBranch transform making it work only if the false block of widenable condition branch has no successors. If that block has successors, then SimplifyCondBranchToCondBranch may undo the transform done by tryWidenCondBranchToCondBranch, which would lead to infinite cycle of transformation and eventually an assert failing. Differential Revision: https://reviews.llvm.org/D132356	2022-08-26 15:23:37 +07:00
Chuanqi Xu	17631ac676	[Coroutines] Store the index for final suspend point if there is unwind coro end Closing https://github.com/llvm/llvm-project/issues/57339 The root cause for this issue is a premature optimization to eliminate the index for the final suspend point, since we assumed we could judge whether a coroutine is suspended at the final suspend point by checking if resume_fn_addr is null. However this is not true if the coroutine exits via an exception in promise.unhandled_exception(). According to [dcl.fct.def.coroutine]p14: > If the evaluation of the expression promise.unhandled_exception() > exits via an exception, the coroutine is considered suspended at the > final suspend point. But from the perspective of the implementation, we can't set the coro index to the final suspend point directly since it breaks the states. To fix the issue, we block the optimization if we find there is any unwind coro end, which indicates that it is possible that the coroutine exits via an exception from promise.unhandled_exception(). Test Plan: folly	2022-08-26 14:05:46 +08:00
Max Kazantsev	ccf788a565	[IRCE] Drop SCEV of a Phi after adding a new input. PR57335 Since SCEV learned to look through single value phis with `20d798bd47`, whenever we add a new input to a Phi, we should make sure that the old cached value is dropped. Otherwise, it may lead to various miscompiles, such as breach of dominance as shown in the bug https://github.com/llvm/llvm-project/issues/57335	2022-08-25 18:14:29 +07:00
Chenbing Zheng	adf4519c0e	[InstCombine] recognize bitreverse disguised as shufflevector This patch completes the TODO left in D66965 and handles the related pattern for bitreverse. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132431	2022-08-25 10:41:47 +08:00
Chenbing Zheng	14fae4d136	[InstCombine] Add undef elements support for shrinkFPConstantVector Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D132343	2022-08-25 10:38:48 +08:00
Valery N Dmitriev	a4c8fb9d1f	[SLP][NFC] Refactor SLPVectorizerPass::vectorizeRootInstruction method. The goal is to separate collecting items for post-processing from processing them. Post-processing is also outlined as a dedicated method. Differential Revision: https://reviews.llvm.org/D132603	2022-08-24 17:07:53 -07:00
Sami Tolvanen	cff5bef948	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. 
See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Relands `67504c9549` with a fix for 32-bit builds. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 22:41:38 +00:00
Sanjay Patel	7c2f93c04a	[InstCombine] use isa instead of dyn_cast for unused value; NFC	2022-08-24 17:58:20 -04:00
Cameron McInally	38d58c1b37	[GlobalOpt] Bail out of GlobalOpt SROA if a Scalable Vector is seen The SROA algorithm won't work for Scalable Vectors, since we don't know how many bytes are loaded/stored. Bail out if a Scalable Vector is seen. Differential Revision: https://reviews.llvm.org/D132417	2022-08-24 13:17:59 -07:00
Sanjay Patel	f7ab70cf8d	[InstCombine] reduce disguised mul+add factorization ~(A * C1) + A --> (A * (1 - C1)) - 1 This is a non-obvious mix of bitwise logic and math: https://alive2.llvm.org/ce/z/U7ACVT The pattern may be produced by Negator from the more typical code seen in issue #57255.	2022-08-24 16:02:12 -04:00
Sami Tolvanen	a79060e275	Revert "KCFI sanitizer" This reverts commit `67504c9549` as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.	2022-08-24 19:30:13 +00:00
Sami Tolvanen	67504c9549	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 18:52:42 +00:00
Philip Reames	23245a914b	[LV] Simplify code given isPredicatedInst doesn't dependent on VF any more [nfc]	2022-08-24 11:42:10 -07:00
Philip Reames	3ab00cfca9	[LV] Adjust code added in `f79214d1` for `531dd3634` [nfc] When rebasing the review which became `f79214d1`, I forgot to adjust for the changed semantics introduced by `531dd3634`. Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst. I noticed this while doing a follow up change.	2022-08-24 10:38:17 -07:00
Philip Reames	f79214d1e1	[LV] Support predicated div/rem operations via safe-divisor select idiom This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors. The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditionally select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straightforward, and none of the others seemed obviously better. I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches. Differential Revision: https://reviews.llvm.org/D130164	2022-08-24 10:07:59 -07:00
Florian Hahn	689895f432	[VPlan] Remove unneeded `struct` prefix for VPTransformState args (NFC).	2022-08-24 17:58:08 +01:00
spupyrev	8d5b694da1	extending code layout alg The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895. The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D129893	2022-08-24 09:40:25 -07:00
Sanjay Patel	0cfc651032	[InstCombine] ease use constraint in tryFactorization() The stronger one-use checks prevented transforms like this: (x * y) + x --> x * (y + 1) (x * y) - x --> x * (y - 1) https://alive2.llvm.org/ce/z/eMhvQa This is one of the IR transforms suggested in issue #57255. This should be better in IR because it removes a use of a variable operand (we already fold the case with a constant multiply operand). The backend should be able to re-distribute the multiply if that's better for the target. Differential Revision: https://reviews.llvm.org/D132412	2022-08-24 12:10:54 -04:00
Simon Pilgrim	2f217c1214	[InstCombine] Canonicalize ((X & -X) - 1) --> ((X - 1) & ~X) (PR51784) Enables the ctpop((x & -x ) - 1) -> cttz(x, false) fold Alive2: https://alive2.llvm.org/ce/z/EDk4h7 (((X & -X) - 1) --> (~X & (X - 1)) ) Alive2: https://alive2.llvm.org/ce/z/8Yr3XG (CTPOP -> CTTZ) Fixes #51126 Differential Revision: https://reviews.llvm.org/D110488	2022-08-24 16:50:43 +01:00
Sanjay Patel	4391351463	[InstCombine] improve readability in tryFactorization(); NFC Added/removed braces, reduced indents, and renamed a variable.	2022-08-24 11:31:18 -04:00
Simon Pilgrim	80cc8f0f62	Revert rGc360955c4804e9b25017372cb4c6be7adcb216ce "[InstCombine] Canonicalize ((X & -X) - 1) --> (~X & (X - 1)) (PR51784)" The test changes are failing on some buildbots (but not others.....).	2022-08-24 16:26:28 +01:00
Simon Pilgrim	c360955c48	[InstCombine] Canonicalize ((X & -X) - 1) --> (~X & (X - 1)) (PR51784) Enables the ctpop((x & -x ) - 1) -> cttz(x, false) fold Alive2: https://alive2.llvm.org/ce/z/EDk4h7 (((X & -X) - 1) --> (~X & (X - 1)) ) Alive2: https://alive2.llvm.org/ce/z/8Yr3XG (CTPOP -> CTTZ) Fixes #51126 Differential Revision: https://reviews.llvm.org/D110488	2022-08-24 15:31:15 +01:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as it takes two vector inputs and extracts from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In-tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
Keno Fischer	30d7d74d5c	[MSAN] Handle array alloca with non-i64 size specification The array size specification of an alloca can be any integer, so zext or trunc it to intptr before attempting to multiply it with an intptr constant. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D131846	2022-08-24 03:24:21 +00:00
Keno Fischer	5739d29cde	[MSAN] Correct shadow type for atomicrmw instrumentation We were passing the type of `Val` to `getShadowOriginPtr`, rather than the type of `Val`'s shadow resulting in broken IR. The fix is simple. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D131845	2022-08-24 03:24:19 +00:00
Chris Bieneman	9616905744	[NFC] Fix warning This change came in a few hours ago and introduced a warning. The fix is trivial, so I'm providing it. The original change was reviewed here: https://reviews.llvm.org/D132331	2022-08-23 20:50:37 -05:00
Philip Reames	49547b2241	[slp] Pull out a getOperandInfo variant helper [nfc]	2022-08-23 13:46:05 -07:00
Alvin Wong	c0214db51a	[llvm] Mark CFGuard fn ptr symbol as DSO local and add tests for mingw For mingw target, if a symbol is not marked DSO local, a `.refptr` is generated for it. This makes CFG check calls use an extra pointer dereference, which adds extra overhead compared to the MSVC version, so mark the CFG guard check function pointer DSO local to stop it. This should have no effect on MSVC target. Also adapt the existing cfguard tests to run for mingw targets, so that this change is checked. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D132331	2022-08-23 23:39:39 +03:00
Jakub Kuderski	6fa87ec10f	[ADT] Deprecate is_splat and replace all uses with all_equal See the discussion thread for more details: https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D132335	2022-08-23 11:36:27 -04:00
Florian Hahn	ff34432649	[LoopUtils] Remove unused Loop arg from addDiffRuntimeChecks (NFC). The argument is no longer used, remove it.	2022-08-23 10:15:28 +01:00
Andrew Browne	065d2e1d8b	[DFSan] Fix handling of libAtomic external functions. Implementation based on MSan. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D132070	2022-08-22 16:04:29 -07:00
Jay Foad	f82c55fa08	[InstCombine] Change order of canonicalization of ADD and AND Canonicalize ((x + C1) & C2) --> ((x & C2) + C1) for suitable constants C1 and C2, instead of the other way round. This should allow more constant ADDs to be matched as part of addressing modes for loads and stores. Differential Revision: https://reviews.llvm.org/D130080	2022-08-22 20:03:53 +01:00
Jay Foad	2754ff883d	[InstCombine] Try not to demand low order bits for Add Don't demand low order bits from the LHS of an Add if: - they are not demanded in the result, and - they are known to be zero in the RHS, so they can't possibly overflow and affect higher bit positions This is intended to avoid a regression from a future patch to change the order of canonicalization of ADD and AND. Differential Revision: https://reviews.llvm.org/D130075	2022-08-22 20:03:53 +01:00
Philip Reames	27d3321c4f	[TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc] This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allow use of the same properties in store costing as arithmetic costing.	2022-08-22 11:26:31 -07:00
Philip Reames	274f86e7a6	[TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc] This completes the client side transition to the OperandValueInfo version of this routine. Backend TTI implementations still use the prior versions for now.	2022-08-22 11:06:32 -07:00
Philip Reames	c42a5f1cc2	[TTI] Migrate getOperandInfo to OperandVaueInfo [nfc] This is part of merging OperandValueKind and OperandValueProperties.	2022-08-22 10:19:02 -07:00
Philip Reames	5cd427106d	[TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc] OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling. We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so. This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works. Target TTI implementations still use the split flags. I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.	2022-08-22 09:48:15 -07:00
Max Kazantsev	e587199a50	[SCEV] Prove condition invariance via context, try 2 Initial implementation had too weak requirements to positive/negative range crossings. Not crossing zero with nuw is not enough for two reasons: - If ArLHS has negative step, it may turn from positive to negative without crossing 0 boundary from left to right (and crossing right to left doesn't count for unsigned); - If ArLHS crosses SINT_MAX boundary, it still turns from positive to negative; In fact we require that ArLHS always stays non-negative or negative, which can be enforced by the following set of preconditions: - both nuw and nsw; - positive step (looks liftable); Because of positive step, boundary crossing is only possible from left part to the right part. And because of no-wrap flags, it is guaranteed to never happen.	2022-08-22 14:31:19 +07:00
Ting Wang	d2d77e050b	[PowerPC][Coroutines] Add tail-call check with call information for coroutines Fixes #56679. Reviewed By: ChuanqiXu, shchenz Differential Revision: https://reviews.llvm.org/D131953	2022-08-21 22:20:40 -04:00
Sanjay Patel	15e3d86911	[InstCombine] reassociate bitwise logic chains based on uses (X op Y) op Z --> (Y op Z) op X This isn't a complete solution (see TODO tests for possible refinements), but it shows some nice wins and doesn't seem to cause any harm. I think the most potential danger is from conflicting with other folds and causing an infinite loop - that's the reason for avoiding patterns with constant operands. Alternatively, we could try this in the reassociate pass, but we would not immediately see all of the logic folds that instcombine provides. I also looked at improving ValueTracking's isImpliedCondition() (and we should still add some enhancements there), but that would not work in general for bitwise logic reduction. The tests that reduce completely to 0/-1 are motivated by issue #56653. Differential Revision: https://reviews.llvm.org/D131356	2022-08-21 09:42:14 -04:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Kazu Hirata	8b1b0d1d81	Revert "Use std::is_same_v instead of std::is_same (NFC)" This reverts commit `c5da37e42d`. This patch seems to break builds with some versions of MSVC.	2022-08-20 23:00:39 -07:00
Kazu Hirata	c5da37e42d	Use std::is_same_v instead of std::is_same (NFC)	2022-08-20 22:36:26 -07:00
Kazu Hirata	ec5eab7e87	Use range-based for loops (NFC)	2022-08-20 21:18:32 -07:00
Kazu Hirata	258531b7ac	Remove redundant initialization of Optional (NFC)	2022-08-20 21:18:28 -07:00
Kazu Hirata	6b1bc80188	[Scalar] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-20 21:18:25 -07:00
Philip Reames	b0a2c48e9f	[tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc]	2022-08-19 16:22:22 -07:00
Alexey Bataev	c167028684	[SLP]Delay vectorization of postponable values for instructions with no users. SLP vectorizer tries to find the reductions starting the operands of the instructions with no-users/void returns/etc. But such operands can be postponable instructions, like Cmp, InsertElement or InsertValue. Such operands still must be postponed, vectorizer should not try to vectorize them immediately. Differential Revision: https://reviews.llvm.org/D131965	2022-08-19 08:39:16 -07:00
Alexey Bataev	0e7ed32c71	[SLP]Cost for a constant buildvector. In many cases constant buildvector results in a vector load from a constant/data pool. Need to consider this cost too. Differential Revision: https://reviews.llvm.org/D126885	2022-08-19 08:02:42 -07:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Max Kazantsev	f798c042f4	Revert "[SCEV] Prove condition invariance via context" This reverts commit `a3d1fb3b59`. Reverting until investigation of https://github.com/llvm/llvm-project/issues/57247 has concluded.	2022-08-19 21:02:06 +07:00
Caroline Concatto	09afe4155b	[InstCombine] For vector extract when extract vector and insert value type is the same This patch implements these optimizations: extract.vector(insert.vector(Vector, Value, Idx), Idx) --> Value extract.vector(insert.vector(Vector, Value, InsertIndex), ExtractIndex) --> extract.vector(Vector, ExtractIndex) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D132137	2022-08-19 12:13:03 +01:00
Sanjay Patel	b066195b3f	[InstCombine] fold bitwise logic or+or+xor+not (~A \| C) \| (A ^ B) --> ~(A & B) \| C https://alive2.llvm.org/ce/z/Qw3aiJ This extends the existing fold (just above the new match) to peek through another 'or' instruction. This should let the motivating case from issue #57174 simplify completely.	2022-08-18 17:14:41 -04:00
Joe Loser	f3a55a1ddf	[llvm] Remove std::clamp equivalent in `Transforms/Utils/MisExpect.cpp` Use `std::clamp` directly from the standard library now that LLVM is built with C++17 standards mode. Differential Revision: https://reviews.llvm.org/D131869	2022-08-18 15:11:25 -06:00
Florian Hahn	b8709a9d03	[LV] Support fixed order recurrences. If the incoming previous value of a fixed-order recurrence is a phi in the header, go through incoming values from the latch until we find a non-phi value. Use this as the new Previous, all uses in the header will be dominated by the original phi, but need to be moved after the non-phi previous value. At the moment, fixed-order recurrences are modeled as a chain of first-order recurrences. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D119661	2022-08-18 19:15:52 +01:00
Philip Reames	1436adae2c	[LV-L] Add const and move method body out of line [nfc]	2022-08-18 11:10:19 -07:00
Philip Reames	c064d3f139	[LV] Use early continue to simplify code [nfc]	2022-08-18 10:31:55 -07:00
Danila Malyutin	4a9ff289fb	[InstCombine] Fix freeze instruction getting inserted before landingpad The code would use first non-phi instruction as an insertion point, however this could lead to freeze getting inserted between phi and landingpad causing a verifier assert. Differential Revision: https://reviews.llvm.org/D132105	2022-08-18 17:43:42 +03:00
Philip Reames	531dd3634d	[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops) This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is not an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predication. Specifically, for the case of a uniform memory operation we were previously considering it not to be a predicated instruction, but were considering it to be scalar with predication. As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowered via naive scalarization or, if scalarization is infeasible (i.e. scalable vectors), aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalarizable. I haven't found a concrete example here, but I am suspicious. Differential Revision: https://reviews.llvm.org/D131093	2022-08-18 07:14:04 -07:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
Simon Pilgrim	e48892ee42	[Transforms] LICM.cpp - pull out repeated getUserCost call Pulled out of D79483	2022-08-18 10:43:29 +01:00
Konstantina	5bc8791187	[NewGVN][PHIOFOPS] Bail out if an operand is in OpSafeForPHIOfOps but it is not safe for the current basic block. NewGVN tables are not cleared out between the initial run of NewGVN and the verification. In case of phi-of-ops optimization, OpSafeForPHIOfOps goes out of sync between the two runs. One operand might not be safe for one basic block, but it might be safe for one of its successors. In this case, the operand will be added in OpSafeForPHIOfOps map. In verification phase, we reuse OpSafeForPHIOfOps without updating it again. As a result, the operand will be considered safe for phi-of-ops optimization even for the case that it is not. This patch fixes this problem. Fix for 53807. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D130910	2022-08-17 18:57:46 -07:00
Paul Kirth	656c5d652c	[clang][llvm][NFC] Change misexpect's tolerance option to be 32-bit In D131869 we noticed that we jump through some hoops because we parse the tolerance option used in MisExpect.cpp into a 64-bit integer. This is unnecessary, since the value can only be in the range [0, 100). This patch changes the underlying type to be 32-bit from where it is parsed in Clang through to its use in LLVM. Reviewed By: jloser Differential Revision: https://reviews.llvm.org/D131935	2022-08-17 14:38:53 +00:00
Ellis Hoag	6f61594d8c	[InstrProf] Add option to avoid instrumenting small functions If a function only has a few instructions, instrumentation can significantly increase the size and performance overhead of that function. Add the `-pgo-function-size-threshold` option to select a size threshold so these small functions are not instrumented. A similar option `-fxray-instruction-threshold=<N>` is used for XRay to reduce binary size overhead [1]. [1] https://www.llvm.org/docs/XRay.html Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131816	2022-08-17 06:47:15 -07:00
Simon Pilgrim	594c5b1a42	[SLP] Update TODO comment about shuffle mask decoding This is handled in ShuffleVectorInst/getShuffleCost - getInstructionThroughput is (slowly) being removed.	2022-08-17 11:41:46 +01:00
Zain Jaffal	f61f99a105	[instcombine] Optimise for zero initialisation of product given fast flags are enabled Currently, clang ignores the 0 initialisation in finite math For example: ``` double f_prod = 0; double arr[1000]; for (size_t i = 0; i < 1000; i++) { f_prod *= arr[i]; } ``` Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop. Reviewed By: fhahn, spatel Differential Revision: https://reviews.llvm.org/D131672	2022-08-17 11:12:15 +01:00
Martin Sebor	a7a1be11e6	[InstCombine] convert second std::min argument to same type as first Ensure both arguments to std::min have the same type in all data models.	2022-08-16 17:34:33 -06:00
Martin Sebor	345514e991	[InstCombine] Add support for strlcpy folding Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130666	2022-08-16 16:43:40 -06:00
Martin Sebor	e858f5120d	[InstCombine] Remove assumptions about int having 32 bits Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D131731	2022-08-16 15:35:08 -06:00
Sanjay Patel	ce081776b2	[FlattenCFG] avoid crash on malformed code We don't have a dominator tree in this pass, so we can't bail out sooner by checking for unreachable code, but this is a minimal fix for the example in issue #56875.	2022-08-16 15:11:00 -04:00
Danila Malyutin	451497a030	[RS4GC] Handle vectors of pointers in non-live clobbering Fix crash when trying to unconditionally cast alloca type to PointerType Differential Revision: https://reviews.llvm.org/D131146	2022-08-16 17:47:30 +03:00
Alexey Bataev	65c7cecb13	[SLP]Fix PR51320: Try to vectorize single store operands. Currently, we try to vectorize values, feeding into stores, only if slp-vectorize-hor-store option is provided. We can safely enable vectorization of the value operand of a single store in the basic block, if the operand value is used only in store. It should enable extra vectorization and should not increase compile time significantly. Fixes https://github.com/llvm/llvm-project/issues/51320 Differential Revision: https://reviews.llvm.org/D131894	2022-08-16 07:25:21 -07:00
Kevin P. Neal	7f768371a1	Fix build error: [FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known. This should fix these build bot errors: Step 6 (build-check-mlir-build-only) failure: build (failure) C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): error C2220: the following warning is treated as an error C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead. C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(129): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead. C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1386): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead. C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1388): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.	2022-08-16 08:47:36 -04:00
Kevin P. Neal	05ac82de40	[FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known. Previously we would only CSE constrained FP intrinsics in the default floating point environment. Exception behavior of "strict" is still not allowed since we are not allowed to remove any traps in that case. There are no restrictions on CSE across function calls inside a function. Differential Revision: https://reviews.llvm.org/D112256	2022-08-16 08:31:42 -04:00
Martin Sebor	65967708d2	[InstCombine] Adjust snprintf folding of constant strings (PR #56598 ) Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130494	2022-08-15 15:59:21 -06:00
Arthur Eubanks	633f5663c3	[LegacyPM] Remove ThinLTO bitcode writer legacy pass Using the legacy PM for the optimization pipeline is deprecated and in the process of being removed. This is a small step in that direction. For an example of migrating to the new PM: `853b57fe80`	2022-08-15 14:21:16 -07:00
Philip Reames	e792a353b5	[slp] adjust debug output to include final computed cost	2022-08-15 13:51:39 -07:00
Jameson Nash	3a8d7fe201	[SimplifyCFG] teach simplifycfg not to introduce ptrtoint for NI pointers SimplifyCFG expects to be able to cast both sides to an int, if either side can be cast to an int, but this is not desirable or legal, in general, per D104547. Spotted in https://github.com/JuliaLang/julia/issues/45702 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128670	2022-08-15 15:11:48 -04:00
Alexey Bataev	2819126d0c	[SLP][NFC]Replace multiple isa calls with single one where possible, NFC.	2022-08-15 11:56:58 -07:00
Sanjay Patel	e5748c6e73	[InstCombine] reduce sub-with-overflow ==/!= 0 The basic patterns look like this: https://alive2.llvm.org/ce/z/MDj9EC The tests have a use of the overflow value too. Otherwise, existing folds should reduce already. This was noted as a missing IR fold in: `926e7312b2` Hopefully, this makes it easier to implement a backend fix because we should get the same IR regardless of whether the source used builtins or inline code.	2022-08-15 13:03:51 -04:00
Nuno Lopes	0299ebc1bd	InstCombine: use poison instead of undef as placeholder in insertvalue [NFC] These vectors are fully initialized so the placeholder value is irrelevant	2022-08-14 21:37:23 +01:00
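
Several of the InstCombine commits listed above (`2f217c1214`, `0cfc651032`, `b066195b3f`) state their folds as bit-level identities and link to Alive2 proofs. As an informal cross-check only, the same identities can be brute-forced over small bit widths. The sketch below is not LLVM code: the helper names (`cttz`, `ctpop`, `check_identities`) and the choice of 8-bit and 4-bit widths are assumptions made for illustration, and `cttz` here returns the bit width for a zero input, which is how the `cttz(x, false)` form in the commit message is read.

```python
from itertools import product

BITS = 8                  # illustrative width, small enough to enumerate
MASK = (1 << BITS) - 1


def cttz(x):
    """Count trailing zeros; a zero input yields BITS (zero is not poison)."""
    if x == 0:
        return BITS
    return (x & -x).bit_length() - 1


def ctpop(x):
    """Population count of a non-negative integer."""
    return bin(x).count("1")


def check_identities():
    """Exhaustively check the folds quoted in the commit messages."""
    for x in range(1 << BITS):
        lhs = ((x & -x) - 1) & MASK
        # 2f217c1214: ((X & -X) - 1) --> ((X - 1) & ~X)
        assert lhs == ((x - 1) & ~x) & MASK
        # 2f217c1214: ctpop((x & -x) - 1) --> cttz(x, false)
        assert ctpop(lhs) == cttz(x)

    for x, y in product(range(1 << BITS), repeat=2):
        # 0cfc651032: (x * y) + x --> x * (y + 1), and the sub variant
        assert (x * y + x) & MASK == (x * (y + 1)) & MASK
        assert (x * y - x) & MASK == (x * (y - 1)) & MASK

    small = (1 << 4) - 1  # 4-bit operands keep the triple loop quick
    for a, b, c in product(range(small + 1), repeat=3):
        # b066195b3f: (~A | C) | (A ^ B) --> ~(A & B) | C
        assert ((~a | c) | (a ^ b)) & small == (~(a & b) | c) & small

    return True


if __name__ == "__main__":
    print("all identities hold:", check_identities())
```

Exhaustive enumeration at a small width is a sanity check, not a proof for all widths or for poison and undef semantics; the Alive2 links in the commit messages remain the authoritative verification.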

31465 Commits
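
The safe-divisor select idiom described in commit `f79214d1e1` can be illustrated outside of LLVM with plain lists standing in for vector lanes. This is a minimal sketch under stated assumptions: the function name `predicated_udiv` and the list-based lane model are hypothetical, and the real transform operates on LLVM IR vectors with a select instruction rather than on Python lists.

```python
def predicated_udiv(dividend, divisor, mask):
    """Divide lane-wise without ever dividing by zero on inactive lanes.

    Inactive lanes (mask False) get a divisor of 1, mirroring the
    "select(mask, divisor, splat(1))" step from the commit message.
    Results on inactive lanes are don't-care values.
    """
    # Per lane: keep the real divisor where active, substitute 1 elsewhere.
    safe_divisor = [d if m else 1 for d, m in zip(divisor, mask)]
    # Unconditional wide division: safe because no substituted divisor is zero.
    return [n // d for n, d in zip(dividend, safe_divisor)]


if __name__ == "__main__":
    # Lanes 1 and 3 are inactive and carry a zero divisor.
    print(predicated_udiv([10, 7, 9, 4], [2, 0, 3, 0],
                          [True, False, True, False]))
```

The sketch only protects inactive lanes; an active lane with a zero divisor would still fault, just as the original scalar code would.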