llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	c911befaec	[InstCombine] Treat passing undef to noundef params as UB Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D133036	2022-09-01 15:16:45 -07:00
Rong Xu	0caa4a9559	[PGO] Support PGO annotation of CallBrInst We currently instrument CallBrInst but do not annotate it with the branch weight. This patch enables PGO annotation of CallBrInst. Differential Revision: https://reviews.llvm.org/D133040	2022-09-01 14:13:50 -07:00
Vitaly Buka	ef0f866718	[msan] Combine shadow check of the same instruction Reduces .text size by 1% on our large binary. On CTMark (-O2 -fsanitize=memory -fsanitize-memory-use-after-dtor -fsanitize-memory-param-retval) Size -0.4% Time -0.8% Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133071	2022-09-01 13:55:59 -07:00
Vitaly Buka	9110673062	[nfc][msan] Group checks per instruction It's a preparation of to combine shadow checks of the same instruction Reviewed By: kda, kstoimenov Differential Revision: https://reviews.llvm.org/D133065	2022-09-01 13:10:16 -07:00
Jordan Rupprecht	3031a250de	[MSan] Fix determinism issue when using msan-track-origins. When instrumenting `alloca`s, we use a `SmallSet` (i.e. `SmallPtrSet`). When there are fewer elements than the `SmallSet` size, it behaves like a vector, offering stable iteration order. Once we have too many `alloca`s to instrument, the iteration order becomes unstable. This manifests as non-deterministic builds because of the global constant we create while instrumenting the alloca. The test added is a simple IR file, but was discovered while building `libcxx/src/filesystem/operations.cpp` from libc++. A reduced C++ example from that: ``` // clang++ -fsanitize=memory -fsanitize-memory-track-origins \ // -fno-discard-value-names -S -emit-llvm \ // -c op.cpp -o op.ll struct Foo { ~Foo(); }; bool func1(Foo); void func2(Foo); void func3(int) { int f_st, t_st; Foo f, t; func1(f) \|\| func1(f) \|\| func1(t) \|\| func1(f) && func1(t); func2(f); } ``` Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133034	2022-09-01 09:15:57 -07:00
Nuno Lopes	858fe8664e	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 17:04:26 +01:00
Nikita Popov	f5c178b6a4	[LICM] Remove unnecessary condition (NFC)	2022-09-01 15:42:35 +02:00
Nikita Popov	315aef667e	[LICM] Fix thread safety checks for promotion of byval args This code was relying on a very subtle contract: The expectation was that for non-allocas, the unwind safety check would already perform a capture check, so we don't need to perform it later. This held true when this unwind safety was only handled for allocas and noalias calls, but became incorrect when byval support was added. To avoid this kind of issue, just remove the dependency between the unwind and thread-safety checks entirely. At worst, this means we perform a redundant capture check. If this should turn out to be problematic for compile-time, we can cache that query in a more explicit way.	2022-09-01 15:33:46 +02:00
Sanjay Patel	c3d1504d63	[InstCombine] fix crash on type mismatch with fcmp fold The existing predicate doesn't work for a single-element vector, so make sure we are not crossing scalar/vector types. Test (was crashing) based on the post-commit example for: `4827771234`	2022-09-01 08:57:55 -04:00
Sanjay Patel	addbdac5d5	[InstCombine] fold power-of-2 ctlz/cttz with inverted result When X is a power-of-two or zero and zero input is poison: ctlz(i32 X) ^ 31 --> cttz(X) cttz(i32 X) ^ 31 --> ctlz(X) https://alive2.llvm.org/ce/z/Cs7sFE	2022-09-01 08:57:55 -04:00
Nikita Popov	3f8b1d0f15	[LICM] Add some debug output to scalar promotion (NFC)	2022-09-01 14:46:30 +02:00
Alexey Bataev	982d9ef1c1	[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison. Need either follow the original order of the operands for bool logical ops, or emit freeze instruction to avoid poison propagation. Differential Revision: https://reviews.llvm.org/D126877	2022-09-01 05:34:45 -07:00
Yuanbo Li	ebd0249fcf	[DebugInfo] Missing debug location after replacement in processSRem function This patch fixes an issue in which CorrelatedValuePropagation::processSRem would create new instructions to represent the SRem instruction, but would not correctly copy any existing debug location metadata to the new instruction. Differential Revision: https://reviews.llvm.org/D132218	2022-09-01 13:18:17 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Nuno Lopes	fa154a9170	Revert "Expand Div/Rem: consider the case where the dividend is zero" This reverts commit `4aed09868b`.	2022-09-01 12:11:22 +01:00
Nuno Lopes	4aed09868b	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 12:00:03 +01:00
Pavel Samolysov	527b9a9d90	[DeadArgElim] Use structure bindings in foreach loops. NFC Differential Revision: https://reviews.llvm.org/D133026	2022-09-01 13:48:46 +03:00
Nikita Popov	43e7d9af1d	[InstCombine] Fold extractvalue of phi Just as we do for most other operations, we should push extractvalue instructions through phis, if this does not increase unfolded instruction count.	2022-09-01 10:51:54 +02:00
Arthur Eubanks	04f3c20989	[NFC][LICM] Stop passing around unused BFI Uses of this were removed in `1a25d0bfbb`.	2022-08-31 19:15:34 -07:00
Vitaly Buka	53d1ae88f8	[nfc][msan] Prepare the code for check sorting	2022-08-31 15:36:49 -07:00
Nikita Popov	ab6876a40d	reland: [Local] Allow creating callbr with duplicate successors Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs. This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in `8201e3ef5c`. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D129997 Originally landed as commit `08860f525a` ("[Local] Allow creating callbr with duplicate successors") Reverted in commit `1cf6b93df1` ("Revert "[Local] Allow creating callbr with duplicate successors"")	2022-08-31 13:23:00 -07:00
Alexey Bataev	588115c117	[SLP][NFC]Add a check for SelectInst to match description, NFC.	2022-08-31 13:04:21 -07:00
Alexey Bataev	d8d9ee10bb	[SLP][NFC]Fix comment and make function following naming standard, NFC.	2022-08-31 12:37:55 -07:00
Philip Reames	8524622bdc	[SLP] Simplify getOperandInfo implementation and be consistent This is NOT nfc. Specifically, the following behavior changes: * Pointers are now allowed. Both uniform, and constants. * FP uniform non-constants can now be recognized. * FP undefs are no longer considered constant. This matches int behavior which we had tests for. FP behavior was untested. Its not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.	2022-08-31 12:24:05 -07:00
Nikita Popov	ad66bc42b0	[InstCombine] Use getInsertionPointAfterDef() in freeze fold This simplifies the code and fixes handling of catchswitch, in which case we have no insertion point for the freeze. Originally part of D129660.	2022-08-31 11:32:57 +02:00
Nikita Popov	8f3fd26b74	[Reassociate] Use getInsertionPointerAfterDef() This simplifies the code and fixes handling for the callbr case, where the instruction needs to be inserted in the normal destination, rather than after the terminator. Originally part of D129660.	2022-08-31 11:10:24 +02:00
Nikita Popov	972840aa3b	[IR] Add Instruction::getInsertionPointAfterDef() Transforms occasionally want to insert an instruction directly after the definition point of a value. This involves quite a few different edge cases, e.g. for phi nodes the next insertion point is not the next instruction, and for invokes and callbrs its not even in the same block. Additionally, the insertion point may not exist at all if catchswitch is involved. This adds a general Instruction::getInsertionPointAfterDef() API to implement the necessary logic. For now it is used in two places where this should be mostly NFC. I will follow up with additional uses where this fixes specific bugs in the existing implementations. Differential Revision: https://reviews.llvm.org/D129660	2022-08-31 10:50:10 +02:00
Fangrui Song	13f0795425	[SLPVectorizer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off build	2022-08-30 23:01:22 -07:00
Chenbing Zheng	35a3048c25	[InstCombine] add support for multi-use Y of (X op Y) op Z --> (Y op Z) op X For (X op Y) op Z --> (Y op Z) op X we can still do transform when Y is multi-use. In D131356 limit it to one-use, this patch remove this limit. This is still not a complete solution, I add a todo test to show it. In this case, X and Y are both multi use, we can't differentiate how to convert based on this. But at least we don't make the code worse，and it can solve half the scenarios.	2022-08-31 10:55:05 +08:00
Alexey Bataev	ec06df9459	[SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed. The pointer operands for the ScatterVectorize node may contain non-instruction values and they are not checked for "already being vectorized". Need to check that such pointers are already vectorized and gather them instead of trying to build vectorize node to avoid compiler crash. Differential Revision: https://reviews.llvm.org/D132949	2022-08-30 12:30:14 -07:00
Sanjay Patel	8a19842c0e	[InstCombine] delete redundant folds; NFC InstSimplify does this via isKnownNonEqual(), so it's already using knownbits on these patterns and trying other folds.	2022-08-30 14:21:29 -04:00
Alexey Bataev	afbf5466ba	[SLP]Improve operands kind analaysis for constants. Removed EnableFP parameter in getOperandInfo function since it is not needed, the operands kinds also controlled by the operation code, which allows to remove extra check for the type of the operands. Also, added analysis for uniform constant float values. This change currently does not trigger any changes in the code since TTI does not do analysis for constant floats, so it can be considered NFC. Tested with llvm-test-suite + SPEC2017, no changes. Differential Revision: https://reviews.llvm.org/D132886	2022-08-30 06:35:39 -07:00
zhongyunde	23a5de4294	[InstCombine] Distributive or+mul with const operand We aleady support the transform: `(X+C1)CI -> XCI+C1CI` Here the case is a little special as the form of `(X+C1)CI` is transformed into `(X\|C1)CI`, so we should also support the transform: `(X\|C1)CI -> XCI+C1CI` Fixes https://github.com/llvm/llvm-project/issues/57278 Reviewed By: bcl5980, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D132658	2022-08-30 20:36:52 +08:00
Florian Hahn	b5e208fcba	[DSE] Support looking through memory phis at end of function. Update isWriteAtEndOfFunction to look through MemoryPhis. The reason MemoryPhis were skipped so far was the known AliasAnalysis issue with it missing loop-carried dependences. This problem is already addressed in other parts of the code by skipping MemoryDefs that may be in difference loops. I think the same logic can be applied here. This can have a substantial impact on the number of stores removed in some cases. For MultiSource/SPEC2006/SPEC2017 with -O3: ``` Metric: dse.NumFastStores Program dse.NumFastStores base patch diff External/S...CINT2017rate/557.xz_r/557.xz_r 14.00 45.00 221.4% External/S...te/538.imagick_r/538.imagick_r 439.00 1267.00 188.6% MultiSourc...e/Applications/SIBsim4/SIBsim4 6.00 15.00 150.0% MultiSourc...Prolangs-C/simulator/simulator 3.00 7.00 133.3% MultiSource/Applications/siod/siod 3.00 7.00 133.3% MultiSourc...arks/FreeBench/distray/distray 6.00 9.00 50.0% MultiSourc...e/Applications/obsequi/Obsequi 22.00 30.00 36.4% MultiSource/Benchmarks/Ptrdist/bc/bc 23.00 28.00 21.7% External/S...NT2017rate/502.gcc_r/502.gcc_r 1258.00 1512.00 20.2% External/S...te/520.omnetpp_r/520.omnetpp_r 954.00 1143.00 19.8% External/S...rate/510.parest_r/510.parest_r 5961.00 7122.00 19.5% External/S...C/CINT2006/445.gobmk/445.gobmk 47.00 56.00 19.1% External/S...00.perlbench_r/500.perlbench_r 241.00 286.00 18.7% External/S...NT2006/471.omnetpp/471.omnetpp 36.00 42.00 16.7% External/S...06/400.perlbench/400.perlbench 183.00 210.00 14.8% MultiSource/Applications/SPASS/SPASS 72.00 81.00 12.5% External/S...17rate/541.leela_r/541.leela_r 72.00 80.00 11.1% External/SPEC/CINT2006/403.gcc/403.gcc 585.00 642.00 9.7% MultiSourc...e/Applications/sqlite3/sqlite3 120.00 131.00 9.2% MultiSourc...Applications/hexxagon/hexxagon 11.00 12.00 9.1% External/S.../CFP2006/453.povray/453.povray 566.00 615.00 8.7% External/S...rate/511.povray_r/511.povray_r 578.00 627.00 8.5% External/S...FP2006/482.sphinx3/482.sphinx3 12.00 13.00 8.3% MultiSource/Applications/oggenc/oggenc 130.00 140.00 7.7% MultiSourc...e/Applications/ClamAV/clamscan 250.00 268.00 7.2% MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 19.00 20.00 5.3% MultiSourc...ch/consumer-jpeg/consumer-jpeg 19.00 20.00 5.3% External/S...te/526.blender_r/526.blender_r 3747.00 3928.00 4.8% MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 104.00 108.00 3.8% MultiSourc...ch/consumer-lame/consumer-lame 54.00 56.00 3.7% MultiSource/Benchmarks/Bullet/bullet 1222.00 1264.00 3.4% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 973.00 1005.00 3.3% External/S.../CFP2006/447.dealII/447.dealII 2699.00 2780.00 3.0% External/S...06/483.xalancbmk/483.xalancbmk 788.00 810.00 2.8% External/S.../CFP2006/450.soplex/450.soplex 180.00 185.00 2.8% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 338.00 345.00 2.1% MultiSourc...Benchmarks/7zip/7zip-benchmark 685.00 699.00 2.0% External/S...FP2017rate/544.nab_r/544.nab_r 158.00 160.00 1.3% MultiSourc...sumer-typeset/consumer-typeset 772.00 781.00 1.2% External/S...2017rate/525.x264_r/525.x264_r 410.00 414.00 1.0% External/S...23.xalancbmk_r/523.xalancbmk_r 998.00 1002.00 0.4% ``` Compile-time is almost neutral: https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions NewPM-O3: +0.03% NewPM-ReleaseThinLTO: -0.01% NewPM-ReleaseLTO-g: +0.03% Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132365	2022-08-30 13:27:51 +01:00
OCHyams	84a71d5259	[DebugInfo] Fix line number attribution in mldst-motion Taking the example from the test included in this patch: $ cat test.cpp -n 1 void fun(int *a, int cond) { 2 if (cond) 3 a[1] = 1; 4 else 5 a[1] = 2; 6 } mldst-motion will merge and sink the stores in if.then and if.else into if.end. The resultant PHI, gep and store should be attributed line zero with the innermost common scope rather than picking a debug location from one of the original stores. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D132741	2022-08-30 10:03:53 +01:00
jacquesguan	df525c7705	[InstCombine] fold fake floating point vector extract to shift+trunc. This patch supports the FP part of D111082. Differential Revision: https://reviews.llvm.org/D125750	2022-08-30 10:12:16 +08:00
Rong Xu	d7ef0c3970	[llvm-profdata] Improve profile supplementation Current implementation promotes a non-cold function in the SampleFDO profile into a hot function in the FDO profile. This is too aggressive. This patch promotes a hot functions in the SampleFDO profile into a hot function, and a warm function in SampleFDO into a warm function in FDO. Differential Revision: https://reviews.llvm.org/D132601	2022-08-29 16:50:42 -07:00
Philip Reames	8936d86469	[LV] Add debug output for force scalar tracing [nfc] I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.	2022-08-29 15:17:51 -07:00
Valery N Dmitriev	329b972d41	[SLP] Try to match reductions before trying to vectorize a vector build sequence. This patch changes order of searching for reductions vs other vectorization possibilities. The idea is if we do not match a reduction it won't be harmful for further attempts to find vectorizable operations on a vector build sequences. But doing it in the opposite order we have good chance to ruin opportunity to match a reduction later. We also don't want to try vectorizing binary operations too early as 2-way vectorization may effectively prohibit wider ones leading to producing less effective code. Differential Revision: https://reviews.llvm.org/D132590	2022-08-29 13:32:14 -07:00
Philip Reames	033a97a8f3	[LV] Minor code restructure of isUniformAfterVectorization [nfc] Mostly just to make a future patch easier to review.	2022-08-29 12:48:27 -07:00
Philip Reames	c37b1a5f76	[RLEV] Pick a correct insert point when incoming instruction is itself a phi node This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue. Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG. Differential Revision: https://reviews.llvm.org/D132571	2022-08-29 11:44:33 -07:00
Alexey Bataev	beacf9bd9e	[SLP]Fix PR57322: vectorize constant float stores. Stores for constant floats must be vectorized, improve analysis in SLP vectorizer for stores. Differential Revision: https://reviews.llvm.org/D132750	2022-08-29 11:02:53 -07:00
Alexey Bataev	e6345bf644	[SLP]Improve lookup of the buildvector top insertelement instruction. When estimating the cost of the in-tree vectorized scalars in buildvector sequences, need to take into account the vectorized insertelement instruction. The top of the buildvector seuences is the topmost vectorized insertelement instruction, because it will have > than 1 use after the vectorization. For the affected test case improves througput from 21 to 16 (per llvm-mca). Differential Revision: https://reviews.llvm.org/D132740	2022-08-29 08:19:52 -07:00
Sanjay Patel	6c39a3aae1	[InstCombine] fold not-shift of signbit to icmp+zext https://alive2.llvm.org/ce/z/j_8Wz9 The arithmetic shift was converted to logical shift with: `246078604c` That does not seem to uncover any other missing/conflicting folds, so convert directly to signbit test + cast. We still need to fold the pattern with logical shift to test + cast. This allows reducing patterns where the output type is not the same as the input value: https://alive2.llvm.org/ce/z/nydwFV Fixes #57394	2022-08-29 10:06:31 -04:00
Sanjay Patel	246078604c	[InstCombine] fold inc-of-signbit-splat to not+lshr (iN X s>> (N - 1)) + 1 --> (~X) u>> (N - 1) https://alive2.llvm.org/ce/z/wzS474	2022-08-29 08:48:22 -04:00
Florian Hahn	c78696813f	[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC). Suggested as independent fix during the review of D132585.	2022-08-29 10:19:47 +01:00
Kazu Hirata	2ad7fd3ac7	[Instrumentation] Use std::clamp (NFC) The use of std::clamp should be safe here. MinRZ is at most 32, while kMaxRZ is 1 << 18, so we have MinRZ <= kMaxRZ, avoiding the undefind behavior of std::clamp.	2022-08-28 23:28:57 -07:00
Kazu Hirata	c63f823875	[llvm] Use range-based for loops (NFC)	2022-08-28 17:35:04 -07:00
Sanjay Patel	ab6892967c	[InstCombine] allow sext in fold of mask using signbit, part 2 https://alive2.llvm.org/ce/z/rcbZmx Sibling tranform to `275aa24c0a` This pattern is seen in the examples in issue #57381.	2022-08-28 11:50:52 -04:00
zhongyunde	84d6966e4d	[InstCombine] Propagate the nuw for combine of add+mul As the commit of D132658, make the 'nuw' change separately. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132777	2022-08-28 23:01:11 +08:00
Florian Hahn	af98b875e8	[VPlan] Use range check in VPHeaderPHIRecipe::classof (NFC). This addresses a suggestion to simplify the check from D131989. This also makes it easier to ensure that VPHeaderPHIRecipe::classof checks for all header phi ids.	2022-08-28 15:54:12 +01:00
Sanjay Patel	275aa24c0a	[InstCombine] allow sext in fold of mask using signbit ~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y -- with optional sext https://alive2.llvm.org/ce/z/wFFnZT	2022-08-28 09:01:30 -04:00
Kazu Hirata	b18ff9c461	[Transform] Use range-based for loops (NFC)	2022-08-27 23:54:32 -07:00
Kazu Hirata	d0166c617d	[Utils] Remove redundaunt declarations (NFC) Identified with readability-redundant-declaration.	2022-08-27 23:54:31 -07:00
Kazu Hirata	d1688e9ddf	[llvm] Use std::gcd (NFC) This patch replaces calls to greatestCommonDivisor with std::gcd where both arguments are known to be of unsigned. This means that std::common_type_t of the two argument types should just be the wider one of the two.	2022-08-27 23:54:29 -07:00
Kazu Hirata	56ea4f9bd3	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-27 21:21:02 -07:00
Kazu Hirata	7a617fdf39	Use std::gcd (NFC) This patch replaces calls to GreatestCommonDivisor64 with std::gcd where both arguments are known to be of unsigned types no larger than 64 bits in size.	2022-08-27 21:20:59 -07:00
Florian Hahn	7743badafa	[VPlan] Verify that header only contains header phi recipes. Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also adds missing checks for VPPointerInductionRecipe to VPHeaderPHIRecipe::classof. Split off from D119661. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D131989	2022-08-27 22:06:12 +01:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Kazu Hirata	a33ef8f2b7	Use llvm::all_equal (NFC)	2022-08-27 09:53:10 -07:00
Sanjay Patel	7abf233f44	[InstCombine] allow poison (undef) element in vector signbit transforms If the shift constant has undefined lanes, we can assume those are the same as the defined lanes in these transforms: https://alive2.llvm.org/ce/z/t6TTJ2 Replace undef with poison in the test while here to support the transition away from undef.	2022-08-27 11:57:05 -04:00
Sanjay Patel	c6e56024c6	[InstCombine] fold signbit splat pattern that uses negate 0 - (zext (i8 X u>> 7) to iN) --> sext (i8 X s>> 7) to iN https://alive2.llvm.org/ce/z/jzv4Ud This is part of solving issue #57381.	2022-08-27 08:04:35 -04:00
Vitaly Buka	0d59969abb	[msan] Enable msan-check-constant-shadow by default Depends on D132761. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D132765	2022-08-26 16:34:47 -07:00
Vitaly Buka	134986a720	[msan] Fix handling of constant shadow If constant shadown enabled we had false reports because !isZeroValue() does not guaranty that the values is actually not zero. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D132761	2022-08-26 15:51:02 -07:00
Vitaly Buka	072a2fd738	[NFC][msan] Clang-format the file	2022-08-26 15:11:12 -07:00
Eric Gullufsen	eb1e2b3997	[InstCombine] Canonicalize "and, add", "or, add", "xor, add" Canonicalize ``` ((x + C1) & C2) --> ((x & C2) + C1) ((x + C1) ^ C2) --> ((x ^ C2) + C1) ((x + C1) \| C2) --> ((x \| C2) + C1) ``` for suitable constants `C1` and `C2`. Alive2 proofs: [[ https://alive2.llvm.org/ce/z/BqMDVZ \| add, or --> or, add ]] [[ https://alive2.llvm.org/ce/z/BhAeCl \| add, xor --> xor, add ]] [[ https://alive2.llvm.org/ce/z/jYRHEt \| add, and --> and, add ]] Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131142	2022-08-26 17:23:29 -04:00
Paul Kirth	3155e3070c	[llvm][misexpect] Re-enable MisExpect for SampleProfiling MisExpect was occasionally crashing under SampleProfiling, due to a division by zero. We worked around that in D124302 by changing the assert to an early return. This patch is intended to add a test case for the crashing scenario and re-enable MisExpect for SampleProfiling. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124481	2022-08-26 20:24:10 +00:00
Philip Reames	c58791c286	Revert "[InstCombine] Canonicalize "and, add", "or, add", "xor, add"" This reverts commit `d2f110c693`. test/Transforms/InstCombine/freeze.ll fails on ninja check-llvm on x86_64.	2022-08-26 11:18:31 -07:00
Philip Reames	3dcec5e29f	[LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc] I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them. For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPLan) is explicitly handled above the uniform check. On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism. It would be correct to consider values defined outside of vplan uniform here.	2022-08-26 11:09:17 -07:00
Eric Gullufsen	d2f110c693	[InstCombine] Canonicalize "and, add", "or, add", "xor, add" Canonicalize ``` ((x + C1) & C2) --> ((x & C2) + C1) ((x + C1) ^ C2) --> ((x ^ C2) + C1) ((x + C1) \| C2) --> ((x \| C2) + C1) ``` for suitable constants `C1` and `C2`. Alive2 proofs: [[ https://alive2.llvm.org/ce/z/BqMDVZ \| add, or --> or, add ]] [[ https://alive2.llvm.org/ce/z/BhAeCl \| add, xor --> xor, add ]] [[ https://alive2.llvm.org/ce/z/jYRHEt \| add, and --> and, add ]] Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131142	2022-08-26 14:07:43 -04:00
Sanjay Patel	4827771234	[InstCombine] fold test of equality to 0.0 with bitcast operand fcmp oeq/une (bitcast X), 0.0 --> (and X, SignMaskC) ==/!= 0 https://alive2.llvm.org/ce/z/ZKATGN	2022-08-26 13:46:11 -04:00
Florian Hahn	4e5c44964a	[VPlan] Move isUniformAfterVectorization from VPlan to vputils (NFC). This allows re-using the utility without a VPlan object. The helper also doesn't access any data from VPlan.	2022-08-26 18:26:33 +01:00
Philip Reames	2d5f025779	[LV] Extract utility for checking if VPValue is uniform [nfc]	2022-08-26 09:56:13 -07:00
Florian Hahn	ec37ecbc62	[LCSSA] Skip updating users in unreachable blocks. Don't waste time trying to update users in unreachable blocks.	2022-08-26 15:09:46 +01:00
Daniil Fukalov	9c710ebbdb	[TTI] NFC: Reduce InstructionCost::getValue() usage... in order to propagate `InstructionCost` value upper. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103406	2022-08-26 16:37:32 +03:00
Adrian Vogelsgesang	5af06ba7dc	[Coro][Debuginfo] Add debug info to `__NoopCoro_ResumeDestroy` function With this commit, we now attach an `DISubprogram` to the LLVM-generated `_NoopCoro_ResumeDestroy` function. Thereby, lldb can show a `std::coroutine_handle` to a `std::noop_coroutine` as ``` continuation = coro frame = 0x555555560d98 { resume = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy) destroy = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy) } ``` instead of ``` continuation = coro frame = 0x555555560d98 { resume = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211) destroy = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211) } ``` I renamed the function from `NoopCoro.ResumeDestroy` to `_NoopCoro_ResumeDestroy` because: * the leading `_` makes sure this is a reserved name and should not clash with any user-provided names * the `.` was replaced by a `_`, so the name is now a valid identifier in C, making it allows me to type its name in the debugger Differential Revision: https://reviews.llvm.org/D132580	2022-08-26 05:49:52 -07:00
Matthias Gehre	3e39b27101	[llvm/CodeGen] Add ExpandLargeDivRem pass Adds a pass ExpandLargeDivRem to expand div/rem instructions with more than 128 bits into a loop computing that value. As discussed on https://reviews.llvm.org/D120327, this approach has the advantage that it is independent of the runtime library. This also helps the clang driver, which otherwise would need to understand enough about the runtime library to know whether to allow _BitInts with more than 128 bits. Targets are still free to disable this pass and instead provide a faster implementation in a runtime library. Fixes https://github.com/llvm/llvm-project/issues/44994 Differential Revision: https://reviews.llvm.org/D126644	2022-08-26 11:55:15 +01:00
Dmitry Makogon	9142f67ef2	[SimplifyCFG] Don't widen cond br if false branch has successors Fixes https://github.com/llvm/llvm-project/issues/57221. This limits the tryWidenCondBranchToCondBranch transform making it work only if the false block of widenable condition branch has no successors. If that block has successors, then SimplifyCondBranchToCondBranch may undo the transform done by tryWidenCondBranchToCondBranch, which would lead to infinite cycle of transformation and eventually an assert failing. Differential Revision: https://reviews.llvm.org/D132356	2022-08-26 15:23:37 +07:00
Chuanqi Xu	17631ac676	[Coroutines] Store the index for final suspend point if there is unwind coro end Closing https://github.com/llvm/llvm-project/issues/57339 The root cause for this issue is an pre-mature optimization to eliminate the index for the final suspend point since we feel like we can judge if a coroutine is suspended at the final suspend by if resume_fn_addr is null. However this is not true if the coroutine exists via an exception in promise.unhandled_exception(). According to [dcl.fct.def.coroutine]p14: > If the evaluation of the expression promise.unhandled_exception() > exits via an exception, the coroutine is considered suspended at the > final suspend point. But from the perspective of the implementation, we can't set the coro index to the final suspend point directly since it breaks the states. To fix the issue, we block the optimization if we find there is any unwind coro end, which indicates that it is possible that the coroutine exists via an exception from promise.unhandled_exception(). Test Plan: folly	2022-08-26 14:05:46 +08:00
Max Kazantsev	ccf788a565	[IRCE] Drop SCEV of a Phi after adding a new input. PR57335 Since SCEV learned to look through single value phis with `20d798bd47`, whenever we add a new input to a Phi, we should make sure that the old cached value is dropped. Otherwise, it may lead to various miscompiles, such as breach of dominance as shown in the bug https://github.com/llvm/llvm-project/issues/57335	2022-08-25 18:14:29 +07:00
Chenbing Zheng	adf4519c0e	[InstCombine] recognize bitreverse disguised as shufflevector This patch complete TODO left in D66965, and achieve related pattern for bitreverse. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132431	2022-08-25 10:41:47 +08:00
Chenbing Zheng	14fae4d136	[InstCombine] Add undef elements support for shrinkFPConstantVector Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D132343	2022-08-25 10:38:48 +08:00
Valery N Dmitriev	a4c8fb9d1f	[SLP][NFC] Refactor SLPVectorizerPass::vectorizeRootInstruction method. The goal is to separate collecting items for post-processing and processing them. Post processing also outlined as dedicated method. Differential Revision: https://reviews.llvm.org/D132603	2022-08-24 17:07:53 -07:00
Sami Tolvanen	cff5bef948	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Relands `67504c9549` with a fix for 32-bit builds. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 22:41:38 +00:00
Sanjay Patel	7c2f93c04a	[InstCombine] use isa instead of dyn_cast for unused value; NFC	2022-08-24 17:58:20 -04:00
Cameron McInally	38d58c1b37	[GlobalOpt] Bail out of GlobalOpt SROA if a Scalable Vector is seen The SROA algorithm won't work for Scalable Vectors, since we don't know how many bytes are loaded/stored. Bail out if a Scalable Vector is seen. Differential Revision: https://reviews.llvm.org/D132417	2022-08-24 13:17:59 -07:00
Sanjay Patel	f7ab70cf8d	[InstCombine] reduce disguised mul+add factorization ~(A * C1) + A --> (A * (1 - C1)) - 1 This is a non-obvious mix of bitwise logic and math: https://alive2.llvm.org/ce/z/U7ACVT The pattern may be produced by Negator from the more typical code seen in issue #57255.	2022-08-24 16:02:12 -04:00
Sami Tolvanen	a79060e275	Revert "KCFI sanitizer" This reverts commit `67504c9549` as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.	2022-08-24 19:30:13 +00:00
Sami Tolvanen	67504c9549	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 18:52:42 +00:00
Philip Reames	23245a914b	[LV] Simplify code given isPredicatedInst doesn't dependent on VF any more [nfc]	2022-08-24 11:42:10 -07:00
Philip Reames	3ab00cfca9	[LV] Adjust code added in `f79214d1` for `531dd3634` [nfc] When rebasing the review which became `f79214d1`, I forgot to adjust for the changed semantics introduced by `531dd3634`. Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst. I noticed this while doing a follow up change.	2022-08-24 10:38:17 -07:00
Philip Reames	f79214d1e1	[LV] Support predicated div/rem operations via safe-divisor select idiom This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors. The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditional select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straight forward, and none of the others seemed oviously better. I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches. Differential Revision: https://reviews.llvm.org/D130164	2022-08-24 10:07:59 -07:00
Florian Hahn	689895f432	[VPlan] Remove unneeded `struct` prefix for VPTransformState args (NFC).	2022-08-24 17:58:08 +01:00
spupyrev	8d5b694da1	extending code layout alg The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895. The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D129893	2022-08-24 09:40:25 -07:00
Sanjay Patel	0cfc651032	[InstCombine] ease use constraint in tryFactorization() The stronger one-use checks prevented transforms like this: (x * y) + x --> x * (y + 1) (x * y) - x --> x * (y - 1) https://alive2.llvm.org/ce/z/eMhvQa This is one of the IR transforms suggested in issue #57255. This should be better in IR because it removes a use of a variable operand (we already fold the case with a constant multiply operand). The backend should be able to re-distribute the multiply if that's better for the target. Differential Revision: https://reviews.llvm.org/D132412	2022-08-24 12:10:54 -04:00
Simon Pilgrim	2f217c1214	[InstCombine] Canonicalize ((X & -X) - 1) --> ((X - 1) & ~X) (PR51784) Enables the ctpop((x & -x ) - 1) -> cttz(x, false) fold Alive2: https://alive2.llvm.org/ce/z/EDk4h7 (((X & -X) - 1) --> (~X & (X - 1)) ) Alive2: https://alive2.llvm.org/ce/z/8Yr3XG (CTPOP -> CTTZ) Fixes #51126 Differential Revision: https://reviews.llvm.org/D110488	2022-08-24 16:50:43 +01:00
Sanjay Patel	4391351463	[InstCombine] improve readability in tryFactorization(); NFC Added/removed braces, reduced indents, and renamed a variable.	2022-08-24 11:31:18 -04:00
Simon Pilgrim	80cc8f0f62	Revert rGc360955c4804e9b25017372cb4c6be7adcb216ce "[InstCombine] Canonicalize ((X & -X) - 1) --> (~X & (X - 1)) (PR51784)" The test changes are failing on some buildbots (but not others.....).	2022-08-24 16:26:28 +01:00
Simon Pilgrim	c360955c48	[InstCombine] Canonicalize ((X & -X) - 1) --> (~X & (X - 1)) (PR51784) Enables the ctpop((x & -x ) - 1) -> cttz(x, false) fold Alive2: https://alive2.llvm.org/ce/z/EDk4h7 (((X & -X) - 1) --> (~X & (X - 1)) ) Alive2: https://alive2.llvm.org/ce/z/8Yr3XG (CTPOP -> CTTZ) Fixes #51126 Differential Revision: https://reviews.llvm.org/D110488	2022-08-24 15:31:15 +01:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as they take two vectors inputs are extracting from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
Keno Fischer	30d7d74d5c	[MSAN] Handle array alloca with non-i64 size specification The array size specification of the an alloca can be any integer, so zext or trunc it to intptr before attempting to multiply it with an intptr constant. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D131846	2022-08-24 03:24:21 +00:00
Keno Fischer	5739d29cde	[MSAN] Correct shadow type for atomicrmw instrumentation We were passing the type of `Val` to `getShadowOriginPtr`, rather than the type of `Val`'s shadow resulting in broken IR. The fix is simple. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D131845	2022-08-24 03:24:19 +00:00
Chris Bieneman	9616905744	[NFC] Fix warning This change came in a few hours ago and introduced a warning. The fix is trivial, so I'm providing it. The original change was reviewed here: https://reviews.llvm.org/D132331	2022-08-23 20:50:37 -05:00
Philip Reames	49547b2241	[slp] Pull out a getOperandInfo variant helper [nfc]	2022-08-23 13:46:05 -07:00
Alvin Wong	c0214db51a	[llvm] Mark CFGuard fn ptr symbol as DSO local and add tests for mingw For mingw target, if a symbol is not marked DSO local, a `.refptr` is generated for it. This makes CFG check calls use an extra pointer dereference, which adds extra overhead compared to the MSVC version, so mark the CFG guard check funciton pointer DSO local to stop it. This should have no effect on MSVC target. Also adapt the existing cfguard tests to run for mingw targets, so that this change is checked. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D132331	2022-08-23 23:39:39 +03:00
Jakub Kuderski	6fa87ec10f	[ADT] Deprecate is_splat and replace all uses with all_equal See the discussion thread for more details: https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D132335	2022-08-23 11:36:27 -04:00
Florian Hahn	ff34432649	[LoopUtils] Remove unused Loop arg from addDiffRuntimeChecks (NFC). The argument is no longer used, remove it.	2022-08-23 10:15:28 +01:00
Andrew Browne	065d2e1d8b	[DFSan] Fix handling of libAtomic external functions. Implementation based on MSan. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D132070	2022-08-22 16:04:29 -07:00
Jay Foad	f82c55fa08	[InstCombine] Change order of canonicalization of ADD and AND Canonicalize ((x + C1) & C2) --> ((x & C2) + C1) for suitable constants C1 and C2, instead of the other way round. This should allow more constant ADDs to be matched as part of addressing modes for loads and stores. Differential Revision: https://reviews.llvm.org/D130080	2022-08-22 20:03:53 +01:00
Jay Foad	2754ff883d	[InstCombine] Try not to demand low order bits for Add Don't demand low order bits from the LHS of an Add if: - they are not demanded in the result, and - they are known to be zero in the RHS, so they can't possibly overflow and affect higher bit positions This is intended to avoid a regression from a future patch to change the order of canonicalization of ADD and AND. Differential Revision: https://reviews.llvm.org/D130075	2022-08-22 20:03:53 +01:00
Philip Reames	27d3321c4f	[TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc] This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allow use of the same properties in store costing as arithmetic costing.	2022-08-22 11:26:31 -07:00
Philip Reames	274f86e7a6	[TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc] This completes the client side transition to the OperandValueInfo version of this routine. Backend TTI implementations still use the prior versions for now.	2022-08-22 11:06:32 -07:00
Philip Reames	c42a5f1cc2	[TTI] Migrate getOperandInfo to OperandVaueInfo [nfc] This is part of merging OperandValueKind and OperandValueProperties.	2022-08-22 10:19:02 -07:00
Philip Reames	5cd427106d	[TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc] OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling. We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so. This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works. Target TTI implementations still use the split flags. I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.	2022-08-22 09:48:15 -07:00
Max Kazantsev	e587199a50	[SCEV] Prove condition invariance via context, try 2 Initial implementation had too weak requirements to positive/negative range crossings. Not crossing zero with nuw is not enough for two reasons: - If ArLHS has negative step, it may turn from positive to negative without crossing 0 boundary from left to right (and crossing right to left doesn't count for unsigned); - If ArLHS crosses SINT_MAX boundary, it still turns from positive to negative; In fact we require that ArLHS always stays non-negative or negative, which an be enforced by the following set of preconditions: - both nuw and nsw; - positive step (looks liftable); Because of positive step, boundary crossing is only possible from left part to the right part. And because of no-wrap flags, it is guaranteed to never happen.	2022-08-22 14:31:19 +07:00
Ting Wang	d2d77e050b	[PowerPC][Coroutines] Add tail-call check with call information for coroutines Fixes #56679. Reviewed By: ChuanqiXu, shchenz Differential Revision: https://reviews.llvm.org/D131953	2022-08-21 22:20:40 -04:00
Sanjay Patel	15e3d86911	[InstCombine] reassociate bitwise logic chains based on uses (X op Y) op Z --> (Y op Z) op X This isn't a complete solution (see TODO tests for possible refinements), but it shows some nice wins and doesn't seem to cause any harm. I think the most potential danger is from conflicting with other folds and causing an infinite loop - that's the reason for avoiding patterns with constant operands. Alternatively, we could try this in the reassociate pass, but we would not immediately see all of the logic folds that instcombine provides. I also looked at improving ValueTracking's isImpliedCondition() (and we should still add some enhancements there), but that would not work in general for bitwise logic reduction. The tests that reduce completely to 0/-1 are motivated by issue #56653. Differential Revision: https://reviews.llvm.org/D131356	2022-08-21 09:42:14 -04:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Kazu Hirata	8b1b0d1d81	Revert "Use std::is_same_v instead of std::is_same (NFC)" This reverts commit `c5da37e42d`. This patch seems to break builds with some versions of MSVC.	2022-08-20 23:00:39 -07:00
Kazu Hirata	c5da37e42d	Use std::is_same_v instead of std::is_same (NFC)	2022-08-20 22:36:26 -07:00
Kazu Hirata	ec5eab7e87	Use range-based for loops (NFC)	2022-08-20 21:18:32 -07:00
Kazu Hirata	258531b7ac	Remove redundant initialization of Optional (NFC)	2022-08-20 21:18:28 -07:00
Kazu Hirata	6b1bc80188	[Scalar] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-20 21:18:25 -07:00
Philip Reames	b0a2c48e9f	[tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc]	2022-08-19 16:22:22 -07:00
Alexey Bataev	c167028684	[SLP]Delay vectorization of postponable values for instructions with no users. SLP vectorizer tries to find the reductions starting the operands of the instructions with no-users/void returns/etc. But such operands can be postponable instructions, like Cmp, InsertElement or InsertValue. Such operands still must be postponed, vectorizer should not try to vectorize them immediately. Differential Revision: https://reviews.llvm.org/D131965	2022-08-19 08:39:16 -07:00
Alexey Bataev	0e7ed32c71	[SLP]Cost for a constant buildvector. In many cases constant buildvector results in a vector load from a constant/data pool. Need to consider this cost too. Differential Revision: https://reviews.llvm.org/D126885	2022-08-19 08:02:42 -07:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Max Kazantsev	f798c042f4	Revert "[SCEV] Prove condition invariance via context" This reverts commit `a3d1fb3b59`. Reverting until investigation of https://github.com/llvm/llvm-project/issues/57247 has concluded.	2022-08-19 21:02:06 +07:00
Caroline Concatto	09afe4155b	[InstCombine] For vector extract when extract vector and insert value type is the same This patch has implements these optimizations: extract.vector(insert.vector(Vector, Value, Idx), Idx) --> Value extract.vector(insert.vector(Vector, Value, InsertIndex), ExtractIndex) --> extract.vector(Vector, ExtractIndex) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D132137	2022-08-19 12:13:03 +01:00
Sanjay Patel	b066195b3f	[InstCombine] fold bitwise logic or+or+xor+not (~A \| C) \| (A ^ B) --> ~(A & B) \| C https://alive2.llvm.org/ce/z/Qw3aiJ This extends the existing fold (just above the new match) to peek through another 'or' instruction. This should let the motivating case from issue #57174 simplify completely.	2022-08-18 17:14:41 -04:00
Joe Loser	f3a55a1ddf	[llvm] Remove std::clamp equivalent in `Transforms/Utils/MisExpect.cpp` Use `std::clamp` directly from the standard library now that LLVM is built with C++17 standards mode. Differential Revision: https://reviews.llvm.org/D131869	2022-08-18 15:11:25 -06:00
Florian Hahn	b8709a9d03	[LV] Support fixed order recurrences. If the incoming previous value of a fixed-order recurrence is a phi in the header, go through incoming values from the latch until we find a non-phi value. Use this as the new Previous, all uses in the header will be dominated by the original phi, but need to be moved after the non-phi previous value. At the moment, fixed-order recurrences are modeled as a chain of first-order recurrences. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D119661	2022-08-18 19:15:52 +01:00
Philip Reames	1436adae2c	[LV-L] Add const and move method body out of line [nfc]	2022-08-18 11:10:19 -07:00
Philip Reames	c064d3f139	[LV] Use early continue to simplify code [nfc]	2022-08-18 10:31:55 -07:00
Danila Malyutin	4a9ff289fb	[InstCombine] Fix freeze instruction getting inserted before landingpad The code would use first non-phi instruction as an insertion point, however this could lead to freeze getting inserted between phi and landingpad causing a verifier assert. Differential Revision: https://reviews.llvm.org/D132105	2022-08-18 17:43:42 +03:00
Philip Reames	531dd3634d	[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops) This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is not an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predicated. Specifically, for the case of a uniform memory operation we were previously considering it not to be a predicated instruction, but were considering it to be scalable with predication. As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowering via naive scalarization or if scalarization is infeasible (i.e. scalable vectors) aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalaralizable. I haven't found a concrete example here, but I am suspicious. Differential Revision: https://reviews.llvm.org/D131093	2022-08-18 07:14:04 -07:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
Simon Pilgrim	e48892ee42	[Transforms] LICM.cpp - pull out repeated getUserCost call Pulled out of D79483	2022-08-18 10:43:29 +01:00
Konstantina	5bc8791187	[NewGVN][PHIOFOPS] Bail out if an operand is in OpSafeForPHIOfOps but it is not safe for the current basic block. NewGVN tables are not cleared out between the initial run of NewGVN and the verification. In case of phi-of-ops optimization, OpSafeForPHIOfOps goes out of sync between the two runs. One operand might not be safe for one basic block, but it might be safe for one of its successors. In this case, the operand will be added in OpSafeForPHIOfOps map. In verification phase, we reuse OpSafeForPHIOfOps without updating it again. As a result, the operand will be considered safe for phi-of-ops optimization even for the case that it is not. This patch fixes this problem. Fix for 53807. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D130910	2022-08-17 18:57:46 -07:00
Paul Kirth	656c5d652c	[clang][llvm][NFC] Change misexpect's tolerance option to be 32-bit In D131869 we noticed that we jump through some hoops because we parse the tolerance option used in MisExpect.cpp into a 64-bit integer. This is unnecessary, since the value can only be in the range [0, 100). This patch changes the underlying type to be 32-bit from where it is parsed in Clang through to it's use in LLVM. Reviewed By: jloser Differential Revision: https://reviews.llvm.org/D131935	2022-08-17 14:38:53 +00:00
Ellis Hoag	6f61594d8c	[InstrProf] Add option to avoid instrumenting small functions If a function only has a few instructions, instrumentation can significantly increase the size and performance overhead of that function. Add the `-pgo-function-size-threshold` option to select a size threshold so these small functions are not instrumented. A similar option `-fxray-instruction-threshold=<N>` is used for XRay to reduce binary size overhead [1]. [1] https://www.llvm.org/docs/XRay.html Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131816	2022-08-17 06:47:15 -07:00
Simon Pilgrim	594c5b1a42	[SLP] Update TODO comment about shuffle mask decoding This is handled in ShuffleVectorInst/getShuffleCost - getInstructionThroughput is (slowly) being removed.	2022-08-17 11:41:46 +01:00
Zain Jaffal	f61f99a105	[instcombine] Optimise for zero initialisation of product given fast flags are enabled Currently, clang ignores the 0 initialisation in finite math For example: ``` double f_prod = 0; double arr[1000]; for (size_t i = 0; i < 1000; i++) { f_prod *= arr[i]; } ``` Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop. Reviewed By: fhahn, spatel Differential Revision: https://reviews.llvm.org/D131672	2022-08-17 11:12:15 +01:00
Martin Sebor	a7a1be11e6	[InstCombine] convert second std::min argument to same type as first Ensure both arguments to std::min have the same type in all data models.	2022-08-16 17:34:33 -06:00
Martin Sebor	345514e991	[InstCombine] Add support for strlcpy folding Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130666	2022-08-16 16:43:40 -06:00
Martin Sebor	e858f5120d	[InstCombine] Remove assumptions about int having 32 bits Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D131731	2022-08-16 15:35:08 -06:00
Sanjay Patel	ce081776b2	[FlattenCFG] avoid crash on malformed code We don't have a dominator tree in this pass, so we can't bail out sooner by checking for unreachable code, but this is a minimal fix for the example in issue #56875.	2022-08-16 15:11:00 -04:00
Danila Malyutin	451497a030	[RS4GC] Handle vectors of pointers in non-live clobbering Fix crash when trying to unconditionally cast alloca type to PointerType Differential Revision: https://reviews.llvm.org/D131146	2022-08-16 17:47:30 +03:00
Alexey Bataev	65c7cecb13	[SLP]Fix PR51320: Try to vectorize single store operands. Currently, we try to vectorize values, feeding into stores, only if slp-vectorize-hor-store option is provided. We can safely enable vectorization of the value operand of a single store in the basic block, if the operand value is used only in store. It should enable extra vectorization and should not increase compile time significantly. Fixes https://github.com/llvm/llvm-project/issues/51320 Differential Revision: https://reviews.llvm.org/D131894	2022-08-16 07:25:21 -07:00
Kevin P. Neal	7f768371a1	Fix build error: [FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known. This should fix these build bot errors: Step 6 (build-check-mlir-build-only) failure: build (failure) C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): error C2220: the following warning is treated as an error C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead. C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(129): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead. C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1386): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead. C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1388): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.	2022-08-16 08:47:36 -04:00
Kevin P. Neal	05ac82de40	[FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known. Previously we would only CSE constrained FP intrinsics in the default floating point environment. Exception behavior of "strict" is still not allowed since we are not allowed to remove any traps in that case. There are no restrictions on CSE across function calls inside a function. Differential Revision: https://reviews.llvm.org/D112256	2022-08-16 08:31:42 -04:00
Martin Sebor	65967708d2	[InstCombine] Adjust snprintf folding of constant strings (PR #56598 ) Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130494	2022-08-15 15:59:21 -06:00
Arthur Eubanks	633f5663c3	[LegacyPM] Remove ThinLTO bitcode writer legacy pass Using the legacy PM for the optimization pipeline is deprecated and in the process of being removed. This is a small step in that direction. For an example of migrating to the new PM: `853b57fe80`	2022-08-15 14:21:16 -07:00
Philip Reames	e792a353b5	[slp] adjust debug output to include final computed cost	2022-08-15 13:51:39 -07:00
Jameson Nash	3a8d7fe201	[SimplifyCFG] teach simplifycfg not to introduce ptrtoint for NI pointers SimplifyCFG expects to be able to cast both sides to an int, if either side can be case to an int, but this is not desirable or legal, in general, per D104547. Spotted in https://github.com/JuliaLang/julia/issues/45702 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128670	2022-08-15 15:11:48 -04:00
Alexey Bataev	2819126d0c	[SLP][NFC]Replace multiple isa calls with single one where possible, NFC.	2022-08-15 11:56:58 -07:00
Sanjay Patel	e5748c6e73	[InstCombine] reduce sub-with-overflow ==/!= 0 The basic patterns look like this: https://alive2.llvm.org/ce/z/MDj9EC The tests have a use of the overflow value too. Otherwise, existing folds should reduce already. This was noted as a missing IR fold in: `926e7312b2` Hopefully, this makes it easier to implement a backend fix because we should get the same IR regardless of whether the source used builtins or inline code.	2022-08-15 13:03:51 -04:00
Nuno Lopes	0299ebc1bd	InstCombine: use poison instead of undef as placeholder in insertvalue [NFC] These vectors are fully initialized so the placeholder value is irrelevant	2022-08-14 21:37:23 +01:00
Kazu Hirata	50724716cd	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-14 12:51:58 -07:00
Kazu Hirata	448c466636	Use llvm::erase_value (NFC)	2022-08-13 12:55:50 -07:00
Kazu Hirata	109df7f9a4	[llvm] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-13 12:55:42 -07:00
Kazu Hirata	2117fcb1c0	Use Optional::transform instead of Optional::map (NFC) I'm planning to deprecate map in favor of transform for consistency with std::optional::transform in C++23.	2022-08-13 11:48:26 -07:00
Sanjay Patel	8b56fa92de	[InstCombine] fix "X\|(X^Y)" pattern-matching for commuted variants	2022-08-13 11:02:28 -04:00
Sanjay Patel	9d218b61cc	[InstCombine] reduce or-xor-or patterns (A \| ?) \| (A ^ B) --> (A \| ?) \| B https://alive2.llvm.org/ce/z/dbNQw4 This extends the existing transform to peek through another 'or' instruction for the common operand. This is the underlying missing fold that should allow issue #56711 and issue #57120 to reduce even more.	2022-08-13 09:52:01 -04:00
Sanjay Patel	763b31237f	[InstCombine] move comments closer to relevant code; NFC	2022-08-13 09:16:33 -04:00
Kevin Athey	532564de17	[MSAN] add flag to suppress storage of stack variable names with -sanitize-memory-track-origins Allows for even more savings in the binary image while simultaneously removing the name of the offending stack variable. Depends on D131631 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D131728	2022-08-12 11:59:53 -07:00
Arthur Eubanks	a3ac1cfaed	[SampleProfile] Fix non-determinism in promoteMergeNotInlinedContextSamples() We're seeing non-determinism with loading sample profiles. It seems to be related to the order in which we merge FunctionSamples in promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over NonInlinedCallSites in the order entries were inserted. Reviewed By: wenlei, davidxl Differential Revision: https://reviews.llvm.org/D131592	2022-08-12 10:13:25 -07:00
Kevin Athey	ec277b67eb	[MSAN] Separate id ptr from constant string for variable names used in track origins. The goal is to reduce the size of the MSAN with track origins binary, by making the variable name locations constant which will allow the linker to compress them. Follows: https://reviews.llvm.org/D131415 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D131631	2022-08-12 08:47:36 -07:00
Max Kazantsev	a3d1fb3b59	[SCEV] Prove condition invariance via context Contextual knowledge may be used to prove invariance of some conditions. For example, in this case: ``` ; %len >= 0 guard(%iv = {start,+,1}<nuw> <s %len) guard(%iv = {start,+,1}<nuw> <u %len) ``` the 2nd check always fails if `start` is negative and always passes otherwise. It looks like there are more opportunities of this kind that are still to be implemented in the future. Differential Revision: https://reviews.llvm.org/D129753 Reviewed By: apilipenko	2022-08-12 14:23:35 +07:00
Chuanqi Xu	e190b7cc90	[Coroutines] Maintain the position of final suspend Closing https://github.com/llvm/llvm-project/issues/56329 The problem happens when we try to simplify the suspend points. We might break the assumption that the final suspend lives in the last slot of Shape.CoroSuspends. This patch tries to main the assumption and fixes the problem.	2022-08-12 13:05:08 +08:00
Sanjay Patel	fa68d93d54	[InstCombine] fold reassociative fadd with negated operand We manage to iteratively achieve this result with no extra uses, and the reassociate pass can also do this, but this pattern falls through the cracks in the example from issue #57053.	2022-08-11 11:43:36 -04:00
Marco Elver	c47ec95531	[MemorySanitizer] Support memcpy.inline and memset.inline Other sanitizers (ASan, TSan, see added tests) already handle memcpy.inline and memset.inline by not relying on InstVisitor to turn the intrinsics into calls. Only MSan instrumentation currently does not support them due to missing InstVisitor callbacks. Fix it by actually making InstVisitor handle MemInlineInst. While the mem.inline intrinsics promise no calls to external functions as an optimization, for the sanitizers we need to break this guarantee since access into the runtime is required either way, and performance can no longer be guaranteed. All other cases, where generating a call is incorrect, should instead use no_sanitize. Fixes: https://github.com/llvm/llvm-project/issues/57048 Reviewed By: vitalybuka, dvyukov Differential Revision: https://reviews.llvm.org/D131577	2022-08-11 10:43:49 +02:00
Kevin Athey	057cabd997	Remove function name from sanitize-memory-track-origins binary. This work is being done to reduce the size of MSAN with track origins binary. Builds upon: https://reviews.llvm.org/D131205 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D131415	2022-08-10 15:45:40 -07:00
Johannes Doerfert	b65471d715	[Attributor][FIX] Visit same instructions with different scopes If we collect potential values we need to visit a value even if we have seen it before if the scope is different. The scope is part of the result after all. Test included. Fixes https://github.com/llvm/llvm-project/issues/56753 Differential Revision: https://reviews.llvm.org/D131597	2022-08-10 16:02:12 -05:00
Kevin Athey	d7a47a9bb5	Desist from passing function location to __msan_set_alloca_origin4. This is done by calling __msan_set_alloca_origin and providing the location of the variable by using the call stack. This is prepatory work for dropping variable names when track-origins is enabled. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D131205	2022-08-10 09:02:53 -07:00
Nikita Popov	32017d5efe	[Attributor] Check for noalias call in AAInstanceInfo The relevant property of allocation functions of interest here is their uniqueness (in the sense of disjoint provenance), which is encoded by the noalias return attribute. Differential Revision: https://reviews.llvm.org/D130225	2022-08-10 10:27:14 +02:00
Dinar Temirbulatov	cab6cd6834	[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding. After D121595 was commited, I noticed regressions assosicated with small trip count numbersvectorisation by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value. Differential Revision: https://reviews.llvm.org/D130755	2022-08-09 22:10:17 +01:00
Sanjay Patel	926e7312b2	[InstCombine] fold usub.with.overflow to icmp when there's no use of the math value https://alive2.llvm.org/ce/z/UE48FH This is part of solving issue #56926.	2022-08-09 13:13:48 -04:00
Sanjay Patel	6bfe5361b7	[InstCombine] add helper function for extract of with-overflow-intrinsic; NFC We can do more with these patterns, so this block is going to grow.	2022-08-09 12:38:11 -04:00
zhongyunde	c2ab65ddaf	[IndVars] Eliminate redundant type cast with different sizes Deal with different sizes between the itofp and fptoi with trunc or sext/zext, depend on D129756. Fixes https://github.com/llvm/llvm-project/issues/55505. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129958	2022-08-09 23:59:42 +08:00
Nikita Popov	4ac00789e1	[RelLookupTableConverter] Bail on invalid pointer size (x32) The RelLookupTableConverter pass currently only supports 64-bit pointers. This is currently enforced using an isArch64Bit() check on the target triple. However, we consider x32 to be a 64-bit target, even though the pointers are 32-bit. (And independently of that specific example, there may be address spaces with different pointer sizes.) As such, add an additional guard for the size of the pointers that are actually part of the lookup table. Differential Revision: https://reviews.llvm.org/D131399	2022-08-09 09:36:39 +02:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Ruobing Han	f756f06cc4	[SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size. Reviewed By: aeubanks, asbirlea Differential Revision: https://reviews.llvm.org/D129599	2022-08-08 18:12:04 +00:00
Vang Thao	257251247a	[SROA] Try harder to find a vector promotion viable type when rewriting We are seeing significant performance loss when an alloca fails to get promoted to register. I have observed that this is due to the common type found when attempting to rewrite partition users being unviable for promotion. While if we would have continue looking for a type, we would have found a subtype in the original allocated type that would have enabled promotion. Thus first check if the initial common type found is promotion viable and if not then continue looking instead of stopping with the initial common type found. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128073	2022-08-08 11:04:01 -07:00
Denis Antrushin	36cc533471	[EarlyCSE][OpaquePointers]Replace assert with return for mask type check. When EarlyCSE tries to common vector masked loads/stores, it first checks that they have same base operand and then assumes that this is enough for mask types to be equal. This is true for typed pointers but false for opaque ones - two loads of different vector sizes from same base pointer '%b' are the same, `ptr %b`. (For typed pointers, `%b` was cast to vector pointer type so bases were different). Change assert to return from lambda `isSubmask` so this transformation properly works with opaque pointers. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D131251	2022-08-08 16:14:42 +03:00
Kazu Hirata	e20d210eef	[llvm] Qualify auto (NFC) Identified with readability-qualified-auto.	2022-08-07 23:55:27 -07:00
Kazu Hirata	0e37ef0186	[Transforms] Fix comment typos (NFC)	2022-08-07 23:55:24 -07:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Fangrui Song	fa66789d06	[llvm] LLVM_NODISCARD => [[nodiscard]]. NFC With C++17 there is no Clang pedantic warning.	2022-08-07 00:26:33 +00:00
Fangrui Song	5deb678289	Revert "[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508" This reverts commit `48c74bb2e2`. With C++17 the workaround is no longer needed.	2022-08-06 16:48:23 -07:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
David Spickett	c401dbde71	[llvm][IROutliner] Account for return void in sort comparator This fixes 69 llvm tests that failed when EXPENSIVE_CHECKS was enabled. llvm/test/Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll is one example. When we have EXPENSIVE_CHECKS, _GLIBCXX_DEBUG is defined. This means that libstdc++ will call the compare function to check if it is implemented correctly (that !(a < a) is true). This happens even if there is only one item and here, we expect to see one return void or multiple return constant integer. Don't sort if we have 1 item, but do assert that it is the 1 ret void we expect. In the comparator, assert that neither Value is a nullptr in case one ended up in a the list somehow. Reviewed By: AndrewLitteken Differential Revision: https://reviews.llvm.org/D130230	2022-08-05 09:36:43 +00:00
Chuanqi Xu	230d6f93aa	[Coroutines] Remove lifetime intrinsics for spliied allocas in coroutine frames Closing https://github.com/llvm/llvm-project/issues/56919 It is meaningless to preserve the lifetime markers for the spilled allocas in the coroutine frames and it would block some optimizations too.	2022-08-05 14:50:43 +08:00
Fangrui Song	7d6017fd31	[TTI] Change new getVectorInstrCost overload to use const reference after D131114 A const reference is preferred over a non-null const pointer. `Type *` is kept as is to match the other overload. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D131197	2022-08-04 15:16:51 -07:00
Arthur Eubanks	6e45162adf	[InstrProf] Set prof global variables to internal linkage if adding a comdat COFF has a verifier check that private global variables don't have a comdat of the same name. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D131043	2022-08-04 13:24:55 -07:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
Johannes Doerfert	f81a209337	[Attributor][FIX] Deal with implicit `undef` in AAPotentialConstantValues. In contrast to AAPotentialValues, the constant values version can contain implicit `undef` in the set. We had an assertion that could misfire before. Handle it properly now.	2022-08-04 14:44:51 -05:00
Ellis Hoag	12e78ff881	[InstrProf] Add the skipprofile attribute As discussed in [0], this diff adds the `skipprofile` attribute to prevent the function from being profiled while allowing profiled functions to be inlined into it. The `noprofile` attribute remains unchanged. The `noprofile` attribute is used for functions where it is dangerous to add instrumentation to while the `skipprofile` attribute is used to reduce code size or performance overhead. [0] https://discourse.llvm.org/t/why-does-the-noprofile-attribute-restrict-inlining/64108 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D130807	2022-08-04 08:45:27 -07:00
Arthur Eubanks	203296d642	[BoundsChecking] Fix merging of sizes BoundsChecking uses ObjectSizeOffsetEvaluator to keep track of the underlying size/offset of pointers in allocations. However, ObjectSizeOffsetVisitor (something ObjectSizeOffsetEvaluator uses to check for constant sizes/offsets) doesn't quite treat sizes and offsets the same way as BoundsChecking. BoundsChecking wants to know the size of the underlying allocation and the current pointer's offset within it, but ObjectSizeOffsetVisitor only cares about the size from the pointer to the end of the underlying allocation. This only comes up when merging two size/offset pairs. Add a new mode to ObjectSizeOffsetVisitor which cares about the underlying size/offset rather than the size from the current pointer to the end of the allocation. Fixes a false positive with -fsanitize=bounds. Reviewed By: vitalybuka, asbirlea Differential Revision: https://reviews.llvm.org/D131001	2022-08-03 17:21:19 -07:00
Vitaly Buka	a2aa6809a8	[NFC][Inliner] Add cl::opt<int> to tune InstrCost The plan is tune this for sanitizers. Differential Revision: https://reviews.llvm.org/D131123	2022-08-03 17:14:10 -07:00
Congzhe Cao	8dc4b2edfa	[LoopInterchange][PR56275] Fix legality with negative dependence vectors This is the 2nd patch of the two-patch series (D130188, D130189) that fix PR56275 (https://github.com/llvm/llvm-project/issues/56275) which is a missed opportunity for loop interchange. As follow-up on the dependence analysis (DA) patch D130188, this patch normalizes DA results in loop interchange, such that negative dependence vectors queried by loop interchange are reversed to be non-negative. Now all tests in PR56275 can get interchanged. Those tests are added in lit test as `pr56275.ll`. Reviewed By: kawashima-fj, bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D130189	2022-08-03 19:59:01 -04:00
Bill Wendling	239c831de4	Add switch to use "source_filename" instead of a hash ID for globally promoted local During LTO a local promoted to a global gets a unique suffix based on a hash of the module IR. This means that changes in the local's module can affect the contents in another module that imported it (because the name of the imported promoted local is changed, but that doesn't reflect a real change in the importing module). So any tool that's validating changes to the importing module will see a superficial change. Instead of using the module hash, we can use the "source_filename" if it exists to generate a unique identifier that doesn't change due to LTO shenanigans. Differential Revision: https://reviews.llvm.org/D128863	2022-08-03 16:41:56 -07:00
Philip Reames	569a7f6aa3	[LV] Move definition of isPredicatedInst out of line and make it const [nfc]	2022-08-03 08:53:11 -07:00
Philip Reames	a1cab0daae	[LV] Use cost base decision for uniform mem op strategy [nfc-ish] This is mostly a stylistic change to make the uniform memop widening cost code fit more naturally with the sourounding code. Its not strictly speaking NFC as I added in the store with invariant value case, and we could in theory have a target where a gather/scatter is cheaper than a single load/store... but it's probably NFC in practice. Note that the scatter/gather result can still be overriden later if the result is uniform-by-parts.	2022-08-03 07:47:24 -07:00
Nikita Popov	b128e057c1	[AA] Make ModRefInfo a bitmask enum (NFC) Mark ModRefInfo as a bitmask enum, which allows using normal & and \| operators on it. This supersedes various functions like unionModRef() and intersectModRef(). I think this makes the code cleaner than going through helper functions... Differential Revision: https://reviews.llvm.org/D130870	2022-08-03 10:05:55 +02:00
Paul Kirth	d434e40f39	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-08-03 00:09:45 +00:00
Vladislav Dzhidzhoev	f6d9f00031	[DebugInfo] Test commit: update irrelevant comments Differential Revision: https://reviews.llvm.org/D130998	2022-08-02 20:21:24 +03:00
Philip Reames	0b47615fcf	[LV] Recognize store of invariant value to invariant address as uniform This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result. For context, the basic structure of the existing code and how the change fits in: * First, we select a widening strategy. (The result is irrelevant for this patch.) * Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.) * If it is, we overrule the widening strategy, and unconditionally scalarize. * VPReplicationRecipe - which is what actually does the scalarization - knows how to handle unform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.) An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions. Differential Revision: https://reviews.llvm.org/D130364	2022-08-02 08:09:49 -07:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
jacquesguan	e38af7ba95	[LV] Refactor getExtendedAddReductionCost to support other extended reduction more than Add. Now the API getExtendedAddReductionCost is used to determine the cost of extended Add reduction with optional Mul. For Arm, it could cover the cases. But for other target, for example: RISCV, they support other kinds of extended recution, such as FAdd. This patch does the following changes: 1, Split getExtendedAddReductionCost into 2 new API: getExtendedReductionCost which handles the extended reduction with addtional input of Opcode; getMulAccReductionCost which handle the MLA cases the getExtendedAddReductionCost. 2, Refactor getReductionPatternCost, add some contraint condition to make sure the getMulAccReductionCost should only handle the reuction of Add + Mul. Differential Revision: https://reviews.llvm.org/D130868	2022-08-02 16:02:38 +08:00
Martin Sebor	bcef4d238d	[InstCombine] Correct strtol folding with nonnull endptr Reflect in the pointer's offset the length of the leading part of the consumed string preceding the first converted digit. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130912	2022-08-01 16:47:05 -06:00
Simon Pilgrim	27105e2f30	MisExpect.h - fix Wdocumentation warnings. NFC.	2022-08-01 15:06:30 +01:00
Alex Bradbury	9bf2d8cbbe	[NFC] Use AllocaInst's getAddressSpace helper	2022-08-01 10:11:16 +01:00
Nikita Popov	7314ad7a06	Revert "[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions" This reverts commit `7b0f6378e2`. As commented on the review, this patch has a correctness issue regarding the modelling of memory effects.	2022-08-01 09:20:56 +02:00
Momchil Velikov	7b0f6378e2	[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions SimplifyCFG does some common code hoisting, which is limited to hoisting a sequence of identical instruction in identical order and stops at the first non-identical instruction. This patch allows hoisting instruction pairs over same-length sequences of non-matching instructions. The linear asymptotic complexity of the algorithm stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit` serving to limit compilation time and/or the size of the hoisted live ranges. The patch improves SPECv6/525.x264_r by about 10%. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D129370	2022-08-01 07:55:14 +01:00
Kazu Hirata	bf6021709a	Use drop_begin (NFC)	2022-07-31 15:17:09 -07:00
Sanjay Patel	7073ec530e	[InstCombine] canonicalize more zext-and-of-bool compare to narrow and https://alive2.llvm.org/ce/z/vBNiiM This matches variants of patterns that were folded with: `b5a9361c90`	2022-07-30 11:22:05 -04:00
Sanjay Patel	f95a6aea1b	[InstCombine] avoid splitting a constant expression with div/rem fold Follow-up to `d4940c0f3d` to further limit the transform to avoid an unintended pattern/fold of a constant expression.	2022-07-30 09:45:25 -04:00
Nuno Lopes	fffabd5348	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-30 13:55:56 +01:00
Alexander Shaposhnikov	4220ef2be1	[InstCombine] Add fold for redundant sign bits count comparison For power-of-2 C: ((X s>> ShiftC) ^ X) u< C --> (X + C) u< (C << 1) ((X s>> ShiftC) ^ X) u> (C - 1) --> (X + C) u> ((C << 1) - 1) (https://github.com/llvm/llvm-project/issues/56479) Test plan: 0/ ninja check-llvm check-clang + bootstrap LLVM/Clang 1/ https://alive2.llvm.org/ce/z/eEUfx3 Differential revision: https://reviews.llvm.org/D130433	2022-07-30 09:06:53 +00:00
Alexander Shaposhnikov	d982f1e0c6	[InstCombine] Refactor foldICmpMulConstant This is a follow-up to `2ebfda2417` (replace "if" with "else if" since the cases nuw/nsw were meant to be handled separately). Test plan: 1/ ninja check-llvm check-clang check-lld 2/ Bootstrapped LLVM/Clang pass tests	2022-07-30 02:29:15 +00:00
Sanjay Patel	d4940c0f3d	[InstCombine] fix miscompile from urem/udiv transform with constant expression The isa<Constant> check could misfire on an instruction with 2 constant operands. This bug was introduced with `bb789381fc` (D36988). See issue #56810 for a C source example that exposed the bug.	2022-07-29 17:14:30 -04:00
Sanjay Patel	b5a9361c90	[InstCombine] canonicalize zext-and-of-bool compare to narrow and https://alive2.llvm.org/ce/z/3jYbEH We should choose one of these forms, and the option that uses the narrow type allows the motivating example from issue #56294 to reduce. In the best case (no 'not' needed and 'trunc' remains), this does remove an instruction. Note that there is what looks like a regression because there is an existing canonicalization that turns trunc into and+icmp. That is a long-standing transform, and I'm not sure what effect reversing it would have.	2022-07-29 12:02:54 -04:00
Nikita Popov	5eaeeed8cb	[InstCombine] Avoid ConstantExpr::getFNeg() calls (NFCI) Instead call the constant folding API, which can fail. For now, this should be NFC, as we still allow the creation of fneg constant expressions.	2022-07-29 16:01:46 +02:00
Francis Visoiu Mistrih	bfd3883e83	[Matrix] Refactor transpose distribution. NFC Use a function to distribute transposes. Preparation for future patches.	2022-07-28 17:30:00 -07:00
Philip Reames	82c1b136db	[LV] Don't predicate uniform mem op stores unneccessarily We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible. Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would not be legal to make this same change for merely "uniform" operations. Differential Revision: https://reviews.llvm.org/D130637	2022-07-28 08:55:52 -07:00
Liqiang Tao	d52e775b05	[llvm][ModuleInliner] Add inline cost priority for module inliner This patch introduces the inline cost priority into the module inliner, which uses the same computation as InlineCost. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D130012	2022-07-28 22:44:03 +08:00
Liqiang Tao	c113594378	Revert "[llvm][ModuleInliner] Add inline cost priority for module inliner" This reverts commit `bb7f62bbbd`.	2022-07-28 22:36:28 +08:00
Liqiang Tao	bb7f62bbbd	[llvm][ModuleInliner] Add inline cost priority for module inliner This patch introduces the inline cost priority into the module inliner, which uses the same computation as InlineCost. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D130012	2022-07-28 21:28:07 +08:00
Sanjay Patel	28ad5dc3f7	[InstCombine] try harder to narrow bitwise logic with cast operands This works with any logic + extend: https://alive2.llvm.org/ce/z/vzsqQD The motivating case is from issue #56294, but that's still not optimal (it should simplify completely).	2022-07-28 07:23:22 -04:00
Paul Kirth	6e9bab71b6	Revert "[llvm][NFC] Refactor code to use ProfDataUtils" This reverts commit `300c9a7881`. We will reland once these issues are ironed out.	2022-07-27 21:38:11 +00:00
Paul Kirth	300c9a7881	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-07-27 21:13:54 +00:00
Florian Hahn	16e0620d6d	[VPlan] Mark VPPredInstPHIRecipe as not having side-effects. Now that all uses of VPPredInstPHIRecipes are properly modeled, they can be treated as not having side-effects, enabling removal.	2022-07-27 19:29:26 +01:00
Stanislav Mekhanoshin	0562cf442f	Allow data prefetch into non-default address space I am playing with the LoopDataPrefetch pass and found out that it bails to work with a pointer in a non-zero address space. This patch adds the target callback to check if an address space is to be considered for prefetching. Default implementation still only allows address space 0, so this is NFCI. This does not currently affect any known targets, but seems to be generally useful for the future. Differential Revision: https://reviews.llvm.org/D129795	2022-07-27 10:01:26 -07:00
Sanjay Patel	e079bf6558	[AggressiveInstCombine] check sqrt operand to allow more libcall->intrinsic transforms This should fix issue #56383 (at least when compiled with -O3 because this pass is only run at -O3 currently).	2022-07-27 11:36:13 -04:00
Joseph Huber	b08369f7f2	Revert "[OpenMP] Remove noinline attributes in the device runtime" The behaviour of this patch is not great, but it has some side-effects that are required for OpenMPOpt to work. The problem is that when we use `-mlink-builtin-bitcode` we only import used symbols from the runtime. Then OpenMPOpt will insert calls to symbols that were not previously included. This patch removed this implicit behaviour as these functions were kept alive by the `noinline` simply because it kept calls to them in the module. This caused regression in some tests that relied on some OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but will try to fix it more correctly on main. This reverts commit `d61d72dae6`. Fixes #56752	2022-07-27 11:09:18 -04:00
Aaron Kogon	dd3ca65c37	Sinking or hoisting instructions between loops before fusion Instructions between two adjacent loops will be hoisted above the first loop, or sunk below the second to facilitate loop fusion. Hoisting will be attempted for an instruction that dominates the first loop. Otherwise, sinking this instructions will be attempted. Instructions with side effects will not be considered for sinking or hoisting. Hoisting/sinking of any instructions between loops will only be performed if all the instructions can be moved. As well, sinking/hoisting is considered for each instruction in isolation, without taking into account sinking/hoisting decisions for other instructions in the preheader. Differential Revision: https://reviews.llvm.org/D118076	2022-07-27 06:55:09 -04:00
Kirill Stoimenov	d6e1e0a019	[ASan] Use stack safety analysis to optimize allocas instrumentation. Added alloca optimization which was missed during the implemenation of D112098. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130503	2022-07-26 18:48:16 -07:00
Martin Sebor	4447603616	[InstCombine] Fold strtoul and strtoull and avoid PR #56293 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D129224	2022-07-26 14:11:40 -06:00
Sanjay Patel	e3205b8765	[AggressiveInstCombine] convert sqrt libcalls with "nnan" to sqrt intrinsics This is an alternate to D129155 that uses TTI.haveFastSqrt() to avoid a potential miscompile for programs with reads of errno. Moving the transform to AggressiveInstCombine provides access to TTI. If a sqrt call has "nnan", that implies that the input argument is never negative because sqrt of {negative number} --> NAN. If the argument is never negative and the call can be lowered without a libcall, then we can assume that errno accesses are unchanged after lowering, so the call can be translated to the LLVM intrinsic (which is expected to become inline code). This affects codegen for targets like x86 that have sqrt instructions, but still have to conservatively assume that a libcall may be needed to set errno as shown in issue #52620 and issue #56383. This patch won't solve those examples - we will need to extend this to use CannotBeOrderedLessThanZero or similar, enhance that analysis for new operators, and/or deal with llvm.assume too. Differential Revision: https://reviews.llvm.org/D129167	2022-07-26 15:50:14 -04:00
Francis Visoiu Mistrih	448a094d3e	[Matrix] Add assert to catch extracted vectors with poison elements Assert when the extracted vector is wider than the row/column. Differential Revision: https://reviews.llvm.org/D130173	2022-07-26 11:07:02 -07:00
Francis Visoiu Mistrih	2c6e8b4636	[Matrix] Refactor tiled loops in a struct. NFC The three loops have the same structure: index, header, latch.	2022-07-26 11:02:22 -07:00
Stefan Gränitz	1e30820483	[WinEH] Apply funclet operand bundles to nounwind intrinsics that lower to function calls in the course of IR transforms WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222 This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs. * Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand. * PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers. * LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites. Reviewed By: theraven Differential Revision: https://reviews.llvm.org/D128190	2022-07-26 17:52:43 +02:00
Arthur Eubanks	2eade1dba4	[WPD] Use new llvm.public.type.test intrinsic for potentially publicly visible classes Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`. Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`. To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D128955	2022-07-26 08:01:08 -07:00
Phoebe Wang	19c5638e4f	[ArgPromotion] Transfer metadata nontemporal to promoted loads Fixes #56703 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D130536	2022-07-26 16:30:08 +08:00
Kazu Hirata	3f3930a451	Remove redundaunt virtual specifiers (NFC) Identified with tidy-modernize-use-override.	2022-07-25 23:00:59 -07:00
zhongyunde	d485c1b73e	[LoopDataPrefetch] Fix crash when TTI doesn't set CacheLineSize Fix https://github.com/llvm/llvm-project/issues/56681 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130418	2022-07-26 13:08:42 +08:00
Joseph Huber	d61d72dae6	[OpenMP] Remove noinline attributes in the device runtime We previously used the `noinline` attributes to specify some defintions which should be kept alive in the runtime. These were then stripped immediately in the OpenMPOpt module pass. However, Since the changes in D130298, we not explicitly state which functions will have external visiblity in the bitcode library. Additionally the OpenMPOpt module pass should run before the inliner pass, so this shouldn't make a difference in whether or not the functions will be alive for the initial pass of OpenMPOpt. This should simplify the interface, and additionally save time spend on scanning funciton names for noinline. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D130368	2022-07-25 15:44:50 -04:00
Warren Ristow	3bbd380a5b	[Reassociate][NFC] Use an appropriate dyn_cast for BinaryOperator In D129523, it was noted that there is are some questionable naked casts from Instruction to BinaryOperator, which could be addressed by doing a dyn_cast directly to BinaryOperator, avoiding the need for the later cast. This cleans up that casting. Reviewed By: nikic, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D130448	2022-07-25 10:24:43 -07:00
Kazu Hirata	95a932fb15	Remove redundaunt override specifiers (NFC) Identified with modernize-use-override.	2022-07-24 22:28:11 -07:00
Kazu Hirata	b5188591a0	[llvm] Remove redundaunt virtual specifiers (NFC) Identified with modernize-use-override.	2022-07-24 21:50:35 -07:00
Warren Ristow	3089b411a4	[Reassociate][NFC] Consistent checking for FastMathFlags suitability In D129523, it was noted that the approach to check whether a value can have FastMathFlags was done in different ways, and they should be made consistent. This patch makes minor changes to fix that. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D130408	2022-07-24 17:44:30 -07:00
Kazu Hirata	acf648b5e9	Use llvm::less_first and llvm::less_second (NFC)	2022-07-24 16:21:29 -07:00
Kazu Hirata	8ac2d06195	[IPO] Use range-based for loops (NFC)	2022-07-24 14:48:06 -07:00
Kazu Hirata	3736a498d4	[IPO] Use std::array for AccessKind2Accesses (NFC) Switching to std:array allow us to use fill. While I am at it, this patch also converts one for loop to a range-based one.	2022-07-23 15:47:53 -07:00
Fangrui Song	7225213c0a	[LegacyPM] Remove {,PostInline}EntryExitInstrumenterPass Following recent changes removing non-core features of the legacy PM/optimization pipeline.	2022-07-23 15:30:15 -07:00
Nuno Lopes	9df0b254d2	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-23 21:50:11 +01:00
Kazu Hirata	2d2e2e7ea9	[Vectorize] Remove isConsecutiveLoadOrStore (NFC) The last use was removed on Jan 4, 2022 in commit `95a93722db`.	2022-07-23 13:01:14 -07:00
Johannes Doerfert	6b7eae11f1	[Attributor][FIX] HasBeenWrittenTo logic should only be used for reads If we look at a write, we should not enact the "has been written to" logic introduced to avoid spurious write -> read dependences. Doing so lead to elimination of stores we needed, which is obviously bad.	2022-07-22 23:57:57 -05:00
Alexander Shaposhnikov	2ebfda2417	[InstCombine] Improve folding of mul + icmp This diff adds folds for patterns like X * A < B where A, B are constants and "mul" has either "nsw" or "nuw". (to address https://github.com/llvm/llvm-project/issues/56563). Test plan: 1/ ninja check-llvm check-clang 2/ Bootstrapped LLVM/Clang pass tests Differential revision: https://reviews.llvm.org/D130039	2022-07-22 22:08:53 +00:00
Sanjay Patel	08091a99ae	Revert "[InstCombine] enhance fold for subtract-from-constant -> xor" This reverts commit `79bb915fb6`. This caused regressions because SCEV works better with sub.	2022-07-22 15:56:24 -04:00
Philip Reames	b5c7213647	[LV] Use early return to simplify code structure	2022-07-22 12:15:14 -07:00
Mircea Trofin	7b81a81d5f	[NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate The name `getEntrySamples` was misleading for 2 reasons. One, it's close in name to `Function::getEntryCount`, but the equivalent here is `getHeadSamples`; second, as opposed to the other get* APIs in `FunctionSamples`, it performs an estimate/heuristic rather than just retrieving raw data (or a non-heuristic derivate off that data, like `getMaxCountInside`) The new name should more clearly communicate its intent; and, being close (in name) to `getHeadSamples`, it should allow the reader discover the relation between them. Also updated the doc comments for both `getHeadSamples[Estimate]` so a reader may better understand the relation between them. Differential Revision: https://reviews.llvm.org/D130281	2022-07-22 09:17:59 -07:00
Benjamin Kramer	5a445395e4	[LV] Remove unused variable. NFC.	2022-07-22 17:43:58 +02:00
Philip Reames	d7bf81fd51	[LV] Rework widening cost of uniform memory ops for clarity [nfc] Reorganize the code to make it clear what is and isn't handle, and why. Restructure bailout to remove (false and confusing) dependence on CM_Scalarize; just return invalid cost and propagate, that's what it is for.	2022-07-22 08:35:45 -07:00
Joseph Huber	3d0ab8638b	[Internalize] Support glob patterns for API lists The internalize pass supports an option to provide a list of symbols that should not be internalized. THis is useful retaining certain defintions that should be kept alive. However, this interface is somewhat difficult to use as it requires knowing every single symbol's name and specifying it. Many APIs provide common prefixes for the symbols exported by the library, so it would make sense to be able to match these using a simple glob pattern. This patch changes the handling from a simple string comparison to a glob pattern match. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D130319	2022-07-22 08:24:32 -04:00
Johannes Doerfert	a50b9f9f1f	[Attributor][FIX] Handle non-recursive but re-entrant functions properly If a function is non-recursive we only performed intra-procedural reasoning for reachability (via AA::isPotentiallyReachable). However, if it is re-entrant that doesn't mean we can't reach. Instead of this problematic logic in the reachability reasoning we utilize logic in AAPointerInfo. If a location is for sure written by a function it can be re-entrant or recursive we know only intra-procedural reasoning is sufficient.	2022-07-22 00:00:56 -05:00
Max Kazantsev	a40af8589e	[RS4GC] Handle special cases in unreachable code for memcpy/memmov The existing code doesn't expect dummy values (undef, poison, null-derived constants etc) as arguments of these intrinsics. However, they can be there in unreached code. Currently we fail trying to find base for them. Handle these cases separately. Return null as base for them to be consistent with the handling in the main algorithm in findBaseDefiningValue. Differential Revision: https://reviews.llvm.org/D129561 Reviewed By: apilipenko	2022-07-22 11:30:43 +07:00
Johannes Doerfert	62f7888d6d	[Attributor] Dominating must-write accesses allow unknown initial values If we have a dominating must-write access we do not need to know the initial value of some object to perform reasoning about the potential values. The dominating must-write has overwritten the initial value.	2022-07-21 23:08:43 -05:00
Johannes Doerfert	c72d93a08a	[Attributor][NFC] Remove unnecessary overwritten methods	2022-07-21 21:57:02 -05:00
Chenbing Zheng	1a0187c9e7	[InstCombine] remove useless ‘InstCombiner::’. nfc Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130220	2022-07-22 09:24:24 +08:00
Philip Reames	bd75350180	[LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the common name, these are different. * LVs notion means that only the first lane of each unrolled part is required. That is, lanes within a single unroll factor are considered uniform. This allows e.g. widenable memory ops to be considered uses of uniform computations. * LVL and LAI's notion refers to all lanes across all unrollings. IsUniformMem is in turn defined in terms of LAI's notion. Thus a UniformMemOpmeans is a memory operation with a loop invariant address. This means the same address is accessed in every iteration. The tweaked piece of code was trying to match a uniform mem op (i.e. fully loop invariant address), but instead checked for LV's notion of uniformity. In theory, this meant with UF > 1, we could speculate a load which wasn't safe to execute. This ends up being mostly silent in current code as it is nearly impossible to create the case where this difference is visible. The closest I've come in the test case from 54cb87, but even then, the incorrect result is only visible in the vplan debug output; before this change we sink the unsafely speculated load back into the user's predicate blocks before emitting IR. Both before and after IR are correct so the differences aren't "interesting". The other test changes are uninteresting. They're cases where LV's uniform analysis is slightly weaker than SCEV isLoopInvariant.	2022-07-21 15:44:34 -07:00
Alexander Shaposhnikov	e9afdf838e	[GlobalOpt] Enable evaluation of atomic loads Relax the check to allow evaluation of atomic loads (but still skip volatile loads). Test plan: 1/ ninja check-llvm check-clang 2/ Bootstrapped LLVM/Clang pass tests Differential revision: https://reviews.llvm.org/D130211	2022-07-21 21:36:11 +00:00
Augie Fackler	bd6aa67e02	BuildLibCalls: move inference of freeing memory later This probably should have been part of D123089, but the effects of it don't show up until we start removing functions from the table in D130107. Oops. Differential Revision: https://reviews.llvm.org/D130184	2022-07-21 15:31:16 -04:00
Sanjay Patel	78c09f0f24	[PatternMatch][InstCombine] match a vector with constant expression element(s) as a constant expression The InstCombine test is reduced from issue #56601. Without the more liberal match for ConstantExpr, we try to rearrange constants in Negator forever. Alternatively, we could adjust the definition of m_ImmConstant to be more conservative, but that's probably a larger patch, and I don't see any downside to changing m_ConstantExpr. We never capture and modify a ConstantExpr; transforms just want to avoid it. Differential Revision: https://reviews.llvm.org/D130286	2022-07-21 15:23:57 -04:00
David Sherwood	f15b6b2907	[AArch64] Add target hook for preferPredicateOverEpilogue This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met: 1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences. Currently the default option is "disabled", but this will be changed in a later patch. I've added new tests to show the options behave as expected here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D129560	2022-07-21 17:20:06 +01:00
Nikita Popov	1f69503107	[MemoryBuiltins] Add getReallocatedOperand() function (NFC) Replace the value-accepting isReallocLikeFn() overload with a getReallocatedOperand() function, which returns which operand is the one being reallocated. Currently, this is always the first one, but once allockind(realloc) is respected, the reallocated operand will be determined by the allocptr parameter attribute.	2022-07-21 14:54:16 +02:00
Nikita Popov	46e6dd84b7	[MemoryBuiltins] Remove isFreeCall() function (NFC) Remove isFreeCall() in favor of getFreedOperand(). Replace the two remaining uses with a getFreedOperand() != nullptr check, as they only care that something is getting freed. (The usage in DSE is correct as such. The allocator-related checks in CFLGraph look rather questionable in general.)	2022-07-21 14:44:23 +02:00
Nikita Popov	5e856a8578	[InstCombine] Use getFreedOperand() (NFC) Use getFreedOperand() instead of isFreeCall() to remove the implicit assumption that any pointer operand to a free function is the operand being freed. This won't actually matter until we handle allockind(free).	2022-07-21 14:33:55 +02:00
Nikita Popov	3ac8587a2b	[Attributor] Use getFreedOperand() (NFC) Track which operand is actually freed, to avoid the implicit assumption that it is the first call argument.	2022-07-21 14:26:47 +02:00
Nikita Popov	c81dff3c30	[MemoryBuiltins] Add getFreedOperand() function (NFCI) We currently assume in a number of places that free-like functions free their first argument. This is true for all hardcoded free-like functions, but with the new attribute-based design, the freed argument is supposed to be indicated by the allocptr attribute. To make sure we handle this correctly once allockind(free) is respected, add a getFreedOperand() helper which returns the freed argument, rather than just indicating whether the call frees some argument. This migrates most but not all users of isFreeCall() to the new API. The remaining users are a bit more tricky.	2022-07-21 12:39:35 +02:00
Nikita Popov	8d58c8e57b	Reapply [InstCombine] Don't check for alloc fn before fetching alloc size Reapply the patch with getObjectSize() replaced by getAllocSize(). The former will also look through calls that return their argument, and we'll end up placing dereferenceable attributes on intrinsics like llvm.launder.invariant.group. While this isn't wrong, it also doesn't seem to be particularly useful. For now, use getAllocSize() instead, which sticks closer to the original behavior of this code. ----- This code is just interested in the allocsize, not any other allocator properties.	2022-07-21 11:48:24 +02:00
Nikita Popov	70056d04e2	Revert "[InstCombine] Don't check for alloc fn before fetching object size" This reverts commit `c72c22c04d`. This affected an Analysis test that I missed. Reverting for now.	2022-07-21 10:59:12 +02:00
Nikita Popov	c72c22c04d	[InstCombine] Don't check for alloc fn before fetching object size This code is just interested in the allocsize, not any other allocator properties.	2022-07-21 10:45:03 +02:00
Nikita Popov	f45ab43332	[MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc Alloc directly checking whether a given call is a removable allocation, instead of first checking whether it is an allocation first.	2022-07-21 09:39:19 +02:00
Chenbing Zheng	8c124c9088	[InstCombine] (ShiftValC >> Y) >s -1/<s 0 --> Y != 0/==0 We can do folds (ShiftValC >> Y) >s -1 --> Y != 0 and (ShiftValC >> Y) <s 0 --> Y == 0, with ShiftValC < 0. Alive2: https://alive2.llvm.org/ce/z/-PRHfD Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129726	2022-07-21 10:12:29 +08:00
Chenbing Zheng	8075f680c8	[InstCombine] add fold (X > C - 1) ^ (X < C + 1) --> X != C Considering the correctness of this pattern, we should avoid that C - 1 is non-negative and C + 1 is negative. Alive2: https://alive2.llvm.org/ce/z/c_rBaq Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129622	2022-07-21 10:08:21 +08:00
Johannes Doerfert	ad98ef8be4	[Attributor] Deal with complex PHI nodes better during AAPointerInfo We were quite conservative when it came to PHI node handling to avoid recursive reasoning. Now we check more direct if we have seen a PHI already or not. This allows non-recursive PHI chains to be handled. This also exposed a bug as we did only model the effect of one loop traversal. `phi_no_store_3` has been adapted to show how we would have used `undef` instead of `1` before. With this patch we don't replace it at all, which is expected as we do not argue about loop iterations (or alignments).	2022-07-20 17:34:50 -05:00
Johannes Doerfert	142897dd7d	[Attributor] Only non-exact accesses require a uniform bit-pattern (=0) If we only have exact accesses we should never require the bit-pattern to be uniform (in this case 0). Only a non-exact access should force us to require only 0 values.	2022-07-20 17:34:50 -05:00
Alexander Shaposhnikov	67f1fe8597	[GlobalOpt] Enable evaluation of atomic stores Relax the check to allow evaluation of atomic stores (but still skip volatile stores). Test plan: 1/ ninja check-llvm check-clang 2/ Bootstrapped LLVM/Clang pass tests Differential revision: https://reviews.llvm.org/D129841	2022-07-20 22:33:58 +00:00
Schrodinger ZHU Yifan	304027206c	[ThinLTO] Support aliased GlobalIFunc Fixes https://github.com/llvm/llvm-project/issues/56290: when an ifunc is aliased in LTO, clang will attempt to create an alias summary; however, as ifunc is not included in the module summary, doing so will lead to crash. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D129009	2022-07-20 15:30:38 -07:00
Craig Topper	d76c8f5127	[InstCombine] Add mul with negated power of 2 constant to canEvaluateShifted. If we are right shifting a multiply by a negated power of 2 where the power of 2 is the same as the shift amount, we can replace with a negate followed by an And. New tests have not been committed yet but the patch shows the diffs. Let me know if you want any changes or additional tests. Differential Revision: https://reviews.llvm.org/D130103	2022-07-20 11:00:22 -07:00
Ruobing Han	2b98b8e8fb	fix bug for useless malloc elimination in CodeGenPrepare Put AllocationFn check before I->willReturn can allow CodeGenPrepare to remove useless malloc instruction Differential Revision: https://reviews.llvm.org/D130126	2022-07-20 16:29:51 +00:00
Philip Reames	523a526a02	[LV] Fix miscompile due to srem/sdiv speculation safety condition An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block. Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose. Differential Revision: https://reviews.llvm.org/D130106	2022-07-20 05:35:23 -07:00
Nicolai Hähnle	1ddc51d89d	Inliner: don't mark call sites as 'nounwind' if that would be redundant When F calls G calls H, G is nounwind, and G is inlined into F, then the inlined call-site to H should be effectively nounwind so as not to lose information during inlining. If H itself is nounwind (which often happens when H is an intrinsic), we no longer mark the callsite explicitly as nounwind. Previously, there were cases where the inlined call-site of H differs from a pre-existing call-site of H in F only in the explicitly added nounwind attribute, thus preventing common subexpression elimination. v2: - just check CI->doesNotThrow v3 (resubmit after revert at `3443788087`): - update Clang tests Differential Revision: https://reviews.llvm.org/D129860	2022-07-20 14:17:23 +02:00
Florian Hahn	5124b21648	[VPlan] Initial def-use verification. This patch introduces some initial def-use verification. This catches cases like the one fixed by D129436. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D129717	2022-07-20 11:06:32 +01:00
Fangrui Song	e931c2e870	[LegacyPM] Remove InstrOrderFileLegacyPass Following recent changes removing non-core features of the legacy PM/optimization pipeline.	2022-07-19 23:58:51 -07:00
Kazu Hirata	0387da6f4f	Use value instead of getValue (NFC)	2022-07-19 21:18:26 -07:00
Kazu Hirata	41ae78ea3a	Use has_value instead of hasValue (NFC)	2022-07-19 20:15:44 -07:00

... 4 5 6 7 8 ...

31623 Commits