llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	ed2fdace89	[LV] Use separate index to access StoredValues in vectorizeInterleave. StoredValues only has entries for members of the interleave group. If there are gaps, then using the index i here will either access a wrong entry or be out-of-bounds. Instead use a dedicated index that only gets incremented for members of the interleave group. Fixes #59090.	2022-11-25 15:28:05 +00:00
Jamie Schmeiser	be1ff1fe58	[NFC] Refactor loop peeling code for calculating phi invariance. Summary: Refactor loop peeling code by moving code for calculating phi invariance into a separate class that does the calculation. Redescribe and rework the algorithm in preparation for adding increased functionality. Add test case that does not exhibit peeling that will be subsequently supported. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: mkazantsev (Max Kazantsev) Differential Revision: https://reviews.llvm.org/D138232	2022-11-25 09:07:14 -05:00
Matthias Gehre	5a1d92fa3e	[InstCombine] Update debug intrinsics when rewriting allocas	2022-11-25 08:20:54 +01:00
Fangrui Song	fa71c16455	[Inliner] Move cl::opt inside llvm::	2022-11-24 20:31:13 -08:00
Sanjay Patel	535c5d56a7	[InstCombine] ease restriction for extractelt (bitcast X) fold We were checking for a desirable integer type even when there is no shift in the transform. This is unnecessary since we are truncating directly to the destination type. This removes an extractelt in more cases and seems to make the canonicalization more uniform overall. There's still a potential difference between patterns that need a shift vs. trunc-only. I'm not sure if that is worth keeping at this point, but it can be adjusted in another step (assuming this change does not cause trouble). In the most basic case where I noticed this, we missed a fold that would have completely removed vector ops from a pattern like: https://alive2.llvm.org/ce/z/y4Qdte	2022-11-24 13:27:19 -05:00
Sanjay Patel	bf7f87e62c	[InstCombine] reduce code duplication in foldBitcastExtElt(); NFC	2022-11-24 10:16:37 -05:00
Guillaume Chatelet	e647b4f519	[reland][Alignment][NFC] Use the Align type in MCSection Differential Revision: https://reviews.llvm.org/D138653	2022-11-24 13:19:18 +00:00
Guillaume Chatelet	3467f9c7d6	Revert D138653 "[Alignment][NFC] Use the Align type in MCSection" This breaks the bolt project. This reverts commit `409f0dc4a4`.	2022-11-24 12:42:30 +00:00
Guillaume Chatelet	409f0dc4a4	[Alignment][NFC] Use the Align type in MCSection Differential Revision: https://reviews.llvm.org/D138653	2022-11-24 12:32:58 +00:00
Anton Sidorenko	9ee8d2e081	[Debugify] Strip llvm.mir.debugify metadata We don't strip llvm.mir.debugify metadata in `llvm::stripDebugifyMetadata`. This may lead to incorrect number of lines and variables in the metadata when we run debugify twice, e.g. -run-pass=mir-debugify,...,mir-strip-debug,...,mir-debugify. Differential Revision: https://reviews.llvm.org/D138417	2022-11-24 12:20:21 +03:00
Fangrui Song	fa36d72305	[LoopVectorize] Internalize some cl::opt	2022-11-23 23:03:02 -08:00
Vasileios Porpodas	af4e856fa7	[NFC] Replaced BB->getInstList().{erase(),pop_front(),pop_back()} with eraseFromParent(). Differential Revision: https://reviews.llvm.org/D138617	2022-11-23 22:47:46 -08:00
Vasileios Porpodas	8b9a62ee49	[NFC] Use BB->size() instead of BB->getInstList().size(). Differential Revision: https://reviews.llvm.org/D138616	2022-11-23 17:25:53 -08:00
Matt Arsenault	f0693277c7	CloneModule: Handling cloning ifuncs This is tested in a future llvm-reduce patch.	2022-11-23 12:22:06 -05:00
Matt Arsenault	cb0d2887ab	Utils: Fix deleting calls to null in non-0 address spaces	2022-11-23 08:49:44 -05:00
Matt Devereau	ee4d6c8bf0	[VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for scalable vectors which caused poor scalable Binop codegen. Differential Revision: https://reviews.llvm.org/D138545	2022-11-23 13:17:21 +00:00
Benjamin Kramer	5cfc22cafe	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes" This reverts commit `cf624b23bc`. It triggers crashes in clang, see the comments on github on the original change.	2022-11-23 13:11:16 +01:00
Stefan Gränitz	c20a80092c	[ObjC][ARC] Fix UB in ObjCARCOpt with -enable-objc-arc-opts=false When ObjCARCOpt::run() returned early, Changed and CFGChanged were never initialized. CFGChanged is read unconditionally afterwards. This came up in the course of D137942.	2022-11-23 11:30:39 +01:00
Matt Arsenault	6463961941	InstCombine: Fold some identities for canonicalize Equality is directly stated as true in the LangRef, and I believe this works for every compare type.	2022-11-22 21:42:44 -05:00
Fangrui Song	297a183022	[asan] Don't demangle __odr_asan_gen_* symbols This relands the ODR indicator part of D138095 (reverted by `06c74b5e73`): a `__odr_asan_gen_*` symbol should use a mangled name as its associated symbol does.	2022-11-22 16:47:33 -08:00
Fangrui Song	06c74b5e73	Revert D138095 Use InternalAlloc in DemangleCXXABI Broke 2/3 tests on macOS which seem to be related to `free(demangled_name)` in DemangleCXXABI.	2022-11-22 16:29:24 -08:00
Roman Lebedev	655d857325	[SROA] `isVectorPromotionViable()`: avoid allowing overly large vectors Otherwise, `compiler-rt/test/asan/TestCases/pr33372.cpp` fails with an assertion: ``` clang-16: /repositories/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:11988: void llvm::SelectionDAG::createOperands(llvm::SDNode *, ArrayRef<llvm::SDValue>): Assertion `SDNode::getMaxNumOperands() >= Vals.size() && "too many operands to fit into SDNode"' failed. ``` I'm not sure if this should be even more conservative, or if we have a named constant for this in middle-end.	2022-11-23 03:23:08 +03:00
Roman Lebedev	cf624b23bc	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-11-23 02:38:25 +03:00
Sami Tolvanen	cacd3e73d7	Add generic KCFI operand bundle lowering The KCFI sanitizer emits "kcfi" operand bundles to indirect call instructions, which the LLVM back-end lowers into an architecture-specific type check with a known machine instruction sequence. Currently, KCFI operand bundle lowering is supported only on 64-bit X86 and AArch64 architectures. As a lightweight forward-edge CFI implementation that doesn't require LTO is also useful for non-Linux low-level targets on other machine architectures, add a generic KCFI operand bundle lowering pass that's only used when back-end lowering support is not available and allows -fsanitize=kcfi to be enabled in Clang on all architectures. This relands commit `eb2a57ebc7` with fixes. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D135411	2022-11-22 23:01:18 +00:00
Roman Lebedev	529eafd9be	[SROA] `isVectorPromotionViable()`: integer-ify non-pointer non-common types This rectifies a FIXME that dates all the way back to 2014 about not doing so due to the backend issues. Presumably a sufficient amount of time has passed and all the known issues have been addressed, or at least we will find out if there are some left...	2022-11-23 00:23:00 +03:00
Roman Lebedev	4e18d51ac5	[SROA] `isVectorPromotionViable()`: pointer-ness is sticky As it has been established previously by precedent, if we see a pointer type, then that is the type we must use. Essentially, we don't want to introduce `inttoptr`'s.	2022-11-23 00:23:00 +03:00
Benjamin Kramer	f116107f2d	[VectorCombine] Don't touch instruction after foldSingleElementStore, it might be deleted Use after free found by asan.	2022-11-22 21:12:42 +01:00
Rong Xu	6327d263f5	[CHR] Add a threshold for the code duplication ControlHeightReduction (CHR) clones the code region to reduce the branches in the hot code path. The number of clones is linear to the depth of the region. Currently it does not have control over the code size increase. We are seeing one ~9000 BB function get expanded to ~250000 BBs, a 25x increase. This creates a big compile time issue for the downstream optimizations. This patch adds a cap for number of clones for one region. Differential Revision: https://reviews.llvm.org/D138333	2022-11-22 11:36:40 -08:00
Matt Arsenault	afb3509113	LoopDeletion: Fix missing newlines in debug printing	2022-11-22 11:12:00 -05:00
Sanjay Patel	ede6d608f4	[VectorCombine] switch on opcode to compile faster This follows `87debdadaf` to further eliminate wasting time calling helper functions only to early return to the main run loop. Once again, this results in significant savings based on experimental data: https://llvm-compile-time-tracker.com/compare.php?from=01023bfcd33f922ed8c934ce563e54abe8bfe246&to=3dce4f70b73e48ccb045decb634c185e6b4c67d5&stat=instructions:u This is NFCI other than making the pass faster. The total cost of VectorCombine runs in an -O3 build appears to be well under 0.1% of compile-time now, so there's not much left to do AFAICT. There's a TODO about making the code cleaner, but it probably doesn't change timing much. I didn't include those changes here because it requires updating much more code.	2022-11-22 10:23:32 -05:00
Thomas Symalla	470aea5ed4	[InstCombine] Fold extractelt with select of constants An extractelt with a constant index which extracts an element from the two vector operands of a select can be directly folded into a select. extractelt (select %x, %vec1, %vec2), %const -> select %x, %vec1[%const], %vec2[%const] Note: the implementation currently only works for constant vector operands. Reviewed By: foad, spatel Differential Revision: https://reviews.llvm.org/D137934	2022-11-22 14:07:06 +01:00
David Green	8e9e22f07b	[LoopFlatten] Fix IV increment use count The add from the IV in the inner loop was always checking for 2 uses, the phi and the compare. The compare could be based on the phi though, leaving one valid use of the compare. In the testcase we could be left with the phi and a lcssa phi as the two users, invalidly allowing flattening where we shouldn't. Fixes 58441 Differential Revision: https://reviews.llvm.org/D138404	2022-11-22 07:23:56 +00:00
Max Kazantsev	57fd7ffeff	[IndVarSimplify] Lift limitations on IV being a Phi for turn-to-invariant These limitations are too strict, and their only purpose is to avoid code size explosion. These restrictions seem obsolete, and the size problem is solved in other places through cheap expansion limits. The motivation is that the old code cannot deal with comparisons against induction variant's increment. Differential Revision: https://reviews.llvm.org/D138412 Reviewed By: lebedev.ri, reames	2022-11-22 12:53:37 +07:00
Kazu Hirata	1f914944b6	Don't use Optional::getPointer (NFC) Since std::optional does not offer getPointer(), this patch replaces X.getPointer() with &*X to make the migration from llvm::Optional to std::optional easier. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716 Differential Revision: https://reviews.llvm.org/D138466	2022-11-21 19:03:40 -08:00
Fangrui Song	db7c82231c	Restore global descriptor demangling after D138095 "[asan] Keep Itanium mangled names in global metadata" This amends commit `00be3578e0` to demangle symbol names in global descriptors. We keep the mangled name for the `__odr_gen_asan_*` variables and the runtime __cxa_demangle call site change (which fixed possible leaks for other scenarios: non-fatal diagnostics). compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp uses an undefined weak `__cxa_demangle` which does not pull in an archive definition. A -static-libstdc++ executable link does not get demangled names. Unfortunately this means we cannot rely on runtime demangling. See compiler-rt/test/asan/TestCases/global-demangle.cpp	2022-11-21 20:51:52 +00:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: `87debdadaf` https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Sanjay Patel	8f337f8ffe	[VectorCombine] generalize pass param name for early combines; NFC The option was added with https://reviews.llvm.org/D102496, and currently the name is accurate, but I am hoping to add a load transform that is not a scalarization. See issue #17113.	2022-11-21 13:57:55 -05:00
Manuel Brito	1e55d5b1f2	Use poison instead of undef as placeholder for vector construction [NFC] Differential Revision: https://reviews.llvm.org/D138450	2022-11-21 18:43:23 +00:00
Alexey Bataev	ac93b61165	[SLP]Fix PR59098: check if the vector type is scalarized for extractelements. If the resulting type is going to be scalarized, no need to adjust the cost of removed extractelement and insert/extract subvector costs. Otherwise, the compiler can crash because of the wrong type sizes.	2022-11-21 10:26:01 -08:00
Max Kazantsev	2a3ac7fd0c	[NFC][IndVars] Add LLVM_DEBUG printout to replaceExitCond	2022-11-21 19:33:26 +07:00
Kazu Hirata	31b6093434	[Scalar] Teach matchExpandedRem to return std::optional (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-20 22:38:43 -08:00
Bjorn Pettersson	1c308d6641	[LV] Clean up LoopVectorizationCostModel::calculateRegisterUsage. NFC Minor refactoring in LoopVectorizationCostModel::calculateRegisterUsage. Also adding some FIXMEs related to what appear to be some shortcomings in how the register usage is calculated. Differential Revision: https://reviews.llvm.org/D138342	2022-11-20 20:52:13 +01:00
Kazu Hirata	7524db4d44	[llvm] Remove unused forward declarations (NFC)	2022-11-20 09:59:36 -08:00
Kazu Hirata	1fa870b1bd	Use None consistently (NFC) This patch replaces NoneType() and NoneType::None with None in preparation for migration from llvm::Optional to std::optional. In the std::optional world, we are not guaranteed to be able to default-construct std::nullopt_t or peek what's inside it, so neither NoneType() nor NoneType::None has a corresponding expression in the std::optional world. Once we consistently use None, we should even be able to replace the contents of llvm/include/llvm/ADT/None.h with something like: using NoneType = std::nullopt_t; inline constexpr std::nullopt_t None = std::nullopt; to ease the migration from llvm::Optional to std::optional. Differential Revision: https://reviews.llvm.org/D138376	2022-11-20 00:24:40 -08:00
Kazu Hirata	5d1ae6346b	[Analysis] Teach getOptionalIntLoopAttribute to return std::optional (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-19 15:36:50 -08:00
Fangrui Song	00be3578e0	[asan] Keep Itanium mangled names in global metadata The runtime calls `MaybeDemangleGlobalName` for error reporting and `__cxxabiv1::__cxa_demangle` is called if available, so demangling Itanium mangled names in global metadata is unnecessary and wastes data size. Add `MaybeDemangleGlobalName` in ODR violation detection to support demangled names in a suppressions file. `MaybeDemangleGlobalName` may call `DemangleCXXABI` and leak memory. Use an internal allocation to prevent lsan leak (in case there is no fatal asan error). The debug feature `report_globals=2` prints information for all instrumented global variables. `MaybeDemangleGlobalName` would be slow, so don't do that. The output looks like `Added Global[0x56448f092d60]: beg=0x56448fa66d60 size=4/32 name=_ZL13test_global_2` and I think the mangled name is fine. Other mangled schemes e.g. Windows (see win-string-literal.ll) remain the current behavior. Reviewed By: hctim Differential Revision: https://reviews.llvm.org/D138095	2022-11-19 01:06:26 +00:00
Sanjay Patel	87debdadaf	[VectorCombine] check instruction type before dispatching to folds This is no externally visible change intended, but appears to be a noticeable (surprising) improvement in compile-time based on: https://llvm-compile-time-tracker.com/compare.php?from=0f3e72e86c8c7c6bf0ec24bf1e2acd74b4123e7b&to=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&stat=instructions:u The early returns in the individual fold functions are not good enough to avoid the overhead of the many "fold*" calls, so this speeds up the main instruction loop enough to make a difference.	2022-11-18 16:03:18 -05:00
OCHyams	4ba08d512c	[Assignment Tracking][24/*] Always RemoveRedundantDbgInstrs in instcombine in assignment tracking builds The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir This reduces peak memory overhead by 15% when building CTMark's tramp3d-v4 with -O2 -g with assignment tracking enabled. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133321	2022-11-18 12:36:41 +00:00
OCHyams	e3cd498ff7	[Assignment Tracking][21/*] Account for assignment tracking in inliner The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir The inliner requires two additions: fixupAssignments - Update inlined instructions' DIAssignID metadata so that inlined DIAssignID attachments are unique to the inlined instance. trackInlinedStores - Treat inlined stores to caller-local variables (i.e. callee stores to argument pointers that point to the caller's allocas) as assignments. Track them using trackAssignments, which is the same method as is used by the AssignmentTrackingPass. This means that we're able to detect stale memory locations due to DSE after inlining. Because the stores are only tracked _after_ inlining, any DSE or movement of stores _before_ inlining will not be accounted for. This is an accepted limitation mentioned in the RFC. One change is also required: Update CloneBlock to preserve debug use-before-defs. Otherwise the assignments will be dropped due to having the intrinsic operands replaced with empty metadata (see use-before-def.ll in this patch and this related discourse post). Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133318	2022-11-18 11:55:05 +00:00
OCHyams	86464ed3df	[Assignment Tracking][15/*] Account for assignment tracking in simplifycfg The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Update simplifycfg: sinkLastInstruction - preserve debug use-before-defs. SpeculativelyExecuteBB - replace the value component of dbg.assign intrinsics when stores are hoisted and merged using a select, and don't delete them. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133310	2022-11-18 10:15:55 +00:00
OCHyams	fcd5098a03	[Assignment Tracking][14/*] Account for assignment tracking in instcombine The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Most of the updates here are just to ensure DIAssignID attachments are maintained and propagated correctly. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133307	2022-11-18 09:25:33 +00:00
Chuanqi Xu	2c61848c9d	[Coroutines] Handle the writes to promise alloca prior to llvm.coro.begin Previously we've taken care of the writes to allocas prior to llvm.coro.begin. However, the promise alloca is special, so we never handled it before. For a long time, since programmers can't access the promise_type due to the C++ language specification, we failed to recognize the problem until a recent report: https://github.com/llvm/llvm-project/issues/57861 We've tested many codes and the problem goes away after we handle the writes to the promise alloca prior to @llvm.coro.begin() properly. Closes https://github.com/llvm/llvm-project/issues/57861	2022-11-18 15:39:39 +08:00
Alexey Bataev	07015e12f0	[SLP]Fix PR59053: trying to erase instruction with users. Need to count the reduced values, vectorized in the tree but not in the top node. Such scalars still must be extracted out of the vector node instead of the original scalar.	2022-11-17 17:23:48 -08:00
Fangrui Song	6b852ffa99	[Sink] Process basic blocks with a single successor This condition seems unnecessary. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93511	2022-11-18 01:23:12 +00:00
Fangrui Song	fc91c70593	Revert D135411 "Add generic KCFI operand bundle lowering" This reverts commit `eb2a57ebc7`. llvm/include/llvm/Transforms/Instrumentation/KCFI.h including llvm/CodeGen is a layering violation. We should use an approach where Instrumentation/ doesn't need to include CodeGen/. Sorry for not spotting this in the review.	2022-11-17 22:45:30 +00:00
Sami Tolvanen	eb2a57ebc7	Add generic KCFI operand bundle lowering The KCFI sanitizer emits "kcfi" operand bundles to indirect call instructions, which the LLVM back-end lowers into an architecture-specific type check with a known machine instruction sequence. Currently, KCFI operand bundle lowering is supported only on 64-bit X86 and AArch64 architectures. As a lightweight forward-edge CFI implementation that doesn't require LTO is also useful for non-Linux low-level targets on other machine architectures, add a generic KCFI operand bundle lowering pass that's only used when back-end lowering support is not available and allows -fsanitize=kcfi to be enabled in Clang on all architectures. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D135411	2022-11-17 21:55:00 +00:00
Mengxuan Cai	cd58333a62	[LoopInterchange] Refactor and rewrite validDepInterchange() The current code of validDepInterchange() enumerates cases that are legal for interchange. This could be simplified by checking lexicographically order of the swapped direction matrix. Reviewed By: congzhe, Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D137461	2022-11-17 13:41:02 -05:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. Subsequent patch will start using an actual value of speed in the load/store vectorizer to compare if a vectorized access going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Evgeniy Brevnov	50f8eb05af	Revert "[JT] Preserve existing BPI/BFI during JumpThreading" This reverts commit `52a4018506`.	2022-11-17 17:11:47 +07:00
Evgeniy Brevnov	52a4018506	[JT] Preserve existing BPI/BFI during JumpThreading Currently, JT creates and updates local instances of BPI/BFI. As a result global ones have to be invalidated if JT made any changes. In fact, JT doesn't use any information from BPI/BFI for the sake of the transformation itself. It only creates BPI/BFI to keep them up to date. But since it updates local copies (besides cases when it updates profile metadata) it is just a waste of time. Current patch is a rework of D124439. D124439 makes one step and replaces local copies with global ones retrieved through AnalysisPassManager. Here we do one more step and don't create BPI/BFI if the only reason of creation is to keep BPI/BFI up to date. Overall logic is the following. If there is cached BPI/BFI then update it along the transformations. If there is no existing BPI/BFI, then create it only if it is required to update profile metadata. Please note if BPI/BFI exists on exit from JT (either cached or created) it is always up to date and there is no reason to invalidate it. Differential Revision: https://reviews.llvm.org/D136827	2022-11-17 17:00:00 +07:00
Fangrui Song	12050a3fb7	[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its leader has been made available_externally. This violates the COMDAT rule that its members must be retained or discarded as a unit. To fix this, update the regular LTO change D34803 to track local linkage GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.) This fixes two problems. (a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to linger around (unreferenced, hence benign), and is now correctly discarded. ``` int foo(); inline int v = foo(); ``` (b) Fix https://github.com/llvm/llvm-project/issues/58215: as a size optimization, we place private `__profd_` in a COMDAT with a `__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change makes the `__profd_` available_externally. ``` cat > c.h <<'eof' extern void bar(); inline __attribute__((noinline)) void foo() {} eof cat > m1.cc <<'eof' #include "c.h" int main() { bar(); foo(); } eof cat > m2.cc <<'eof' #include "c.h" __attribute__((noinline)) void bar() { foo(); } eof clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_.profraw clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_.profraw ``` If a GlobalAlias references a GlobalValue which is just changed to available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other available_externally functions, so it cannot easily be removed. 
Depends on D137441: we use available_externally to mark a GlobalAlias in a non-prevailing COMDAT, similar to how we handle GlobalVariable/Function. GlobalAlias may refer to a ConstantExpr, not changing GlobalAlias to GlobalVariable gives flexibility for future extensions (the use case is niche. For simplicity we don't handle it yet). In addition, available_externally GlobalAlias is the most straightforward implementation and retains the aliasee information to help optimizers. See windows-vftable.ll: Windows vftable uses an alias pointing to a private constant where the alias is the COMDAT leader. The COMDAT use case is skeptical and ThinLTO does not discard the alias in the non-prevailing COMDAT. This patch retains the behavior. See new tests ctor-dtor-alias2.ll: depending on whether the complete object destructor emitted, when ctor/dtor aliases are used, we may see D0/D2 COMDATs in one TU and D0/D1/D2 in a D5 COMDAT in another TU. Allow such a mix-and-match with `if (GO->getComdat()->getName() == GO->getName()) NonPrevailingComdats.insert(GO->getComdat());` GlobalAlias handling in ThinLTO is still weird, but this patch should hopefully improve the situation for at least all cases I can think of. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D135427	2022-11-16 22:13:22 -08:00
Fangrui Song	2c239da691	Revert D135427 "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally" This reverts commit `8901635423`. This change broke the following example and we need to check `if (GO->getComdat()->getName() == GO->getName())` before `NonPrevailingComdats.insert(GO->getComdat());` Revert for clarity. ``` // a.cc template <typename T> struct A final { virtual ~A() {} }; extern "C" void aa() { A<int> a; } // b.cc template <typename T> struct A final { virtual ~A() {} }; template struct A<int>; extern "C" void bb(A<int> *a) { delete a; } clang -c -fpic -O0 -flto=thin a.cc && ld.lld -shared a.o b.o ```	2022-11-16 21:43:50 -08:00
Florian Hahn	55f56cdc33	[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC). This clarifies the intention of code that uses the helper. Suggested by @Ayal during review of D136068, thanks!	2022-11-16 23:12:40 +00:00
Florian Hahn	aa16689f82	[VPlan] Use recipe type to avoid getDefiningRecipe call (NFC). Suggested by @Ayal during review of D136068, thanks!	2022-11-16 23:03:34 +00:00
Florian Hahn	239b52d4b6	[VPlan] Update stale comment (NFC). Update comment to reflect current code, which also allows for VPScalarIVStepsRecipes to be uniform. Suggested by @Ayal during review of D136068, thanks!	2022-11-16 22:39:50 +00:00
Florian Hahn	bcc9c5d959	[LV] Replace unnecessary cast_or_null with cast (NFC). The existing code already unconditionally dereferences RepR, so cast_or_null can be replaced by just cast. Suggested by @Ayal during review of D136068, thanks!	2022-11-16 22:31:59 +00:00
Florian Hahn	32f1c5531b	[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC) The return value of getDef is guaranteed to be a VPRecipeBase and all users can also accept a VPRecipeBase *. Most users actually cast to VPRecipeBase or a specific recipe before using it, so this change removes a number of redundant casts. Also rename it to getDefiningRecipe to make the name a bit clearer. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D136068	2022-11-16 22:12:08 +00:00
Alexey Bataev	9f9fdab9f1	[SLP]Fix PR58766: deleted value used after vectorization. If the same instruction is reduced several times, but in one graph is part of a buildvector sequence and in another it is vectorized, we may lose information that it was part of a buildvector and must be extracted from the later vectorized value.	2022-11-16 10:57:03 -08:00
Alexey Bataev	2f8f17c157	[SLP]Fix PR58956: fix insertpoint for reduced buildvector graphs. If the graph is only the buildvector node without main operation, need to inherit the insertpoint from the reduction instruction. Otherwise the compiler crashes trying to insert an instruction at the entry block.	2022-11-16 07:38:49 -08:00
OCHyams	4898568caa	[Assignment Tracking][11/*] Update RemoveRedundantDbgInstrs The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Update the RemoveRedundantDbgInstrs utility to avoid sometimes losing information when deleting dbg.assign intrinsics. removeRedundantDbgInstrsUsingBackwardScan - treat dbg.assign intrinsics that are not linked to any instruction just like dbg.values. That is, in a block of contiguous debug intrinsics, delete all other than the last definition for a fragment. Leave linked dbg.assign intrinsics in place. removeRedundantDbgInstrsUsingForwardScan - Don't delete linked dbg.assign intrinsics and don't delete the next intrinsic found even if it would otherwise be eligible for deletion. removeUndefDbgAssignsFromEntryBlock - Delete undef and unlinked dbg.assign intrinsics encountered in the entry block that come before non-undef non-unlinked intrinsics for the same variable. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133294	2022-11-16 12:27:18 +00:00
Paul Kirth	0cc8752fa1	Revert "[pgo] Avoid introducing relocations by using private alias" This reverts commit `2b8917f8ad`. This breaks with lld and gold	2022-11-16 03:38:14 +00:00
eopXD	c0ef83e3b9	[LSR] Check if terminating value is safe to expand before transformation According to report by @JojoR, the assertion error was hit hence we need to have this check before the actual transformation. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D136415	2022-11-15 14:56:47 -08:00
Arthur Eubanks	70dc3b811e	[AggressiveInstCombine] Remove legacy PM pass As part of legacy PM optimization pipeline removal. This shouldn't be used in codegen pipelines so it should be ok to remove. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D137116	2022-11-15 14:35:15 -08:00
Alexey Bataev	0a33ceee01	[SLP]Fix a crash on analysis of the vectorized node. Need to use advanced check for the same vectorized node to avoid possible compiler crash. We may have 2 similar nodes (vector one and gather) after graph nodes rotation, need to do extra checks for the exact match.	2022-11-15 13:40:28 -08:00
Mikhail Goncharov	4a77d96903	[LegacyPM] remove unset variables in PassManagerBuilder D137915 stopped setting these variables but NewGVN was still used and caused asan failure Differential Revision: https://reviews.llvm.org/D138034	2022-11-15 17:57:44 +01:00
Paul Kirth	2b8917f8ad	[pgo] Avoid introducing relocations by using private alias Instead of using the public, interposable symbol, we can use a private alias and avoid relocations and addends. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D137982	2022-11-15 16:05:24 +00:00
OCHyams	139e08efc5	[Assignment Tracking][23/*] Account for assignment tracking in SLP Vectorizer The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir The SLP-Vectorizer can merge a set of scalar stores into a single vectorized store. Merge DIAssignID intrinsics from the scalar stores onto the new vectorized store. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133320	2022-11-15 15:20:18 +00:00
Fraser Cormack	cfb0d628a7	[MergeICmps][NFC] Fix a couple of typos in a comment	2022-11-15 14:46:23 +00:00
OCHyams	bfa7f62412	[Assignment Tracking][20/*] Account for assignment tracking in DSE The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir DeadStoreElimination shortens stores that are shadowed by later stores such that the overlapping part of the earlier store is omitted. Insert an unlinked dbg.assign intrinsic with a variable fragment that describes the omitted part to signal that that fragment of the variable has a stale value in memory. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133315	2022-11-15 13:42:56 +00:00
Sameer Sahasrabuddhe	376d0469b9	[AAPointerInfo] refactor how offsets and Access objects are tracked This restores commit `b756096b0c`, which was originally reverted in `00b09a7b18`. AAPointerInfo now maintains a list of all Access objects that it owns, along with the following maps: - OffsetBins: OffsetAndSize -> { Access } - InstTupleMap: RemoteI x LocalI -> Access A RemoteI is any instruction that accesses memory. RemoteI is different from LocalI if and only if LocalI is a call; then RemoteI is some instruction in the callgraph starting from LocalI. Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets the value to Unknown if the new offset is not the same as the old offset. The instruction must now be moved from its current bin to the bin corresponding to the new offset. This happens for example, when: - A PHINode has operands that result in different offsets. - The same remote inst is reachable from the same local inst via different paths in the callgraph: ``` A (local inst) \| B / \ C1 C2 \ / D (remote inst) ``` This fixes a bug where a store is incorrectly eliminated in a lit test. Reviewed By: jdoerfert, ye-luo Differential Revision: https://reviews.llvm.org/D136526	2022-11-15 18:52:11 +05:30
OCHyams	98562e8bb7	[Assignment Tracking][19/*] Account for assignment tracking in ADCE The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir In an attempt to preserve more info, don't delete dbg.assign intrinsics that are considered "out of scope" if they're linked to instructions. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133314	2022-11-15 13:13:38 +00:00
OCHyams	2da67e8053	[Assignment Tracking][18/*] Account for assignment tracking in LICM The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Merge DIAssignID attachments on stores that are merged and sunk out of loops. The store may be sunk into multiple exit blocks, and in this case all the copies of the store get the same DIAssignID. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133313	2022-11-15 12:24:16 +00:00
OCHyams	e292f91291	[Assignment Tracking][17/*] Account for assignment tracking in memcpyopt The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Maintain and propagate DIAssignID attachments in memcpyopt. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133312	2022-11-15 11:51:10 +00:00
OCHyams	98c1d11492	[Assignment Tracking][16/*] Account for assignment tracking in mldst-motion The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir mldst-motion will merge and sink the stores in if-diamond branches into the common successor. Attach a merged DIAssignID to the merged store. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133311	2022-11-15 11:28:20 +00:00
OCHyams	0946e463e8	[Assignment Tracking][12/*] Account for assignment tracking in mem2reg The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir The changes for assignment tracking in mem2reg don't require much of a deviation from existing behaviour. dbg.assign intrinsics linked to an alloca are treated much in the same way as dbg.declare users of an alloca, except that we don't insert dbg.value intrinsics to describe assignments when there is already a dbg.assign intrinsic present, e.g. one linked to a store that is going to be removed. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133295	2022-11-15 11:11:57 +00:00
Arthur Eubanks	cbcf123af2	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reland with UseNewGVN usage in clang removed. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:38:17 -08:00
Arthur Eubanks	d7c1427953	Revert "[LegacyPM] Remove cl::opts controlling optimization pass manager passes" This reverts commit `7ec05fec71`. Breaks bots, e.g. https://lab.llvm.org/buildbot#builders/217/builds/15008	2022-11-14 09:33:38 -08:00
Arthur Eubanks	7ec05fec71	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:23:17 -08:00
Nikita Popov	6d98f3a6df	[LoopVersioningLICM] Clarify scope of AST (NFC) Make it clearer that the AST is only temporarily used during the legality check, and does not have to survive into the transformation phase.	2022-11-14 15:28:59 +01:00
Nikita Popov	47eddbbf33	[LoopVersioningLICM] Remove unnecessary reset code (NFC) The LoopVersioningLICM object is only ever used for a single loop, but there was various unnecessary code for handling the case where it is reused across loops. Drop that code, and pass the loop to the constructor.	2022-11-14 15:18:26 +01:00
HanSheng Zhang	200f3410cd	[reg2mem] Skip non-sized Instructions (PR58890) We can only convert sized values into alloca/load/store, skip instructions returning other types. Fixes https://github.com/llvm/llvm-project/issues/58890. Differential Revision: https://reviews.llvm.org/D137700	2022-11-14 12:47:39 +01:00
Nikita Popov	42d9261417	[ConstraintElimination] Use SmallVectorImpl (NFC) When passing a SmallVector by reference, don't specify its size.	2022-11-14 11:01:15 +01:00
Sebastian Neubauer	ce879a03c9	[Coroutines] Do not add allocas for retcon coroutines Same as for async-style lowering, if there are no resume points in a function, the coroutine frame pointer will be replaced by an undef, making all accesses to the frame undefined behavior. Fix this by not adding allocas to the coroutine frame if there are no resume points. Differential Revision: https://reviews.llvm.org/D137866	2022-11-14 10:46:46 +01:00
Nikita Popov	e82b5b5bbd	[ConstraintElimination] Add Decomposition struct (NFCI) Replace the vector of DecompEntry with a struct that stores the constant offset separately. I think this is cleaner than giving the first element special handling. This probably also fixes some potential ubsan errors by more consistently using addWithOverflow/multiplyWithOverflow.	2022-11-14 10:44:16 +01:00
Nikita Popov	30982a595d	[ConstraintElimination] Make decompose() infallible decompose() currently returns a mix of {} and 0 + 1V on failure. This changes it to always return the 0 + 1V form, thus making decompose() infallible. This makes the code marginally more powerful, e.g. we now fold sub_decomp_i80 by treating the constant as a symbolic value. Differential Revision: https://reviews.llvm.org/D137847	2022-11-14 10:42:04 +01:00
Dmitry Makogon	10ab29ec6e	[IRCE] Bail out if AddRec in icmp is for another loop (PR58912) When IRCE runs on outer loop and sees a check of an AddRec of inner loop, it crashes with an assert in SCEV that the AddRec must be loop invariant. This adds a bail out if the AddRec which is checked in icmp is for another loop. Fixes https://github.com/llvm/llvm-project/issues/58912. Differential Revision: https://reviews.llvm.org/D137822	2022-11-14 15:06:13 +07:00
luxufan	98eb917939	[LoopFlatten] Forget all block and loop dispositions after flatten Method forgetLoop only forgets expressions of phis or their users. SCEV expressions other than those may still have loop dispositions that point to the destroyed loop, which might cause a crash. Fixes: https://github.com/llvm/llvm-project/issues/58865 Reviewed By: nikic, fhahn Differential Revision: https://reviews.llvm.org/D137651	2022-11-14 10:19:11 +08:00
Florian Hahn	7854a1abfd	[SimpleLoopUnswitch] Forget SCEVs for replaced phis. Forget SCEVs based on exit phis in case SCEV looked through the phi. After unswitching, it may not be possible to look through the phi due to it having multiple incoming values, so it needs to be re-computed. Fixes #58868	2022-11-13 17:38:39 +00:00
Sanjay Patel	362c23500a	Revert "[InstCombine] allow more folds for multi-use selects (2nd try)" This reverts commit `6eae6b3722`. This version of the patch results in the same DFSAN bot failure as before, so my guess about the SimplifyQuery context instruction was wrong. I don't know what the real bug is.	2022-11-13 11:47:21 -05:00
Sanjay Patel	6eae6b3722	[InstCombine] allow more folds for multi-use selects (2nd try) The 1st try ( `681a6a3990` ) was reverted because it caused a DataFlowSanitizer bot failure. This try modifies the existing calls to simplifyBinOp() to not use a query that sets the context instruction because that seems like a likely source of failure. Since we already try those simplifies with multi-use patterns in some cases, that means the bug is likely present even without this patch. However, I have not been able to reduce a test to prove that this was the bug, so if we see any bot failures with this patch, then it should be reverted again. The reduced simplify power does not affect any optimizations in existing, motivating regression tests. Original commit message: The 'and' case showed up in a recent bug report and prevented more follow-on transforms from happening. We could handle more patterns (for example, the select arms simplified, but not to constant values), but this seems like a safe, conservative enhancement. The backend can convert select-of-constants to math/logic in many cases if it is profitable. There is a lot of overlapping logic for these kinds of patterns (see SimplifySelectsFeedingBinaryOp() and FoldOpIntoSelect()), so there may be some opportunity to improve efficiency. There are also optimization gaps/inconsistency because we do not call this code for all bin-opcodes (see TODO for ashr test).	2022-11-13 10:28:06 -05:00
Michał Górny	f6f1fd443f	Revert "[InstCombine] allow more folds more multi-use selects" This reverts commit `681a6a3990`. It broke sanitizer tests (as seen on buildbots), see: https://reviews.llvm.org/rG681a6a399022#1143137	2022-11-13 07:27:01 +01:00
Mengxuan Cai	ec210f3942	[LoopFuse] Ensure inner loops are in loop simplified form under new PM LoopInfo doesn't give all loops in a loop nest; it gives top-level loops only, while isLoopSimplifyForm() only checks the outermost loop of a loop nest. As a result, inner loops that are not in simplified form cannot be simplified with the original code. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D137672	2022-11-11 15:55:59 -05:00
Sanjay Patel	681a6a3990	[InstCombine] allow more folds more multi-use selects The 'and' case showed up in a recent bug report and prevented more follow-on transforms from happening. We could handle more patterns (for example, the select arms simplified, but not to constant values), but this seems like a safe, conservative enhancement. The backend can convert select-of-constants to math/logic in many cases if it is profitable. There is a lot of overlapping logic for these kinds of patterns (see SimplifySelectsFeedingBinaryOp() and FoldOpIntoSelect()), so there may be some opportunity to improve efficiency. There are also optimization gaps/inconsistency because we do not call this code for all bin-opcodes (see TODO for ashr test).	2022-11-11 15:26:54 -05:00
Jordan Rupprecht	81896f88ce	[NFC] Remove unused OrigLoopID vars	2022-11-11 07:51:40 -08:00
Florian Hahn	2d7e5e29b7	[LV] Remove unused OrigLoopID argument from completeLoopSkeleton (NFC). The argument is not used any longer and can be removed.	2022-11-11 15:39:08 +00:00
Nikita Popov	4d33cf4166	[MemCpyOpt] Avoid moving lifetime marker above def (PR58903) This is unlikely to happen with opaque pointers, so just bail out of the transform, rather than trying to move bitcasts/etc as well. Fixes https://github.com/llvm/llvm-project/issues/58903.	2022-11-11 15:06:34 +01:00
Fangrui Song	8901635423	[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its leader has been made available_externally. This violates the COMDAT rule that its members must be retained or discarded as a unit. To fix this, update the regular LTO change D34803 to track local linkage GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.) This fixes two problems. (a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to linger around (unreferenced, hence benign), and is now correctly discarded. ``` int foo(); inline int v = foo(); ``` (b) Fix https://github.com/llvm/llvm-project/issues/58215: as a size optimization, we place private `__profd_` in a COMDAT with a `__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change makes the `__profd_` available_externally. ``` cat > c.h <<'eof' extern void bar(); inline __attribute__((noinline)) void foo() {} eof cat > m1.cc <<'eof' #include "c.h" int main() { bar(); foo(); } eof cat > m2.cc <<'eof' #include "c.h" __attribute__((noinline)) void bar() { foo(); } eof clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_.profraw clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_.profraw ``` If a GlobalAlias references a GlobalValue which is just changed to available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other available_externally functions, so it cannot easily be removed. Depends on D137441: we use available_externally to mark a GlobalAlias in a non-prevailing COMDAT, similar to how we handle GlobalVariable/Function. GlobalAlias may refer to a ConstantExpr, not changing GlobalAlias to GlobalVariable gives flexibility for future extensions (the use case is niche. For simplicity we don't handle it yet). In addition, available_externally GlobalAlias is the most straightforward implementation and retains the aliasee information to help optimizers. See windows-vftable.ll: Windows vftable uses an alias pointing to a private constant where the alias is the COMDAT leader. The COMDAT use case is skeptical and ThinLTO does not discard the alias in the non-prevailing COMDAT. This patch retains the behavior. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D135427	2022-11-10 21:54:43 -08:00
Alan Zhao	885e6105b4	Revert "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally" This reverts commit `89ddcff1d2`. Reason: This breaks bootstrapping builds of LLVM on Windows using ThinLTO; see https://crbug.com/1382839	2022-11-10 17:48:18 -08:00
William Huang	bd2b5ec803	[InstCombine] PR58901 - fix bug with swapping GEP of different types Fix https://github.com/llvm/llvm-project/issues/58901 by adding stricter check whether non-opaque GEP can be swapped. This will not affect GEP swapping optimization in the future since we are switching to opaque GEP Reviewed By: clin1 Differential Revision: https://reviews.llvm.org/D137752	2022-11-10 20:24:41 +00:00
Sanjay Patel	b57819e130	[VectorCombine] widen a load with subvector insert This adapts/copies code from the existing fold that allows widening of load scalar+insert. It can help in IR because it removes a shuffle, and the backend can already narrow loads if that is profitable in codegen. We might be able to consolidate more of the logic, but handling this basic pattern should be enough to make a small difference on one of the motivating examples from issue #17113. The final goal of combining loads on those patterns is not solved though. Differential Revision: https://reviews.llvm.org/D137341	2022-11-10 14:11:32 -05:00
Alexey Bataev	b505fd559d	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized as simply vector of the scalars instead of relying on the actual node. It leads to the fact that in some cases we may miss incorrect transformation (non-matching set of scalars is just ended as a gather node instead of possible vector/gather node). Better to rely on the actual nodes, it allows to improve stability and better detect missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-10 10:59:54 -08:00
Wu, Yingcong	7f07c4d513	[SanitizerCoverage] Fix wrong pointer type return from CreateSecStartEnd() `CreateSecStartEnd()` will return a pointer to the input type, so when called with `CreateSecStartEnd(M, SanCovCFsSectionName, IntptrPtrTy)`, `SecStartEnd.first` and `SecStartEnd.second` will have type `IntptrPtrPtrTy`, not `IntptrPtrTy`. This problem should not impact functionality, and with opaque pointers enabled it will not trigger any alarm. But if run with `-no-opaque-pointers`, the mismatched pointer type will cause the type check assertion in `CallInst::init()` to fail. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137310	2022-11-09 23:29:04 -08:00
wlei	47b0758049	[SampleFDO] Persist profile staleness metrics into binary With https://reviews.llvm.org/D136627, we now have metrics for profile staleness based on profile statistics. Monitoring profile staleness in real time can help users quickly identify performance issues. In a production scenario, the build is usually incremental, so to get real-time metrics we would have to store/cache all the old objects' metrics somewhere and pull them at post-build time. To make this more convenient, this patch adds an option to persist them into the object binary; the metrics can then be reported right away by decoding the binary rather than polling the previous stdout/stderr from a cache system. For implementation, it first writes the statistics into a new metadata section (llvm.stats) and then encodes them into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pairs so that future statistics can be easily added. This is also under a new switch (`-persist-profile-staleness`). In terms of size overhead, the metrics are computed at module level, so the overhead should be small: measured on one of our internal services, it costs less than 1MB for a 10GB+ binary. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D136698	2022-11-09 22:34:33 -08:00
OCHyams	23bb4735ca	[Assignment Tracking][10/*] salvageDebugInfo for dbg.assign intrinsics The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Plumb in salvaging for the address part of dbg.assign intrinsics. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133293	2022-11-09 11:49:46 +00:00
Nikita Popov	ce2f9ba2c9	[SCCP] Add helper for getting constant range (NFC) Add a helper for the recurring pattern of getting a constant range if the value lattice element is one, or a full range otherwise.	2022-11-09 12:42:36 +01:00
OCHyams	a9025f57ba	[Assignment Tracking][8/*] Add DIAssignID merging utilities The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Add method: Instruction::mergeDIAssignID(ArrayRef<const Instruction *> SourceInstructions) which merges the DIAssignID metadata attachments on `SourceInstructions` and `this` and replaces uses of the original IDs with the new shared one. This is used when stores are merged, for example sinking stores out of an if-diamond CFG or vectorizing contiguous stores. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133291	2022-11-09 10:46:04 +00:00
Akira Hatanaka	295861514e	[ObjC][ARC] Fix non-deterministic behavior in ProvenanceAnalysis ProvenanceAnalysis::relatedCheck was giving different answers depending on the order in which the pointers were passed. Specifically, it was returning different values when A and B were both loads and were both referring to identifiable objects, but only one was used by a store instruction.	2022-11-08 15:05:25 -08:00
OCHyams	fd16ff3a7e	Reapply: [NFC] Move getDebugValueLoc from static in Local.cpp to DebugInfo.h Reverted in `b22d80dc6a`. Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the Assignment Tracking patch stack and remove redundant parameter Src. Reviewed By: jryans Differential Revision: https://reviews.llvm.org/D132357	2022-11-08 16:25:39 +00:00
Alexey Bataev	b5d91ab73e	[SLP]Fix PR58863: Mask index beyond mask size for non-power-2 insertelement analysis. Need to check if the insertelement mask size is reached during cost analysis to avoid compiler crash. Differential Revision: https://reviews.llvm.org/D137639	2022-11-08 07:54:57 -08:00
skc7	42bce72536	Reapply "[SLP] Extend reordering data of tree entry to support PHInodes". Reapplies `87a2086` (which was reverted in `656f1d8`). Fix for scalable vectors in getInsertIndex merged in `46d53f4`. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D137537	2022-11-08 21:21:28 +05:30
Nathan James	6aa050a690	Reland "[llvm][NFC] Use c++17 style variable type traits" This reverts commit `632a389f96`. This relands commit `1834a310d0`. Differential Revision: https://reviews.llvm.org/D137493	2022-11-08 14:15:15 +00:00
Nathan James	632a389f96	Revert "[llvm][NFC] Use c++17 style variable type traits" This reverts commit `1834a310d0`.	2022-11-08 13:11:41 +00:00
skc7	46d53f45d8	[SLP][NFC] Restructure getInsertIndex Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D137567	2022-11-08 18:07:50 +05:30
Nathan James	1834a310d0	[llvm][NFC] Use c++17 style variable type traits This was done as a test for D137302 and it makes sense to push these changes Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D137493	2022-11-08 12:22:52 +00:00
Dmitry Makogon	ebac59999f	[SimpleLoopUnswitch] Skip trivial selects in guards conditions unswitch candidates We do this for conditional branches, but not for guards for some reason. Fixes pr58666. Differential Revision: https://reviews.llvm.org/D137249	2022-11-08 13:29:27 +07:00
Matt Arsenault	e661185fb3	InstCombine: Fold fdiv nnan x, 0 -> copysign(inf, x) https://alive2.llvm.org/ce/z/gLBFKB	2022-11-07 22:00:15 -08:00
skc7	9d96feb19b	[SLP][NFC] Restructure areTwoInsertFromSameBuildVector Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D137569	2022-11-08 09:32:19 +05:30
Shubham Sandeep Rastogi	b22d80dc6a	Revert "[NFC] Move getDebugValueLoc from static in Local.cpp to DebugInfo.h" This reverts commit `80378a4ca7`. I am reverting this patch because I need to revert `171f7024cc` and without reverting this patch, reverting `171f7024cc` causes conflicts. Patch `171f7024cc` introduced a cyclic dependency in the module build. https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/48197/consoleFull#-69937453049ba4694-19c4-4d7e-bec5-911270d8a58c In file included from <module-includes>:1: /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/Argument.h:18:10: fatal error: cyclic dependency in module 'LLVM_IR': LLVM_IR -> LLVM_intrinsic_gen -> LLVM_IR ^ While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14: While building module 'LLVM_IR' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57: In file included from <module-includes>:12: /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/DebugInfo.h:24:10: fatal error: could not build module 'LLVM_intrinsic_gen' ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14: In file included from <module-includes>:15: In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCContext.h:23: /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:10: fatal error: could not build module 'LLVM_IR' ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:10: fatal error: could not build module 'LLVM_MC' ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ 4 errors generated.	2022-11-07 15:19:04 -08:00
Fangrui Song	89ddcff1d2	[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its leader has been made available_externally. This violates the COMDAT rule that its members must be retained or discarded as a unit. To fix this, update the regular LTO change D34803 to track local linkage GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.) This fixes two problems. (a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to linger around (unreferenced, hence benign), and is now correctly discarded. ``` int foo(); inline int v = foo(); ``` (b) Fix https://github.com/llvm/llvm-project/issues/58215: as a size optimization, we place private `__profd_` in a COMDAT with a `__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change makes the `__profd_` available_externally. ``` cat > c.h <<'eof' extern void bar(); inline __attribute__((noinline)) void foo() {} eof cat > m1.cc <<'eof' #include "c.h" int main() { bar(); foo(); } eof cat > m2.cc <<'eof' #include "c.h" __attribute__((noinline)) void bar() { foo(); } eof clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_.profraw clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_.profraw ``` If a GlobalAlias references a GlobalValue which is just changed to available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other available_externally functions, so it cannot easily be removed. Depends on D137441: we use available_externally to mark a GlobalAlias in a non-prevailing COMDAT, similar to how we handle GlobalVariable/Function. GlobalAlias may refer to a ConstantExpr, not changing GlobalAlias to GlobalVariable gives flexibility for future extensions (the use case is niche. For simplicity we don't handle it yet). In addition, available_externally GlobalAlias is the most straightforward implementation and retains the aliasee information to help optimizers. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D135427	2022-11-07 10:07:10 -08:00
Miguel Saldivar	de36d39e24	[InstCombine] Avoid passing pow attributes to sqrt As described in issue #58475, we could pass the attributes of pow to sqrt and crash. Differential Revision: https://reviews.llvm.org/D137454	2022-11-07 12:07:37 -05:00
Alexey Bataev	ecd0b5a532	Revert "[SLP]Redesign vectorization of the gather nodes." This reverts commit `8ddd1ccdf8` to fix buildbots failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839	2022-11-07 08:35:21 -08:00
Matt Devereau	a8c24d57b8	[InstCombine] Remove redundant splats in InstCombineVectorOps Splatting the first vector element of the result of a BinOp, where any of the BinOp's operands are the result of a first vector element splat can be simplified to splatting the first vector element of the result of the BinOp Differential Revision: https://reviews.llvm.org/D135876	2022-11-07 15:39:05 +00:00
Matt Arsenault	0f68ffe1e2	InstCombine: Fold compare with smallest normal if input denormals are flushed Try to simplify comparisons with the smallest normalized value. If denormals will be treated as 0, we can simplify by using an equality comparison with 0. fcmp olt fabs(x), smallest_normalized_number -> fcmp oeq x, 0.0 fcmp ult fabs(x), smallest_normalized_number -> fcmp ueq x, 0.0 fcmp oge fabs(x), smallest_normalized_number -> fcmp one x, 0.0 fcmp uge fabs(x), smallest_normalized_number -> fcmp une x, 0.0 The device libraries have a few range checks that look like this for denormal handling paths.	2022-11-07 07:16:47 -08:00
OCHyams	80378a4ca7	[NFC] Move getDebugValueLoc from static in Local.cpp to DebugInfo.h Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the Assignment Tracking patch stack and remove redundant parameter Src. Reviewed By: jryans Differential Revision: https://reviews.llvm.org/D132357	2022-11-07 15:14:43 +00:00
Alexey Bataev	8ddd1ccdf8	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized simply as a vector of the scalars instead of relying on the actual node. As a result, in some cases we may miss an incorrect transformation (a non-matching set of scalars just ends up as a gather node instead of a possible vector/gather node). It is better to rely on the actual nodes, which improves stability and better detects missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-07 07:04:38 -08:00
Nikita Popov	9a45e4beed	[MemCpyOpt] Move lifetime marker before call to enable call slot optimization Currently call slot optimization may be prevented because the lifetime markers for the destination only start after the call. In this case, rather than aborting the transform, we should move the lifetime.start before the call to enable the transform. Differential Revision: https://reviews.llvm.org/D135886	2022-11-07 15:26:00 +01:00
luxufan	49143f9d14	[IndVars] Forget the SCEV when the instruction has been sunk. In the past, the SCEV expression of the sunk instruction was not forgotten. This led to incorrect block dispositions after the instruction was sunk. Fixes https://github.com/llvm/llvm-project/issues/58662 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D137060	2022-11-06 22:46:49 +08:00
Sanjay Patel	bff6880a5f	[SimplifyLibCalls] improve code readability for AttributeList propagation; NFC It is possible that we can do better on some of these transforms by passing some subset of attributes, but we were not doing that in any of the changed code. So it's better to give that a name to indicate we're clearing attributes or make that more obvious by using the default-constructed empty list.	2022-11-06 09:07:17 -05:00
Sanjay Patel	1c6ebe29d3	[InstCombine] reduce multi-use casts+masks As noted in the code comment, we could generalize this: https://alive2.llvm.org/ce/z/N5m-eZ It saves an instruction even without a constant operand, but the 'and' is wider. We can do that as another step if it doesn't harm anything. I noticed that this missing pattern with a constant operand inhibited other transforms in a recent bug report, so this is enough to solve that case.	2022-11-06 09:07:17 -05:00
David Green	656f1d8b74	Revert "[SLP] Extend reordering data of tree entry to support PHI nodes" This reverts commit `87a20868eb` as it has problems with scalable vectors and use-list orders. Test to follow.	2022-11-06 11:43:51 +00:00
Florian Hahn	a41cb8bf58	[SimpleLoopUnswitch] Forget block & loop dispos during trivial unswitch. Unswitching adjusts the CFG in ways that may invalidate cached loop dispositions. Clear all cached block and loop dispositions during trivial unswitching. The same is already done for non-trivial unswitching. Fixes #58751.	2022-11-05 16:56:06 +00:00
chenglin.bi	6703290361	[InstCombine] fold `sub + and` pattern with specific const value `C1 - ((C3 - X) & C2) --> (X & C2) + (C1 - (C2 & C3))` when: (C3 - ((C2 & C3) - 1)) is pow2 && ((C2 + C3) & ((C2 & C3) - 1)) == ((C2 & C3) - 1) && C2 is negative pow2 || (C3 - X) is nuw https://alive2.llvm.org/ce/z/HXQJV- Fix: #58523 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D136582	2022-11-05 12:58:45 +08:00
Matthew Voss	a4b543a5a5	[llvm-profdata] Check for all duplicate entries in MemOpSize table Previously, we only checked for duplicate zero entries when merging a MemOPSize table (see D92074), but a user recently provided a reproducer demonstrating that other entries can also be duplicated. As demonstrated by the test in this patch, PGOMemOPSizeOpt can potentially generate invalid IR for non-zero, non-consecutive duplicate entries. This seems to be a rare case, since the duplicate entry is often below the threshold, but possible. This patch extends the existing warning to check for any duplicate values in the table, both in the optimization and in llvm-profdata. Differential Revision: https://reviews.llvm.org/D136211	2022-11-04 17:08:54 -07:00
Florian Hahn	9a456b7ad3	[IndVars] Forget SCEV for replaced PHI. Additional SCEV verification highlighted a case where the cached loop dispositions were incorrect after simplifying a phi node in IndVars. Fix it by invalidating the phi before replacing it. Fixes #58750	2022-11-04 18:42:07 +00:00
Sanjay Patel	710e34e136	[VectorCombine] move load safety checks to helper function; NFC These checks can be re-used with other potential transforms such as a load of a subvector-insert.	2022-11-04 10:39:37 -04:00
Juan Manuel MARTINEZ CAAMAÑO	96ad51e3eb	[StructurizeCFG][DebugInfo] Avoid use-after-free Reviewed By: dstuttard Differential Revision: https://reviews.llvm.org/D137408	2022-11-04 13:39:49 +00:00
Alex Gatea	7d0648cb6c	[GVN] Patch for invalid GVN replacement If PRE is performed as part of the main GVN pass (to PRE GEP operands before processing loads), and it is performed across a backedge, we will end up adding the new instruction to the leader table of a block that has not yet been processed. When it will be processed, GVN will incorrectly assume that the value is already available, even though it is only available at the end of the block. Avoid this by not performing PRE across backedges. Fixes https://github.com/llvm/llvm-project/issues/58418. Differential Revision: https://reviews.llvm.org/D136095	2022-11-04 14:28:17 +01:00
Nikita Popov	304f1d59ca	[IR] Switch everything to use memory attribute This switches everything to use the memory attribute proposed in https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579. The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly attributes are dropped. The readnone, readonly and writeonly attributes are restricted to parameters only. The old attributes are auto-upgraded both in bitcode and IR. The bitcode upgrade is a policy requirement that has to be retained indefinitely. The IR upgrade is mainly there so it's not necessary to update all tests using memory attributes in this patch, which is already large enough. We could drop that part after migrating tests, or retain it longer term, to make it easier to import IR from older LLVM versions. High-level Function/CallBase APIs like doesNotAccessMemory() or setDoesNotAccessMemory() are mapped transparently to the memory attribute. Code that directly manipulates attributes (e.g. via AttributeList) on the other hand needs to switch to working with the memory attribute instead. Differential Revision: https://reviews.llvm.org/D135780	2022-11-04 10:21:38 +01:00
Nikita Popov	01ec0ff2dc	[SimplifyCFG] Allow speculating block containing assume() SpeculativelyExecuteBB(), which converts a branch + phi structure into a select, currently bails out if the block contains an assume (because it is not speculatable). Adjust the fold to ignore ephemeral values (i.e. assumes and values only used in assumes) for cost modelling purposes, and drop them when performing the fold. Theoretically, we could try to preserve the assume information by generating an assume(br_cond || assume_cond) style assume, but this is very unlikely to be useful (because we don't do anything useful with assumes of this form) and it would make things substantially more complicated once we take operand bundle assumes into account (which don't really support a || operation). I'd prefer not to do that without good motivation. Differential Revision: https://reviews.llvm.org/D137339	2022-11-04 09:26:35 +01:00
Congzhe Cao	75b33d6bd5	[LoopInterchange] Check phis in all subloops This is the bugfix to the miscompile mentioned in https://reviews.llvm.org/D132055#3814831. The IR that reproduced the bug is added as the test case in this patch. During the legality phase, instead of checking the phi nodes only in `InnerLoop` and `OuterLoop`, this patch checks phi nodes in all subloops of `OuterLoop`. For example, if the loop nest is triply nested and `InnerLoop` and `OuterLoop` are the middle loop and the outermost loop respectively, we also check phi nodes in the innermost loop, in addition to the ones in the middle and outermost loops. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D134930	2022-11-04 00:20:52 -04:00
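
The `sub + and` fold described in commit 6703290361 above can be sanity-checked by brute force on a narrow integer type. The following is only a sketch, not the InstCombine implementation: it models 8-bit wrapping arithmetic in Python, covers only the "C2 is negative pow2" alternative of the precondition (the `nuw` alternative is not modelled), and assumes the three conditions are combined as written below, since the commit message does not parenthesize them. All helper names (`is_pow2`, `precondition`, `original`, `folded`, `count_mismatches`) are hypothetical.

```python
BITS = 8                      # assumed width; the fold itself is width-agnostic
MASK = (1 << BITS) - 1


def is_pow2(v):
    """True if v, viewed as an unsigned BITS-wide value, has exactly one bit set."""
    v &= MASK
    return v != 0 and (v & (v - 1)) == 0


def is_neg_pow2(v):
    """True if -v (wrapped to BITS bits) is a power of two."""
    return is_pow2(-v)


def precondition(c2, c3):
    """Assumed reading of the commit's condition, negative-pow2 branch only."""
    m = c2 & c3
    return (
        is_pow2(c3 - (m - 1))
        and ((c2 + c3) & (m - 1) & MASK) == ((m - 1) & MASK)
        and is_neg_pow2(c2)
    )


def original(x, c1, c2, c3):
    # C1 - ((C3 - X) & C2), wrapped to BITS bits
    return (c1 - ((c3 - x) & c2)) & MASK


def folded(x, c1, c2, c3):
    # (X & C2) + (C1 - (C2 & C3)), wrapped to BITS bits
    return ((x & c2) + (c1 - (c2 & c3))) & MASK


def count_mismatches():
    """Return (number of (C2, C3) pairs accepted, number of disagreeing inputs)."""
    pairs = 0
    mismatches = 0
    for c2 in range(1 << BITS):
        for c3 in range(1 << BITS):
            if not precondition(c2, c3):
                continue
            pairs += 1
            # C1 only shifts both sides by the same amount, so a few samples suffice.
            for c1 in (0, 1, 0x7F, 0xFF):
                for x in range(1 << BITS):
                    if original(x, c1, c2, c3) != folded(x, c1, c2, c3):
                        mismatches += 1
    return pairs, mismatches


if __name__ == "__main__":
    print(count_mismatches())
```

An exhaustive check like this is no substitute for the Alive2 proof linked in the commit, but it is a quick way to see which constant pairs the precondition admits.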

32159 Commits