llvm-project

Commit Graph

Author	SHA1	Message	Date
Matthias Braun	6a383369f9	PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong result in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096	2022-02-23 16:27:37 -08:00
Bill Wendling	a5bbc6ef99	[NFC] Remove unnecessary "#include"s from header files	2022-02-23 01:20:48 -08:00
Nikita Popov	f8d7210032	[GlobalStatus] Keep Visited set in isSafeToDestroyConstant() Constants cannot be cyclic, but they can be tree-like. Keep a visited set to ensure we do not degenerate to exponential run-time. This fixes the problem reported in https://reviews.llvm.org/D117223#3335482, though I haven't been able to construct a concise test case for the issue. This requires a combination of dead constants and the kind of constant expression tree that textual IR cannot represent (because the textual representation, unlike the in-memory representation, is also exponential in size).	2022-02-22 10:02:37 +01:00
Arthur Eubanks	053c2a0020	[SimplifyCFG][OpaquePtr] Check store type when merging conditional store	2022-02-20 11:29:54 -08:00
Arthur Eubanks	129af4daa7	[SCEVExpander][OpaquePtr] Check GEP source type when finding identical GEP Fixes an opaque pointers miscompile. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120004	2022-02-17 08:48:11 -08:00
Nikita Popov	36fdfaba19	[RelLookupTableConverter] Ensure that GV, GEP and load types match This code could be generalized to be type-independent, but for now just ensure that the same type constraints are enforced with opaque pointers as with typed pointers.	2022-02-17 12:05:05 +01:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Florian Mayer	c195addb60	[NFC] [MTE] [HWASan] Remove unnecessary member of AllocaInfo Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119981	2022-02-16 15:19:30 -08:00
Nikita Popov	c9032f1a69	[LowerMemIntrinsics] Explicitly use i8 type in memmove lowering By convention, memcpy/memmove intrinsics are always used with i8 pointers (though this is not enforced), so in practice this code was always using an i8 type. Make that explicit. Of course, i8 is not a very profitable choice, and this code could be more performant by picking an appropriate larger type. But that would require additional test coverage and correctness review, and certainly shouldn't be a decision based on the pointer element type.	2022-02-16 16:31:55 +01:00
Max Kazantsev	bfc1217119	[NFC] Introduce option to switch off compatible invokes merge Does not affect default behavior (transform is on).	2022-02-15 21:51:03 +07:00
Florian Mayer	8de457eafc	[HWASAN] use common alignAndPadAlloca Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119614	2022-02-14 15:28:32 -08:00
Florian Mayer	205308de6b	[NFC] [MTE] Move alignAndPadAlloca to MemoryTaggingSupport. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119610	2022-02-14 14:54:04 -08:00
Florian Mayer	6759cdd829	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 16:01:46 -08:00
Florian Mayer	bf2f72fa10	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 16:01:02 -08:00
Florian Mayer	26dbc47468	Revert "[hwasan] keep debug intrinsicts in AllocaInfo." This reverts commit `19fdf85f58`.	2022-02-11 14:41:24 -08:00
Florian Mayer	b1bd64aeee	Revert "[NFC] [MTE] Use helpers for stack tagging." This reverts commit `8f0e5b4e26`.	2022-02-11 14:41:24 -08:00
Florian Mayer	8f0e5b4e26	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 10:59:09 -08:00
Florian Mayer	19fdf85f58	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 10:56:53 -08:00
Florian Mayer	e7356fb3e2	[nfc] [hwasan] factor out logic to collect info about stack this is the first step in unifying some of the logic between hwasan and mte stack tagging. this only moves around code, changes to converge different implementations of the same logic follow later. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118947	2022-02-11 10:54:12 -08:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Philip Reames	5ba115031d	[PSE] Remove assumption that top level predicate is union from public interface [NFC] Note that this doesn't actually cause the top level predicate to become a non-union just yet. The above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.	2022-02-10 16:14:52 -08:00
Philip Reames	d39f4ac494	[SCEV] Unwind SCEVUnionPredicate from getPredicatedBackedgeTakenCount [NFC] For those curious, the whole reason for tracking the predicate set seperately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.	2022-02-09 12:55:40 -08:00
Roman Lebedev	c8ba2b67a0	[SimplifyCFG] 'merge compatible invokes': fully support indirect invokes As long as all the invokes in the set are indirect, we can merge them, but don't merge direct invokes into the set, even though it would be legal to do.	2022-02-08 21:29:38 +03:00
Roman Lebedev	414b47645d	[SimplifyCFG] 'merge compatible invokes': don't create trivial PHI's with all-identical incoming values	2022-02-08 21:29:38 +03:00
Philip Reames	c302f1e677	[SCEV] Generalize SCEVEqualsPredicate to any compare [NFC] PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent a A <pred> B. This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.	2022-02-08 08:18:09 -08:00
Nikita Popov	074561a4a2	[Mem2Reg] Check that load type matches alloca type Alloca promotion can only deal with cases where the load/store types match the alloca type (it explicitly does not support bitcasted load/stores). With opaque pointers this is no longer enforced through the pointer type, so add an explicit check.	2022-02-08 17:16:15 +01:00
Roman Lebedev	42ca7cc889	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ uses If the original invokes had uses, the uses must have been in PHI's, but that immediately results in the incoming values being incompatible. But we'll replace uses of the original invokes with the use of the merged invoke, so as long as the incoming values become compatible after that, we can merge.	2022-02-08 17:49:38 +03:00
Roman Lebedev	9986d60224	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ PHIs but no uses As long as the incoming values for all the invokes in the set are identical, we can merge the invokes.	2022-02-08 17:49:38 +03:00
Roman Lebedev	8411560fd0	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ no uses, no PHI's Even if the invokes have normal destination, iff it's the same block, we can merge them. For now, require that there are no PHI nodes, and the returned values of invokes aren't used.	2022-02-08 17:49:38 +03:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Kazu Hirata	a1a8d10a17	[Transforms] Use default member initialization in LibCallSimplifier (NFC)	2022-02-06 16:36:27 -08:00
Kazu Hirata	3fce5bb7b0	[Transforms] Use default member initialization in LoopVersioning (NFC)	2022-02-06 16:36:25 -08:00
Kazu Hirata	2d650ee03e	[Transforms] Use default member initialization in SCEVFindUnsafe (NFC)	2022-02-05 21:39:27 -08:00
Kazu Hirata	e24384b506	[Transforms] Use default member initialization in SimplifyIndvar (NFC)	2022-02-05 16:29:22 -08:00
Bill Wendling	c6f0940d99	[NFC] Remove unnecessary #includes An attempt to reduce the number of files that are recompiled due to a change. Differential Revision: https://reviews.llvm.org/D119055	2022-02-04 21:22:41 -08:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Roman Lebedev	18ff1ec3c3	Reland [SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. nnotated functions may still raise an exception, i.a., nounwind is not implied. ``` So if we `invoke` a `noreturn` function, and the normal destination of an invoke is not an `unreachable`, point it at the new `unreachable` block. The change/fix from the original commit is that we now actually create the new block, and don't just repurpose the original block, because said normal destination block could have other users. This reverts commit `db1176ce66`, relanding commit `598833c987`.	2022-02-05 02:58:19 +03:00
Roman Lebedev	db1176ce66	Revert "[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable`" The normal destination may have other uses. This reverts commit `598833c987`.	2022-02-05 02:30:20 +03:00
Roman Lebedev	598833c987	[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. nnotated functions may still raise an exception, i.a., nounwind is not implied. ```	2022-02-05 02:15:07 +03:00
Roman Lebedev	55cd727c9a	[SimplifyCFG] 'merge compatible invokes': allow PHI nodes in landing pads ... iff the incoming values for the invokes-to-be-merged are compatible (identical).	2022-02-04 20:26:44 +03:00
Roman Lebedev	0d384e9228	[NFC][SimplifyCFG] Extract `IncomingValuesAreCompatible()` out of `SafeToMergeTerminators()`	2022-02-04 20:26:44 +03:00
Roman Lebedev	36df803dfd	[SimplifyCFG] Merge compatible `invoke`s of a `landingpad` While nowadays SimplifyCFG knows how to hoist code from then-else blocks, sink code from unconditional predecessors, and even promote the latter by tail-merging `ret`/`resume` function terminators, that isn't everything. While i (& others) have been trying to deal with merging/sinking `unreachable`, apparently perhaps the more impactful remaining problem is merging the `throw` calls. If we start at the `landingpad`, all the predecessors are unwind edges of `invoke`s, and in some cases some of the `invoke`s are mergeable. ``` /// This is a weird mix of hoisting and sinking. Visually, it goes from: /// [...] [...] /// \| \| /// [invoke0] [invoke1] /// / \ / \ /// [cont0] [landingpad] [cont1] /// to: /// [...] [...] /// \ / /// [invoke] /// / \ /// [cont] [landingpad] ``` This simplifies the IR/CFG, at the cost of debug info and extra PHI nodes. Note that we don't require for all the `invokes` of the `landingpad` to be mergeable, they can form more than a single set, we gracefully handle that. For now, i completely disallowed normal destination, PHI nodes and indirect invokes but that can be supported. Out of all the CTMark projects, only 7zip is C++, so there isn't much impact: https://llvm-compile-time-tracker.com/compare.php?from=ba8eb31bd9542828f6424e15a3014f80f14522c8&to=722fc871c84f14157d45c2159bc9c8c7e2825785&stat=size-total ... but there it currently causes size-total decrease. Differential Revision: https://reviews.llvm.org/D117805	2022-02-04 17:04:21 +03:00
Simon Pilgrim	6b4ebdd46f	ModuleUtils - VFABI::setVectorVariantNames - use ArrayRef<> instead of const SmallVector to pass argument	2022-02-03 12:11:48 +00:00
Roman Lebedev	ee4ba9f3a1	Revert "[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`." Unfortunately, it seems we really do need to take the long route; start from the "merge" block, find (all the) "dispatch" blocks, and deal with each "dispatch" block separately, instead of simply starting from each "dispatch" block like it would logically make sense, otherwise we run into a number of other missing folds around `switch` formation, missing sinking/hoisting and phase ordering. This reverts commit `85628ce75b`. This reverts commit `c5fff90953`. This reverts commit `34a98e1046`. This reverts commit `1e353f0922`.	2022-02-03 12:32:50 +03:00
Florian Mayer	fa75a62cb5	[NFC] pull retvec logic to MemoryTaggingSupport. we will also need this for aarch64 stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118852	2022-02-02 16:05:52 -08:00
Fangrui Song	85628ce75b	[SimplifyCFG] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2022-02-02 15:11:22 -08:00
Florian Mayer	f7a6c341cb	[mte] support more complicated lifetimes (e.g. for exceptions). Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118848	2022-02-02 14:39:22 -08:00
Florian Mayer	712b31e2d4	[NFC] factor isStandardLifetime out of HWASan this is so we can use it for aarch64 stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118836	2022-02-02 13:23:55 -08:00
Roman Lebedev	c5fff90953	[NFC][SimplifyCFG] Merge `FoldTwoEntryPHINode()` into it's only callee	2022-02-02 17:53:56 +03:00
Roman Lebedev	34a98e1046	[NFC][SimplifyCFG] `FoldTwoEntryPHINode()`: s/BB/MergeBB/	2022-02-02 17:53:56 +03:00
Roman Lebedev	1e353f0922	[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`. The current `FoldTwoEntryPHINode()` is not quite designed correctly. It starts from the merge point, and then tries to detect the 'divergence' point. Because of that, it is limited to the simple two-predecessor case, where the PHI completely goes away. but that is rather pessimistic, and it doesn't make much sense from the costmodel side of things. For example if there is some other unrelated predecessor of the merge point, we could split the merge point so that the then/else blocks first branch to an empty block and then to the merge point, and then we'd be able to speculate the then/else code. But if we'd instead simply start at the divergence point, and look for the merge point, then we'll just natively support this case. There's also the fact that `SpeculativelyExecuteBB()` already does just that, but only if there is a single block to speculate, and with a much more restrictive cost model. But that also means we have code duplication. Now, sadly, while this is as much NFCI as possible, there is just no way to cleanly migrate to the proper implementation. The results are going to be different somewhat because of various phase ordering effects and SimplifyCFG block iteration strategy.	2022-02-02 17:53:56 +03:00
serge-sans-paille	e188aae406	Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden ehader dependencies, something the LLVM codebase doesn't do that well :-/ I've tried to summarize the biggest change below: - llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l before: 6400831 after: 6189948 200k lines less to process is no that bad ;-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D118652	2022-02-02 06:54:20 +01:00
Anna Thomas	bc48a26655	[LoopPeel] Use reference instead of pointer for DT argument Cleanup code in peelLoop API. We already have usage of DT without guarding against a null DT, so this change constant folds the remaining null DT checks. Also make the argument a reference so that it is clear the argument is a nonnull DT. Extracted from D118472.	2022-02-01 17:00:08 -05:00
Nikita Popov	236fbf571d	[GlobalStatus] Skip non-pointer dead constant users Constant expressions with a non-pointer result type used an early exit that bypassed the later dead constant user check, and resulted in different optimization outcomes depending on whether dead users were present or not. This fixes the issue reported in https://reviews.llvm.org/D117223#3287039.	2022-02-01 15:51:32 +01:00
Fangrui Song	85dfe19b36	[ModuleUtils] Move EmbedBufferInModule to LLVMTransformsUtils D116542 adds EmbedBufferInModule which introduces a layer violation (https://llvm.org/docs/CodingStandards.html#library-layering). See `2d5f857a1e` for detail. EmbedBufferInModule does not use BitcodeWriter functionality and should be moved LLVMTransformsUtils. While here, change the function case to the prevailing convention. It seems that EmbedBufferInModule just follows the steps of EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR update operations which ideally should be refactored to another library. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118666	2022-01-31 16:33:57 -08:00
Jay Foad	8faad29634	Revert "[Local] invertCondition: try modifying an existing ICmpInst" This reverts commit `a6b54ddaba`. Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.	2022-01-31 14:55:36 +00:00
Jay Foad	a6b54ddaba	[Local] invertCondition: try modifying an existing ICmpInst This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and it since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern. Differential Revision: https://reviews.llvm.org/D118478	2022-01-31 10:44:17 +00:00
Nikita Popov	4810051a82	[Inline][Cloning] Reliably remove unreachable blocks during cloning (PR53206) The pruning cloner already tries to remove unreachable blocks. The original cloning process will simplify instructions and constant terminators, and only clone blocks that are reachable at that point. However, phi nodes can only be simplified after everything has been cloned. For that reason, additional blocks may become unreachable after phi simplification. The code does try to handle this as well, but only removes blocks that don't have predecessors. It misses unreachable cycles. This can cause issues if SEH exception handling code is part of an unreachable cycle, as the inliner is not prepared to deal with that. This patch instead performs an explicit scan for reachable blocks, and drops everything else. Fixes https://github.com/llvm/llvm-project/issues/53206. Differential Revision: https://reviews.llvm.org/D118449	2022-01-31 09:31:34 +01:00
Nikita Popov	7d176844d0	[CodeExtractor] Fix warning in assert (NFC)	2022-01-28 16:33:34 +01:00
Nikita Popov	cf0357a545	[BasicBlockUtils] Fix typo in API name (NFC) detatch -> detach. As this requires touching all uses, also lower-case it in accordance with the style guide.	2022-01-28 16:32:13 +01:00
Nikita Popov	91e5096d82	[InlineFunction] Use phis() iterator (NFC)	2022-01-28 10:36:28 +01:00
Florian Hahn	bb5c1b0691	[LoopVersioning] Use IRBuilder for OR simplification.	2022-01-27 09:55:51 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Nikita Popov	de8867a0b6	[AMDGPUEmitPrintf] Don't require specific pointer element type Rather than checking for i8, simply add a bitcast to i8, so the appendString() code sees the expected type.	2022-01-26 16:16:32 +01:00
Nikita Popov	903c3d2863	[SCEVExpander] Always use i8 GEP for reused value offset We could keep the non-i8 GEP code for non-opaque pointers, but there's two reasons I'm dropping it: First, this actually appears to be dead code, at least it isn't hit in any of our tests. I expect that this is because we usually expand trip counts, and those are never pointers (anymore). Second, the non-i8 GEP was actually incorrect in multiple ways, because it used SCEV type sizes, which don't match DL type sizes (for pointers) and certainly don't match type alloc sizes (which is what GEPs actually use). As such, I'm simplifying the code to always use the i8 GEP code path if it does get hit.	2022-01-26 15:38:58 +01:00
Nikita Popov	bec4e865de	[SCEVExpander] Remove pointer element type access in assertion Assert directly on i8 rather than the element type of i8*.	2022-01-26 10:35:57 +01:00
Giorgis Georgakoudis	95b981ca2a	[CodeExtractor] Enable partial aggregate arguments Summary: Enable CodeExtractor to construct output functions that partially aggregate inputs/outputs in their argument list. A use case is the OMPIRBuilder to create outlined functions for parallel regions that aggregate in a struct the payload variables for the region while passing as scalars thread and bound identifiers. Differential Revision: https://reviews.llvm.org/D96854	2022-01-25 20:50:34 -05:00
Nikita Popov	98db33349b	[SLC] Fix pointer diff type in sprintf() optimization We should always be calculating a byte-wise difference here. Previously this calculated the pointer difference while taking the pointer element type into account, which is incorrect.	2022-01-25 15:22:56 +01:00
Nikita Popov	6a008de82a	[Evaluator] Simplify handling of bitcasted calls When fetching the function, strip casts. When casting the result, use the call result type. Don't actually inspect the bitcast.	2022-01-25 14:19:04 +01:00
Nikita Popov	30d4a7e295	[IRBuilder] Require explicit element type in CreatePtrDiff() For opaque pointer compatibility, we cannot derive the element type from the pointer type.	2022-01-25 12:43:57 +01:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Heejin Ahn	eb675e972d	[WebAssembly] Support Wasm EH + Wasm SjLj D108960 added support for SjLj using Wasm EH instructions, which we call Wasm SjLj going forward. (We call the old SjLj Emscripten SjLj) But it did not support using Wasm EH and Wasm SjLj together. So far users of Wasm EH had to use Wasm EH with Emscripten SjLj, which had a certain limitation and it suffered from bigger code size increases as well. This enables using Wasm EH and Wasm SjLj together. 1. This redirects `catchswitch` and `cleanupret` that unwind to caller to `catch.dispatch.longjmp` BB, which is a `catchswitch` BB that handles longjmps. 2. D108960 converted all longjmpable `call`s to `invokes` that unwind to `catch.dispatch.longjmp`. This CL checks if the `call` is embedded within another `catchpad`, and if so, makes it unwind to its nearest parent's unwind destination, rather than `catch.dispatch.longjmp`. This is necessary to preserve the scoping structure. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D117610	2022-01-19 20:13:54 -08:00
Craig Topper	02d9a4d56d	[LoopPeel] Pass TripCount to computePeelCount by value instead of by reference. NFC The TripCount is not modified by the function so it doesn't need to be passed by reference. Verified by passing it as const reference before changing to value. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D117735	2022-01-19 17:54:45 -08:00
Craig Topper	1507786c22	[LoopPeeling] Fix stale comments. NFC These comments were not updated when PeelingPreferences split from UnrollingPreferences.	2022-01-19 17:00:12 -08:00
Nikita Popov	5ba73c924d	[BuildLibCalls] Mark calloc as inaccessiblememonly Now that DSE handles inaccessiblememonly calloc, mark it as such, as we do with other memory allocation functions.	2022-01-19 12:55:09 +01:00
spupyrev	13d1364a34	A better profi rebalancer This is an extension of profi post-processing step that rebalances counts in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically, the new version finds many more "unknown" subgraphs and marks more "unknown" basic blocks as hot (which prevents unwanted optimization passes). I see up to 0.5% perf on some (large) binaries, e.g., clang-10 and gcc-8. The algorithm is still linear and yields no build time overhead.	2022-01-18 12:14:24 -08:00
Jan Svoboda	5f4ae56457	[llvm] Remove uses of `std::vector<bool>` LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement. This patch does just that for llvm. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D117121	2022-01-18 18:20:45 +01:00
pvellien	4e1c207726	[SimplifyCFG] Fix assertion failure when reusing table switch comparison After D116332, some icmps no longer fold with the target-independent constant folder. The SimplifyCFG code assumed that the comparison would always fold, which is not guaranteed. Explicitly check that the result is either true or false. Differential Revision: https://reviews.llvm.org/D117184	2022-01-18 09:30:54 +01:00
Stephen Tozer	32417b3203	[DebugInfo] ValueMapper impl for DIArgList respects IgnoreMissingLocals This patch fixes an issue in which SSA value reference within a DIArgList would be unnecessarily dropped by llvm-link, even when invoking on a single file (which should be a no-op). The reason for the difference is that the ValueMapper does not refer to the RF_IgnoreMissingLocals flag for LocalAsMetadata contained within a DIArgList; this flag is used for direct LocalAsMetadata uses to preserve SSA references even when the ValueMapper does not have an explicit mapping for the referenced SSA value, which appears to always be the case when using llvm-link in this manner. Differential Revision: https://reviews.llvm.org/D114355	2022-01-17 17:17:32 +00:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should exact the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Nikita Popov	d1675e4944	[AttrBuilder] Remove empty() / td_empty() methods The empty() method is a footgun: It only checks whether there are non-string attributes, which is not at all obvious from its name, and of dubious usefulness. td_empty() is entirely unused. Drop these methods in favor of hasAttributes(), which checks whether there are any attributes, regardless of whether these are string or enum attributes.	2022-01-15 17:57:18 +01:00
Florian Hahn	e00158ed5c	[LoopUtils] Use InstSimplifyFolder in addRuntimeChecks. Use the InstSimplifyFolder introduced earlier to perform initial simplification during runtime check construction.	2022-01-15 15:21:16 +00:00
Roman Lebedev	82c8aca934	[SimplifyCFG] Be more aggressive when sinking into block followed by unreachable I strongly believe we need some variant of this. The main problem is e.g. that the glibc's assert has 4 parameters, but the profitability check is only okay with one extra phi node, so D116692 doesn't even trigger on most of the expected cases. While that restriction probably makes sense in normal code, if we are about to run off of a cliff (into an `unreachable`), this successor block is unlikely so the cost to setup these PHI nodes should not be on the hotpath, and shouldn't matter performance-wise. Likewise, we don't sink if there are unconditional predecessors UNLESS we'd sink at least one non-speculatable instruction, which is a performance workaround, but if we are about to run into `unreachable`, it shouldn't matter. Note that we only allow the case where there are at most unconditiona branches on the way to the unreachable block. Differential Revision: https://reviews.llvm.org/D117045	2022-01-13 23:30:31 +03:00
Florian Hahn	e3275cfa94	[BuildLibCalls] Add nounwind,willreturn to memset_pattern{4,8,16}. Similar to memset, memset_pattern{4,8,16} all will return and do not unwind. Use fallthrough to include all attributes also set for memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114904	2022-01-12 10:32:53 +00:00
Nikita Popov	47a47733f0	[GlobalStatus] Remove unused HasNonInstructionUser member (NFC) This hasn't been used in a long time.	2022-01-12 09:40:54 +01:00
Nikita Popov	94d6263391	[GlobalStatus] Look through non-constexpr casts analyzeGlobal() looks through non-constexpr cast instructions when looking for users. However, this particular place only strips the casts again if they are constexprs. We should be looking through all casts here.	2022-01-11 16:02:35 +01:00
Florian Hahn	2d67a86b7c	[SCEVExpander] Use IntToPtr for temporary instruction. Use PtrToInt instead Add when creating temporary instructions. The add might get folded away with more sophisticated folding.	2022-01-11 09:40:21 +00:00
Philip Reames	5265ac72c6	[MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC] Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.	2022-01-10 15:43:39 -08:00
Roman Lebedev	82fb4f4b22	[SCEV] Sequential/in-order `UMin` expression As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692, SCEV is forbidden from reasoning about 'backedge taken count' if the branch condition is a poison-safe logical operation, which is conservatively correct, but is severely limiting. Instead, we should have a way to express those poison blocking properties in SCEV expressions. The proposed semantics is: ``` Sequential/in-order min/max SCEV expressions are non-commutative variants of commutative min/max SCEV expressions. If none of their operands are poison, then they are functionally equivalent, otherwise, if the operand that represents the saturation point* of given expression, comes before the first poison operand, then the whole expression is not poison, but is said saturation point. ``` * saturation point - the maximal/minimal possible integer value for the given type The lowering is straight-forward: ``` compare each operand to the saturation point, perform sequential in-order logical-or (poison-safe!) ordered reduction over those checks, and if reduction returned true then return saturation point else return the naive min/max reduction over the operands ``` https://alive2.llvm.org/ce/z/Q7jxvH (2 ops) https://alive2.llvm.org/ce/z/QCRrhk (3 ops) Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97 That allows us to handle the patterns in question. Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D116766	2022-01-10 20:51:26 +03:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using and std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Florian Hahn	aecad5828e	[SCEVExpander] Only create trunc when needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to create TruncTripCount if it is actually used. Sink the TruncTripCount creating into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 11:31:27 +00:00
Florian Hahn	ad1b8772cf	[SCEVExpander] Only create multiplication if needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to compute \|Step\| * Trip count if the result of the multiplication is actually used. Sink the multiplication into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 08:49:25 +00:00
Florian Hahn	1ce01b7dfe	[SCEVExpander] Simplify cleanup, skip sorting by dominance. There is no need to sort inserted instructions by dominance, as the deletion loop still requires RAUW with undef before deleting. Removing instructions in reverse insertion order should still insure that the number of uselist updates is kept to a minimum.	2022-01-09 18:38:41 +00:00
Florian Hahn	7f1bf68d7d	[SCEVExpander] Only check overflow if it is needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to check for overflows if the result of the multiplication is actually used. Sink the Or for the overflow check into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-09 12:55:41 +00:00
Florian Hahn	9345ab3a45	[SCEVExpander] Skip creating <u 0 check, which is always false. Unsigned compares of the form <u 0 are always false. Do not create such a redundant check in generateOverflowCheck. The patch introduces a new lambda to create the check, so we can exit early conveniently and skip creating some instructions feeding the check. I am planning to sink a few additional instructions as follow-ups, but I would prefer to do this separately, to keep the changes and diff smaller. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116811	2022-01-08 10:31:04 +00:00
Florian Hahn	f395a4f8d5	[SCEVExpand] Only create required predicate checks. Currently generateOverflowCheck always creates code for Step being negative and positive, followed by a select at the end depending on Step's sign. This patch updates the code to only create either the checks for step being positive or negative, if the sign is known. Follow-up to D116696. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116747	2022-01-07 14:49:02 +00:00
Nikita Popov	348bc76e35	[LibCalls] Infer same attrs for reallocf() as realloc() reallocf() is the same as realloc() but frees the input pointer on failure as well. We can infer the same attributes. Also combine some cases that infer the same attributes and are logically related.	2022-01-07 09:51:15 +01:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Nikita Popov	c8189da201	[ModuleUtils] Remove dead arg from filterDeadComdatFunctions() (NFC) The module argument is no longer used.	2022-01-07 09:12:16 +01:00
Philip Reames	916b35e783	[unroll] Strengthen verification of analysis updates under expensive asserts I am suspecting a bug around updates of loop info for unreachable exits, but don't have a test case. Running this locally on make check didn't reveal anything, we'll see if the expensive checks bots find it.	2022-01-06 08:51:50 -08:00
Florian Hahn	86d113a8b8	[SCEVExpand] Do not create redundant 'or false' for pred expansion. This patch updates SCEVExpander::expandUnionPredicate to not create redundant 'or false, x' instructions. While those are trivially foldable, they can be easily avoided and hinder code that checks the size/cost of the generated checks before further folds. I am planning on look into a few other similar improvements to code generated by SCEVExpander. I remember a while ago @lebedev.ri working on doing some trivial folds like that in IRBuilder itself, but there where concerns that such changes may subtly break existing code. Reviewed By: reames, lebedev.ri Differential Revision: https://reviews.llvm.org/D116696	2022-01-06 11:52:19 +00:00
Nikita Popov	32808cfb24	[IR] Track users of comdats Track all GlobalObjects that reference a given comdat, which allows determining whether a function in a comdat is dead without scanning the whole module. In particular, this makes filterDeadComdatFunctions() have complexity O(#DeadFunctions) rather than O(#SymbolsInModule), which addresses half of the compile-time issue exposed by D115545. Differential Revision: https://reviews.llvm.org/D115864	2022-01-06 09:13:58 +01:00
Sanjay Patel	e2165e0968	[InstCombine] remove trunc user restriction for match of bswap This does not appear to cause any problems, and it fixes #50910 Extra tests with a trunc user were added with: `3a239379` ...but they don't match either way, so there's an opportunity to improve the matching further.	2022-01-05 13:04:11 -05:00
Philip Reames	c16fd6a376	Rename doesNotReadMemory to onlyWritesMemory globally [NFC] The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consist with onlyReadsMemory which we use for the corresponding readonly case as well.	2022-01-05 08:52:55 -08:00
Nikita Popov	6e474d3308	[GlobalOpt][Evaluator] Fix off by one error in bounds check (PR53002) We should bail out if the index is >= the size, not > the size. Fixes https://github.com/llvm/llvm-project/issues/53002.	2022-01-05 14:06:02 +01:00
Benjamin Kramer	5f0a349738	Revert "Revert "[InferAttrs] Add writeonly to all the math functions"" This reverts commit `29b6e967f3`. The bug it found in PartiallyInlineLibCalls was fixed in `c8ffc73350`.	2022-01-05 12:16:35 +01:00
Sjoerd Meijer	e550dfa4a6	Silence a few unused variable warnings. NFC.	2022-01-05 09:15:07 +00:00
Martin Storsjö	29b6e967f3	Revert "[InferAttrs] Add writeonly to all the math functions" This reverts commit `ea75be3d9d` and `1eb5b6e850`. That commit caused crashes with compilation e.g. like this (not fixed by the follow-up commit): $ cat sqrt.c float a; b() { sqrt(a); } $ clang -target x86_64-linux-gnu -c -O2 sqrt.c Attributes 'readnone and writeonly' are incompatible! %sqrtf = tail call float @sqrtf(float %0) #1 in function b fatal error: error in backend: Broken function found, compilation aborted!	2022-01-05 11:12:19 +02:00
Nikita Popov	787f86e68c	[GlobalOpt][Evaluator] Don't create bitcast for same type (PR52994) isBitOrNoopPointerCastable() returns true if the types are the same, but it's not actually possible to create a bitcast for all such types. The assumption seems to be that the user will omit creating the cast in that case, as it is unnecessary. Fixes https://github.com/llvm/llvm-project/issues/52994.	2022-01-05 09:17:07 +01:00
Fangrui Song	1eb5b6e850	[InferAttrs] If readonly is already set, set readnone instead of writeonly D116426 may lead to an assertion failure `Attributes 'readonly and writeonly' are incompatible!` if the builtin function already has `readonly`.	2022-01-04 18:59:35 -08:00
Benjamin Kramer	ea75be3d9d	[InferAttrs] Add writeonly to all the math functions All of these functions would be `readnone`, but can't be on platforms where they can set `errno`. A `writeonly` function with no pointer arguments can only write (but never read) global state. Writeonly theoretically allows these calls to be CSE'd (a writeonly call with the same arguments will always result in the same global stores) or hoisted out of loops, but that's not implemented currently. There are a few functions in this list that could be `readnone` instead of `writeonly`, if someone is interested. Differential Revision: https://reviews.llvm.org/D116426	2022-01-04 16:58:05 +01:00
Nikita Popov	bbeaf2aac6	[GlobalOpt][Evaluator] Rewrite global ctor evaluation (fixes PR51879) Global ctor evaluation currently models memory as a map from Constant* to Constant. For this to be correct, it is required that there is only a single Constant referencing a given memory location. The Evaluator tries to ensure this by imposing certain limitations that could result in ambiguities (by limiting types, casts and GEP formats), but ultimately still fails, as can be seen in PR51879. The approach is fundamentally fragile and will get more so with opaque pointers. My original thought was to instead store memory for each global as an offset => value representation. However, we also need to make sure that we can actually rematerialize the modified global initializer into a Constant in the end, which may not be possible if we allow arbitrary writes. What this patch does instead is to represent globals as a MutableValue, which is either a Constant* or a MutableAggregate. The mutable aggregate exists to allow efficient mutation of individual aggregate elements, as mutating an element on a Constant would require interning a new constant. When a write to the Constant is made, it is converted into a MutableAggregate* as needed. I believe this should make the evaluator more robust, compatible with opaque pointers, and a bit simpler as well. Fixes https://github.com/llvm/llvm-project/issues/51221. Differential Revision: https://reviews.llvm.org/D115530	2022-01-04 09:30:54 +01:00
Philip Reames	7203140748	Revert "[unroll] Prune all but first copy of invariant exit" This reverts commit `9bd22595ba`. Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stablize. e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933	2022-01-03 11:57:35 -08:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Craig Topper	14849fe554	[SimplifyCFG] Make use of ComputeMinSignedBits and KnownBits::getBitWidth. NFC	2022-01-03 10:08:14 -08:00
Philip Reames	9bd22595ba	[unroll] Prune all but first copy of invariant exit If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead. The change itself is pretty straight forward, but let me add two points of context: * I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying. * I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure. Differential Revision: https://reviews.llvm.org/D116496	2022-01-03 09:55:19 -08:00
Nikita Popov	730414b341	[CodeExtractor] Remove unnecessary explicit attribute handling (NFC) The nounwind and uwtable attributes will get handled as part of the loop below as well, there is no need to special-case them here.	2022-01-03 14:23:25 +01:00
Nikita Popov	587495ffa1	[CodeExtractor] Separate function from param/ret attributes (NFC) This list is confusing because it conflates functions attributes (which are either extractable or not) and other attribute kinds, which are simply irrelevant for this code.	2022-01-03 14:07:12 +01:00
Benjamin Kramer	4683ce2cd8	[InferAttrs] Give strnlen the same attributes as strlen This moves the only string function out of the big list of math funcs. And let's us CSE strnlen calls.	2021-12-30 20:43:43 +01:00
Kazu Hirata	e7774f499b	Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-12-26 14:26:44 -08:00
Djordje Todorovic	93615b88f5	[Debugify] Use WeakWH map collected before Pass when checking loc drop This fixes a typo/bug when checking for pointer reuse when testing DI location preservation in the Debugify original mode (when checking -g generated Debug Info). Differential Revision: https://reviews.llvm.org/D115621	2021-12-21 15:54:09 +01:00
Sami Tolvanen	5dc8aaac39	[llvm][IR] Add no_cfi constant With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478	2021-12-20 12:55:32 -08:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of \!any_of (NFC)	2021-12-17 13:48:57 -08:00
Kazu Hirata	2b7be47b22	[llvm] Strip redundant lambda (NFC)	2021-12-17 10:51:40 -08:00
Philip Reames	f632c49478	Extract a helper function for computing estimate trip count of an exiting branch Plan to use this in following change to support estimated trip counts derived from multiple loop exits.	2021-12-16 17:29:32 -08:00
eopXD	6734be290b	Revert "[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified" This reverts commit `fbf6c8ac15`.	2021-12-16 02:06:11 -08:00
Yueh-Ting Chen	fbf6c8ac15	[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified After this function call, the LLVM IR would look like the following: ``` if (true) /* NonVersionedLoop / else / VersionedLoop */ ``` Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D104631	2021-12-16 00:41:29 -08:00
Arthur Eubanks	d5583366ba	[FunctionComparator] Use getAlign() instead of getAlignment() getAlignment() is deprecated.	2021-12-15 14:40:56 -08:00
Kazu Hirata	7787a8f1b7	[llvm] Use llvm::reverse (NFC)	2021-12-13 21:54:51 -08:00
Nick Desaulniers	95ba0e4563	[SimplifyLibCalls] propagate tail flags on CallInsts I noticed we weren't propagating tail flags on calls when FortifiedLibCallSimplifier.optimizeCall() was replacing calls to runtime checked calls to the non-checked routines (when safe to do so). Make sure to check this before replacing the original calls! Also, avoid any libcall transforms when notail/musttail is present. PR46734 Fixes: https://github.com/llvm/llvm-project/issues/46079 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107872	2021-12-13 11:18:30 -08:00
Gulfem Savrun Yeniceri	e5a8af7a90	[Passes] Fix relative lookup table converter pass This patch fixes the relative table converter pass for the lookup table accesses that are resulted in an instruction sequence, where gep is not immediately followed by a load, such as gep being hoisted outside the loop or another instruction is inserted in between them. The fix inserts the call to load.relative.instrinsic in the original place of load instead of gep. Issue is reported by FreeBSD via https://bugs.freebsd.org/259921. Differential Revision: https://reviews.llvm.org/D115571	2021-12-12 04:40:17 +00:00
Philip Reames	0d13f94c1d	[reductions] Delete another piece of dead flag handling [NFC] The code claimed to handle nsw/nuw, but those aren't passed via builder state and the explicit IR construction just above never sets them. The only case this bit of code is actually relevant for is FMF flags. However, dropPoisonGeneratingFlags currently doesn't know about FMF at all, so this was a noop. It's also unneeded, as the caller explicitly configures the flags on the builder before this call, and the flags on the individual ops should be controled by the intrinsic flags anyways. If any of the flags aren't safe to propagate, the caller needs to make that change.	2021-12-09 10:56:55 -08:00
Philip Reames	b24db85c0b	[recurrence] Delete dead flag/fmf handling [NFC] The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments. The sole exception is a caller of a method which has the argument, but no implementation. I don't know what the intent was here, but it certaintly doesn't actually do anything today.	2021-12-09 10:43:53 -08:00
Philip Reames	2d31b02517	Compute estimated trip counts for multiple exit loops This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst. Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable. The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change. This was noticed when analyzing regressions on D113939. I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately. Differential Revision: https://reviews.llvm.org/D115362	2021-12-09 09:53:49 -08:00
Dmitry Makogon	0b533c1833	[MetaRenamer] Add command line options to disable renaming name with specified prefixes This patch adds 4 options for specifying functions, aliases, globals and structs name prefixes hat don't need to be renamed by MetaRenamer pass. This is useful if one has some downstream logic that depends directly on an entity name. MetaRenamer can break this logic, but with the patch you can tell it not to rename certain names. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D115323	2021-12-09 18:45:06 +07:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit `c68f71eb37`. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Anna Thomas	72750f0012	[TrivialDeadness] Introduce API separating two different usages The earlier usage of wouldInstructionBeTriviallyDead is based on the assumption that the use_count of that instruction being checked will be zero. This patch separates the API into two different ones: 1. The strictly conservative one where the instruction is trivially dead iff the uses are dead. 2. The slightly relaxed form, where an instruction is dead along paths where it is not used. The second form can be used in identifying instructions that are valid to sink down to uses (D109917). Reviewed-By: reames Differential Revision: https://reviews.llvm.org/D114647	2021-12-03 10:09:52 -05:00
spupyrev	93a2c2919f	profi - a flow-based profile inference algorithm: Part III (out of 3) This is a continuation of D109860 and D109903. An important challenge for profile inference is caused by the fact that the sample profile is collected on a fully optimized binary, while the block and edge frequencies are consumed on an early stage of the compilation that operates with a non-optimized IR. As a result, some of the basic blocks may not have associated sample counts, and it is up to the algorithm to deduce missing frequencies. The problem is illustrated in the figure where three basic blocks are not present in the optimized binary and hence, receive no samples during profiling. We found that it is beneficial to treat all such blocks equally. Otherwise the compiler may decide that some blocks are “cold” and apply undesirable optimizations (e.g., hot-cold splitting) regressing the performance. Therefore, we want to distribute the counts evenly along the blocks with missing samples. This is achieved by a post-processing step that identifies "dangling" subgraphs consisting of basic blocks with no sampled counts; once the subgraphs are found, we rebalance the flow so as every branch probability is 50:50 within the subgraphs. Our experiments indicate up to 1% performance win using the optimization on some binaries and a significant improvement in the quality of profile counts (when compared to ground-truth instrumentation-based counts) {F19093045} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109980	2021-12-02 12:01:30 -08:00
spupyrev	98dd2f9ed3	profi - a flow-based profile inference algorithm: Part II (out of 3) This is a continuation of D109860. Traditional flow-based algorithms cannot guarantee that the resulting edge frequencies correspond to a connected flow in the control-flow graph. For example, for an instance in the attached figure, a flow-based (or any other) inference algorithm may produce an output in which the hot loop is disconnected from the entry block (refer to the rightmost graph in the figure). Furthermore, creating a connected minimum-cost maximum flow is a computationally NP-hard problem. Hence, we apply a post-processing adjustments to the computed flow by connecting all isolated flow components ("islands"). This feature helps to keep all blocks with sample counts connected and results in significant performance wins for some binaries. {F19077343} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109903	2021-12-02 11:04:21 -08:00
Kazu Hirata	22d82949b0	[llvm] Fix "unused variable" warnings	2021-12-02 09:20:17 -08:00
Djordje Todorovic	2cdc6f2ca6	Reland "[LICM] Hoist LOAD without sinking the STORE" When doing load/store promotion within LICM, if we cannot prove that it is safe to sink the store we won't hoist the load, even though we can prove the load could be dereferenced and moved outside the loop. This patch implements the load promotion by moving it in the loop preheader by inserting proper PHI in the loop. The store is kept as is in the loop. By doing this, we avoid doing the load from a memory location in each iteration. Please consider this small example: loop { var = ptr; if (var) break; ptr= var + 1; } After this patch, it will be: var0 = ptr; loop { var1 = phi (var0, var2); if (var1) break; var2 = var1 + 1; ptr = var2; } This addresses some problems from [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=51193 Differential revision: https://reviews.llvm.org/D113289	2021-12-02 03:53:50 -08:00
Florian Hahn	2de5f39e54	[BuildLibCalls] Add support for memset_pattern{4,8}. Add support for memset_pattern{4,8} similar to the existing memset_pattern16 handling. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114883	2021-12-02 11:04:25 +00:00
Nikita Popov	a0ff26e08c	[GlobalOpt] Fix assertion failure during instruction deletion This fixes the assertion failure reported in https://reviews.llvm.org/D114889#3166417, by making RecursivelyDeleteTriviallyDeadInstructionsPermissive() more permissive. As the function accepts a WeakTrackingVH, even if originally only Instructions were inserted, we may end up with different Value types after a RAUW operation. As such, we should not assume that the vector only contains instructions. Notably this matches the behavior of the RecursivelyDeleteTriviallyDeadInstructions() function variant which accepts a single value rather than vector.	2021-12-02 11:58:39 +01:00
Florian Hahn	0496edad49	[BuildLibCalls] Add additional attrs to memcpy_chk. `memcpy_chk` can be treated like `memcpy`, with the exception that it may not return (if it aborts the program). See D114793 for a similar patch for `memset_chk`. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D114863	2021-12-02 09:50:14 +00:00
Arthur Eubanks	512534bc16	[Cloning] Clone metadata on function declarations Previously we missed cloning metadata on function declarations because we don't call CloneFunctionInto() on declarations in CloneModule(). Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D113812	2021-12-01 15:40:05 -08:00
spupyrev	7cc2493daa	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. UPD Dec 1st 2021: - synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`; - added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-12-01 15:30:38 -08:00
Djordje Todorovic	72f9f066df	Revert "[LICM] Hoist LOAD without sinking the STORE" This reverts commit `ecb9d8e4e3`. I'll reland this as soon as the failing tests are fixed/updated.	2021-12-01 04:39:26 -08:00
Djordje Todorovic	ecb9d8e4e3	[LICM] Hoist LOAD without sinking the STORE When doing load/store promotion within LICM, if we cannot prove that it is safe to sink the store we won't hoist the load, even though we can prove the load could be dereferenced and moved outside the loop. This patch implements the load promotion by moving it in the loop preheader by inserting proper PHI in the loop. The store is kept as is in the loop. By doing this, we avoid doing the load from a memory location in each iteration. Please consider this small example: loop { var = ptr; if (var) break; ptr= var + 1; } After this patch, it will be: var0 = ptr; loop { var1 = phi (var0, var2); if (var1) break; var2 = var1 + 1; ptr = var2; } This addresses some problems from [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=51193 Differential revision: https://reviews.llvm.org/D113289	2021-12-01 04:27:50 -08:00
Florian Hahn	6a5e29d13f	[BuildLibCalls] Add argmemonly, writeonly, nounwind to memset_chk. The memset_chk library function should match memset's attributes with respect of memory effects (argmemonly, writeonly). It also does not raise exceptions. It may not return, in case it aborts the program. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D114793	2021-12-01 10:09:52 +00:00
Nikita Popov	84b978da3b	[LoopUnrollRuntime] Remove unnecessary pointer BECount check (NFC) BECounts are guaranteed to be integers nowadays.	2021-12-01 10:32:37 +01:00
Philip Reames	8906a0fe64	[SCEVExpander] Drop poison generating flags when reusing instructions The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct. This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid. In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled. On the test changes, most are pretty straight forward. We loose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE. The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB. Differential Revision: https://reviews.llvm.org/D112734	2021-11-29 15:23:34 -08:00
Bjorn Pettersson	297fb66484	Use a deterministic order when updating the DominatorTree This solves a problem with non-deterministic output from opt due to not performing dominator tree updates in a deterministic order. The problem that was analysed indicated that JumpThreading was using the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When preparing the list of updates to send to DomTreeUpdater::applyUpdates we iterated over a SmallPtrSet, which didn't give a well-defined order of updates to perform. The added domtree-updates.ll test case is an example that would result in non-deterministic printouts of the domtree. Semantically those domtree:s are equivalent, but it show the fact that when we use the domtree iterator the order in which nodes are visited depend on the order in which dominator tree updates are performed. Since some passes (at least EarlyCSE) are iterating over nodes in the dominator tree in a similar fashion as the domtree printer, then the order in which transforms are applied by such passes, transitively, also depend on the order in which dominator tree updates are performed. And taking EarlyCSE as an example the end result could be different depending on in which order the transforms are applied. Reviewed By: nikic, kuhar Differential Revision: https://reviews.llvm.org/D110292	2021-11-29 13:14:50 +01:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Jun Ma	07333810ca	Revert "Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""""" This reverts commit `c93f93b2e3`.	2021-11-24 10:26:37 +08:00
Mehdi Amini	1392b654ff	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `884b6dd311`. The windows build is broken with a linker error.	2021-11-23 20:10:36 +00:00
spupyrev	884b6dd311	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 11:02:40 -08:00
Zarko Todorovski	0d3add216f	[llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts Reworded some comments and asserts to avoid usage of `sanity check/test` Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114372	2021-11-23 13:22:55 -05:00
Philip Reames	065f777d27	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `b00fc19822`. This change fails to build (link) on ubuntu x86,	2021-11-23 09:18:28 -08:00
spupyrev	b00fc19822	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 09:08:30 -08:00
Kazu Hirata	d1abf481da	[llvm] Use range-based for loops (NFC)	2021-11-19 21:12:13 -08:00
ksyx	97b9e8438e	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. The values are converted to signed types to avoid unsigned operation with negative offsets. Part of revision D100179 Reapply commit `c35e8185d8` with fixing problem reported by mstorsjo	2021-11-19 20:24:36 -05:00
Senran Zhang	0425ea4621	[NFC][OpaquePtr][Evaluator] Remove call to PointerType::getElementType There are still another 2 uses of PointerType::getElementType in Evaluator when evaluating BitCast's on pointers. BitCast's on pointers should be removed when opaque ptr is ready, so I just keep them as is. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D114131	2021-11-19 10:32:55 +08:00
Philip Reames	8f95e915cd	[unroll-runtime] Relax two profitability limitations on multi-exit unrolling This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change. If anyone sees actually interesting code differences out of this, please let me know. I'm not expecting this to have much impact at all. Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block. We can only hit this with a poorly documented debug flag. Case 2 - Why should we treat epilog cases differently from prolog cases? Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled? Sorry for the lack of tests here. These are both exceedingly narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags. Note that the legality cases for each are already exercised with force flags.	2021-11-15 13:00:14 -08:00
Philip Reames	423da61835	[runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC] All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller. The assert use has little value with the restructured code and is simply dropped.	2021-11-15 11:39:07 -08:00
Philip Reames	e99902a872	[runtime-unroll] Restructure if-clause to improve readability [NFC]	2021-11-15 11:13:27 -08:00
ksyx	72b5138d37	Revert "[GVN][NFC] Remove redundant check" This reverts commit `c35e8185d8`. mstorsjo reported in the revision thread that one VNCoercion assertion is violated and seemly in relate to this commit. As per "If a test case that demonstrates a problem is reported in the commit thread, please revert and investigate offline", this commit is reverted.	2021-11-15 09:14:13 -05:00
Mircea Trofin	a32c2c3808	[NFC] Use Optional<ProfileCount> to model invalid counts ProfileCount could model invalid values, but a user had no indication that the getCount method could return bogus data. Optional<ProfileCount> addresses that, because the user must dereference the optional. In addition, the patch removes concept duplication. Differential Revision: https://reviews.llvm.org/D113839	2021-11-14 19:03:30 -08:00
Kazu Hirata	098e935174	[llvm] Use range-based for loops with CallBase::args (NFC)	2021-11-14 09:32:36 -08:00
Mircea Trofin	0662a3612c	[NFC][InlineFunction] Renamed some vars to conform to coding style	2021-11-14 07:26:44 -08:00
ksyx	c35e8185d8	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. Part of revision D100179 Reviewed By: nikic	2021-11-13 15:59:43 -05:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00
Florian Hahn	2ead34716a	[SimplifyCFG] Add early bailout if Use is not in same BB. Without this patch, passingValueIsAlwaysUndefined will iterate over all instructions from I to the end of the basic block, even if the use is outside the block. This patch adds an early bail out, if the use instruction is outside I's BB. This can greatly reduce compile-time in cases where very large basic blocks are involved, with a large number of PHI nodes and incoming values. Note that the refactoring makes the handling of the case where I is a phi and Use is in PHI more explicit as well: for phi nodes, we can also directly bail out. In the existing code, we would iterate until we reach the end and return false. Based on an earlier patch by Matt Wala. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D113293	2021-11-09 12:57:03 +00:00
Max Kazantsev	cb728cb8a9	[NFC] Get rid of hardcoded magical constant and use Optionals instead Refactor calculateIterationsToInvariance so that it doesn't need a magical constant to signify unknown answer.	2021-11-09 18:13:19 +07:00
Dmitry Makogon	ae14fae0ff	[SCEVExpander] Use stable_sort to sort loop Phis in SCEVExpander::replaceCongruentIVs This is a fix for test failures on expensive checks build caused by `db289340c8`. With LLVM_ENABLE_EXPENSIVE_CHECKS enabled the llvm::sort shuffles the given container. However, the sort is only called when the TTI is passed to replaceCongruentIVs. In the mentioned patch we pass it TTI, so the sort happens. But due to shuffling equivalent Phis may appear in different order from run to run. With the stable_sort instead of sort this is impossible - the order of sorted Phis is preserved.	2021-11-09 16:29:57 +07:00
Anton Afanasyev	ce4fa93db8	[SCCP] Tune cast instruction handling for overdefined operand Extended value is known to be inside range smaller than full one. Prevent SCCP to mark such value as overdefined. Fixes PR52253 Differential Revision: https://reviews.llvm.org/D112721	2021-11-08 18:34:30 +03:00
Kazu Hirata	0d182d9d1e	[Transforms] Use make_early_inc_range (NFC)	2021-11-07 17:03:15 -08:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Kazu Hirata	843d1eda18	[llvm] Use llvm::reverse (NFC)	2021-11-06 19:31:18 -07:00
Kazu Hirata	1b108ab975	[Transforms] Use make_early_inc_range (NFC)	2021-11-02 18:13:23 -07:00
Dmitry Makogon	e09958d5eb	[LoopPeel] Peel loops with exits followed by an unreachable or deopt block Added support for peeling loops with exits that are followed either by an unreachable-terminated block or block that has a terminatnig deoptimize call. All blocks in the sequence must have an unique successor, maybe except for the last one. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D110922	2021-11-02 23:12:04 +07:00
Jun Ma	c93f93b2e3	Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""" This reverts commit `3a998c06a8`.	2021-11-01 15:31:59 +08:00
Kazu Hirata	c714da2ceb	[Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC)	2021-10-31 07:57:32 -07:00
Roman Lebedev	156f10c840	[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ While generally i don't like such hacks, we have a very good reason to do this: here we are expanding a run-time correctness check for the vectorization, and said `umul_with_overflow` will not be optimized out before we query the cost of the checks we've generated. Which means, the cost of run-time checks would be artificially inflated, and after https://reviews.llvm.org/D109368 that will affect the minimal trip count for which these checks are even evaluated. And if they aren't even evaluated, then the vectorized code certainly won't be run. We could consider doing this in IRBuilder, but then we'd need to also teach `CreateExtractValue()` to look into chain of `insertvalue`'s, and i'm not sure there's precedent for that. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 19:45:55 +03:00
Nikita Popov	11a8423dab	[SCEV] Use reverse() (NFC)	2021-10-26 11:08:58 +02:00
Max Kazantsev	9bbfe0f72c	[NFC] Remove obsolete simplifyOnceImpl function The function simplifyOnce only calls simplifyOnceImpl and does nothing else. Having this separate helper makes no sense. Removing it. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D112517 Reviewed By: mkazantsev	2021-10-26 13:51:42 +07:00
Max Kazantsev	d4c74cd4e8	[NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` are loop exits, and each of them unconditionally branches to the same `unreachable` terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. Previously we stored exits' IDoms in a map before peeling a loop and then, after peeling off one iteration, we changed their IDoms. Now we use the same logic not only for exits but for all non-loop blocks dominated by the loop. So when we add logic to support peeling loops with exits which branch, for example, to an unreachable-terminated block, we would update the IDoms not only for exits, but for their successors. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev, nikic	2021-10-26 13:09:07 +07:00
Nikita Popov	3a995c918e	[SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander Always insert values into ExprValueMap, and instead skip using them in SCEVExpander if poison-generating flags have been lost. This ensures that all values that are in ValueExprMap are also in ExprValueMap, so we can use the latter to invalidate the former. This change is probably not entirely NFC for the case where originally the SCEV had no nowrap flags but they were inferred later, in which case that would now allow reusing the existing value for expansion. Differential Revision: https://reviews.llvm.org/D112389	2021-10-25 22:37:20 +02:00
Nikita Popov	477551fd09	[SCEVExpander] Minor cleanup in value reuse (NFC) Use dyn_cast_or_null and convert one of the checks into an assertion. SCEV is a per-function analysis.	2021-10-25 10:32:17 +02:00
Nikita Popov	710596a1e1	[ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI) As this API is now internally offset-based, we can accept a starting offset and remove the need to create a temporary bitcast+gep sequence to perform an offset load. The API now mirrors the ConstantFoldLoadFromConst() API.	2021-10-23 17:59:39 +02:00
Kazu Hirata	6fe949c4ed	[Target, Transforms] Use StringRef::contains (NFC)	2021-10-22 08:52:33 -07:00
Nikita Popov	1848525842	[CodeMetrics] Don't require speculatability for ephemeral values As discussed in D112016, our current requirement of speculatability for ephemeral is overly strict: What we really care about is that the instruction will be DCEd once the assume is dropped. For that it is sufficient that the instruction is side-effect free and not a terminator. In particular, this allows non-dereferenceable loads to be ephemeral values. Differential Revision: https://reviews.llvm.org/D112179	2021-10-21 20:30:01 +02:00
Florian Hahn	8977bd5806	[IndVars] Invalidate SCEV when IR is changed in rewriteLoopExitValue. At the moment, rewriteLoopExitValue forgets the current phi node in the loop that collects phis to rewrite. A few lines after the value is forgotten, SCEV is used again to analyze incoming values and potentially expand SCEV expression. This means that another SCEV is created for PN, before the IR is actually updated in the next loop. This leads to accessing invalid cached expression in combination with D71539. PN should only be changed once the actual incoming exit value is set in the next loop. Moving invalidation there should ensure that PN is invalidated in all relevant cases. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D111495	2021-10-20 20:48:33 +01:00
Itay Bookstein	08ed216000	[IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol As discussed in: * https://reviews.llvm.org/D94166 * https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else. In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases. This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR. I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D108872	2021-10-20 10:29:47 -07:00
Florian Hahn	e844f05397	[LoopUtils] Simplify addRuntimeCheck to return a single value. This simplifies the return value of addRuntimeCheck from a pair of instructions to a single `Value `. The existing users of addRuntimeChecks were ignoring the first element of the pair, hence there is not reason to track FirstInst and return it. Additionally all users of addRuntimeChecks use the second returned `Instruction ` just as `Value `, so there is no need to return an `Instruction `. Therefore there is no need to create a redundant dummy `and X, true` instruction any longer. Effectively this change should not impact the generated code because the redundant AND will be folded by later optimizations. But it is easy to avoid creating it in the first place and it allows more accurately estimating the cost of the runtime checks.	2021-10-18 18:03:09 +01:00
Peter Waller	c4603a8a43	[InstCombine][DebugInfo] Remove superflous assertion, add test When this code was added, an unnecessary assertion slipped in which we now hit in real code. Add a test to defend against it firing again.	2021-10-18 11:00:01 +00:00
Max Kazantsev	baad10c09e	Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits" This reverts commit `fa16329ae0`. See comments in discussion. Merged by mistake, not entirely getting what the problem was.	2021-10-18 17:14:11 +07:00
Max Kazantsev	fa16329ae0	[NFC] [LoopPeel] Change the way DT is updated for loop exits When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` are loop exits, and each of them unconditionally branches to the same `unreachable` terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. With this NFC we just insert edges from cloned exiting blocks to their exits after peeling each iteration (we accumulate the insertion updates and then after peeling apply the updates to DT). This patch was a part of D110922. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev	2021-10-18 10:23:05 +07:00
Stephen Tozer	f5ed223b0f	[DebugInfo] Limit the size of DIExpressions that we will salvage up to Fixes: https://bugs.llvm.org/show_bug.cgi?id=51841 This patch places an arbitrary limit on the size of DIExpressions that we will produce via salvaging, for performance reasons. This helps to fix a performance issue observed in the bug above, in which debug values would be salvaged hundreds of times, producing expressions with over 1000 elements and causing the compiler to hang. Limiting the size of debug values that we will produce to 128 largely fixes this issue. Reviewed By: dblaikie, jmorse Differential Revision: https://reviews.llvm.org/D110332	2021-10-15 17:34:21 +01:00
Kazu Hirata	81e9c90686	[llvm] Use llvm::is_contained (NFC)	2021-10-14 22:44:09 -07:00
Nikita Popov	69853f9920	[IVUsers] Move preheader check into SCEVExpander Rather than checking for loop nest preheaders upfront in IVUsers, move this requirement into isSafeToExpand() from SCEVExpander. Historically, LSR did not check whether SCEVs are safe to expand and fully relied on IVUsers to validate this. Later, support for non-expandable SCEVs was added via rigid formulas. Checking this in isSafeToExpand() makes it more obvious what exactly this check is guarding against, and avoids the awkward loop nest scan. This is a followup to https://reviews.llvm.org/D111493#3055286. Differential Revision: https://reviews.llvm.org/D111681	2021-10-14 21:52:31 +02:00
Arthur Eubanks	3628bb7436	Make various assume bundle data structures use uint64_t Following D110451, we need to make sure to support 64 bit values.	2021-10-13 10:38:41 -07:00
Hongtao Yu	098a0d8fbc	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3. This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation. Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are: - Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes - Unblocked CSE by avoiding pseudo probe from clobbering memory SSA - Unblocked induction variable simpliciation - Allow empty loop deletion by treating probe intrinsic isDroppable - Some refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110847	2021-10-12 09:44:12 -07:00
Florian Hahn	40d85f16c4	[LoopPeel] Use any_of & contains instead of for & find. Using contains was suggested in D108114, but I forgot to include it when landing the patch.	2021-10-12 12:18:01 +01:00
Florian Hahn	cd0ba9dc58	[LoopPeel] Peel if it turns invariant loads dereferenceable. This patch adds a new cost heuristic that allows peeling a single iteration off read-only loops, if the loop contains a load that 1. is feeding an exit condition, 2. dominates the latch, 3. is not already known to be dereferenceable, 4. and has a loop invariant address. If all non-latch exits are terminated with unreachable, such loads in the loop are guaranteed to be dereferenceable after peeling, enabling hoisting/CSE'ing them. This enables vectorization of loops with certain runtime-checks, like multiple calls to `std::vector::at` if the vector is passed as pointer. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D108114	2021-10-12 11:42:28 +01:00
David Sherwood	26b7d9d622	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-11 09:41:38 +01:00
Nikita Popov	ea12adc169	[CanonicalizeFreeze] Drop IVUsers.h include (NFC) Looking for users of IVUsers, this was a false positive. Only LSR uses IVUsers.	2021-10-09 17:01:26 +02:00
Arthur Eubanks	9405217999	Revert "Recommit "[LoopPeel] Peel loops with deoptimizing exits"" This reverts commit `d68b59f3eb`. This is causing crashes, see D110922 for details.	2021-10-08 10:53:23 -07:00
Philip Reames	d694dd0f0d	Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc] This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places. The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.	2021-10-08 09:50:10 -07:00
Max Kazantsev	d68b59f3eb	Recommit "[LoopPeel] Peel loops with deoptimizing exits" Removed obsolete DT verification that should not be there because the strategy of DT updates has changed. Differential Revision: https://reviews.llvm.org/D110922	2021-10-08 17:54:27 +07:00
Max Kazantsev	48a5a2d1af	Revert "[LoopPeel] Peel loops with deoptimizing exits" This reverts commit `8a959625c4`. Reported failures with LLVM_ENABLE_EXPENSIVE_CHECKS, need to investigate.	2021-10-08 16:07:59 +07:00
Max Kazantsev	8a959625c4	[LoopPeel] Peel loops with deoptimizing exits Added support for peeling loops with "deoptimizing" exits - such exits that it or any of its children (or any of their children, etc) either has a @llvm.experimental.deoptimize call prior to the terminating return instruction of this basic block or is terminated with unreachable. All blocks in the the sequence must have a single successor, maybe except for the last one. Previously we only checked the exit block for being deoptimizing. Now we check if the last reachable block from the exit is deoptimizing. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D110922 Reviewed By: mkazantsev	2021-10-08 10:32:13 +07:00
Bjorn Pettersson	7f93bb4a58	[LoopRotate] Forget SCEV values in RewriteUsesOfClonedInstructions This patch fixes problems reported in PR51981. When rotating a loop it isn't enough to just forget SCEV for that loop nest. When rotating we might clone some instructions from the old header into the preheader, and insert new PHI nodes to merge values together. There could be users of the original value that are updated to use the PHI result. And those users were not necessarily depending on a PHI node earlier, so they weren't cleaned up when just forgetting all SCEV:s for the loop nest. So we need to explicitly forget those values to avoid invalid cached SCEV expressions. Reviewed By: fhahn, mkazantsev Differential Revision: https://reviews.llvm.org/D110813	2021-10-07 19:36:30 +02:00
Itay Bookstein	40ec1c0f16	[IR][NFC] Rename getBaseObject to getAliaseeObject To better reflect the meaning of the now-disambiguated {GlobalValue, GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction (D109792), the function is renamed to getAliaseeObject.	2021-10-06 19:33:10 -07:00
Arthur Eubanks	72dddce652	More size_t -> uint64_t fixes after `05392466` Fixes some bots where the two differ.	2021-10-06 15:13:47 -07:00
Arthur Eubanks	ab7d421869	size_t -> uint64_t after `05392466` Fixes some bots where the two differ.	2021-10-06 13:52:04 -07:00
Arthur Eubanks	569346f274	Revert "Reland [IR] Increase max alignment to 4GB" This reverts commit `8d64314ffe`.	2021-10-06 11:38:11 -07:00
Arthur Eubanks	1b76312e98	Update some types after D110451 To fix mismatched size_t vs uint64_t on some platforms.	2021-10-06 11:27:48 -07:00
Simon Pilgrim	0dcd2b40e6	[TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument. This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known. Differential Revision: https://reviews.llvm.org/D111024	2021-10-06 15:40:35 +01:00
Simon Pilgrim	21661607ca	[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine) As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.	2021-10-06 12:04:30 +01:00
kpyzhov	7e390dfea7	[AMDGPU] Correction to `095c48fdf3`. Differential Revision: https://reviews.llvm.org/D110337	2021-10-05 21:47:25 -04:00
kpyzhov	095c48fdf3	[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration. The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument. This change adds a new module flag: hostcall that can be used to detect whether GPU functions use host calls for printf. Differential revision: https://reviews.llvm.org/D110337	2021-10-05 09:56:04 -04:00
Sjoerd Meijer	cdfc678572	[SCCPSolver] Fix use-after-free in markArgInFuncSpecialization In SCCPSolver::markArgInFuncSpecialization, the ValueState map may be reallocated after the initial ValueLatticeElement reference is grabbed, but before its use in copy initialization. This causes a use-after-free. To fix this, this commit changes the behavior to create the new ValueLatticeElement before assigning the old one to it. Patch by: https://github.com/duck-37/ Differential Revision: https://reviews.llvm.org/D111112	2021-10-05 12:56:32 +01:00
Bjorn Pettersson	7f84fa4ad4	[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer need the IsSizeTTy lambda function and the SizeTTy object. Instead we just follow the regular structure of checking for integer types given an exepected number of bits.	2021-10-04 15:46:39 +02:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Dávid Bolvanský	5f2f611880	Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical	2021-10-03 13:58:10 +02:00
Arthur Eubanks	a7b4ce9cfd	[NFC][AttributeList] Replace index_begin/end with an iterator We expose the fact that we rely on unsigned wrapping to iterate through all indexes. This can be confusing. Rather, keeping it as an implementation detail through an iterator is less confusing and is less code. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D110885	2021-10-01 10:17:41 -07:00
Kazu Hirata	4f0225f6d2	[Transforms] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-01 09:57:40 -07:00
Krasimir Georgiev	685f1bfd0a	Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns" It appears to cause stage2 clang build failures, e.g., https://lab.llvm.org/buildbot/#/builders/74/builds/7145. This reverts commit `1fb37334bd`.	2021-10-01 11:39:43 +02:00
David Sherwood	1fb37334bd	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-01 08:41:03 +01:00
Kazu Hirata	f631173d80	[llvm] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-09-30 08:51:21 -07:00
Anna Thomas	452714f8f8	[BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults This is analogous to D86156 (which preserves "lossy" BFI in loop passes). Lossy means that the analysis preserved may not be up to date with regards to new blocks that are added in loop passes, but BPI will not contain stale pointers to basic blocks that are deleted by the loop passes. This is achieved through BasicBlockCallbackVH in BPI, which calls eraseBlock that updates the data structures in BPI whenever a basic block is deleted. This patch does not have any changes in the upstream pipeline, since none of the loop passes in the pipeline use BPI currently. However, since BPI wasn't previously preserved in loop passes, the loop predication pass was invoking BPI on the entire function every time it ran in an LPM. This caused massive compile time in our downstream LPM invocation which contained loop predication. See updated test with an invocation of a loop-pipeline containing loop predication and -debug-pass turned ON. Reviewed-By: asbirlea, modimo Differential Revision: https://reviews.llvm.org/D110438	2021-09-30 10:27:05 -04:00
Djordje Todorovic	f8dfc35256	NFC: [Debugify] Fix a typo when checking variables in the original mode	2021-09-29 04:35:10 -07:00
Adrian Prantl	1b998a5f0c	Add salvageDebugInfo support for truncating/extending ptr/int conversions. This patch enables debug info salvaging for truncating/extending ptr int conversions. The testcase uncovered a bug in adce, which is addressed separately. rdar://80227769 Differential Revision: https://reviews.llvm.org/D110461	2021-09-28 10:24:50 -07:00
Congzhe Cao	c42772752a	[CodeMoverUtils] Enhance isSafeToMoveBefore() when control flow equivalence is satisfied With improved analysis in determining CFG equivalence that does not require strict dominance and post-dominance conditions, we now relax isSafeToMoveBefore() such that an instruction I can be moved before InsertPoint even if they do not strictly dominate each other, as long as they follow the same control flow path. For example, we can move Instruction 0 before Instruction 1, and vice versa. ``` if (cond1) // Instruction 0: %add = add i32 1, 2 if (cond1) // Instruction 1: %add2 = add i32 2, 1 ``` Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110456	2021-09-27 18:37:36 -04:00
Jun Ma	3a998c06a8	Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""" This reverts commit `8ba2adcf9e`.	2021-09-27 20:39:05 +08:00
Congzhe Cao	751be2a064	[CodeMoverUtils] Enhance isSafeToMoveBefore() when moving BBs When moving an entire basic block BB before InsertPoint, currently we check for all instructions whether the operands dominates InsertPoint, however, this can be improved such that even an operand does not dominate InsertPoint, as long as it appears as a previous instruction in the same BB, it is safe to move. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110378	2021-09-24 05:48:15 -04:00
Fangrui Song	1a6e1ee42a	Resolve {GlobalValue,GloalIndirectSymol}::getBaseObject confusion While both GlobalAlias and GlobalIFunc are GlobalIndirectSymbol, their `getIndirectSymbol()` usage is quite different (GlobalIFunc's resolver is an entity different from GlobalIFunc itself). As discussed on https://lists.llvm.org/pipermail/llvm-dev/2020-September/144904.html ("[IR] Modelling of GlobalIFunc"), the name `getBaseObject` is confusing when used with GlobalIFunc. To resolve the confusion: * Move GloalIndirectSymol::getBaseObject to GlobalAlias:: (GlobalIFunc should use `getResolver` instead) * Change GlobalValue::getBaseObject not to inspect GlobalIFunc. Note: the function has 7 references. * Add GlobalIFunc::getResolverFunction to peel off potential ConstantExpr indirection (`strlen` in `test/LTO/Resolution/X86/ifunc.ll`) Note: GlobalIFunc::getResolver (like GlobalAlias::getAliasee which does not peel off ConstantExpr indirection) is kept to be used by ValueEnumerator. Reviewed By: ibookstein Differential Revision: https://reviews.llvm.org/D109792	2021-09-23 09:23:35 -07:00
Simon Pilgrim	5f2c53bdf4	Pass some DataLayout arguments by const-ref Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 15:50:31 +01:00
Bjorn Pettersson	85a586501b	[BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor The NFC commit `e5692a564a` changed the logic for DomTreeUpdates to use the range [succ_begin, succ_begin) when looking for SuccsOfPredBB rather than using [succ_begin, succ_end). As the commit was NFC this is identified as a typo (it has been discussed briefly in phabricator). The typo was found when inspecting the code, so I've got no idea if changing back to the old range has any significant impact (such as solving any PR:s or causing some new problems). But at least this restores the code to the originally indented behavior.	2021-09-23 13:03:26 +02:00
Arthur Eubanks	e7249e4acf	[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837	2021-09-22 09:52:37 -07:00
Max Kazantsev	073b254cff	[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block When following a case of a switch instruction is guaranteed to lead to UB, we can safely break these edges and redirect those cases into a newly created unreachable block. As result, CFG will become simpler and we can remove some of Phi inputs to make further analyzes easier. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109428 Reviewed By: lebedev.ri	2021-09-21 10:45:19 +07:00
Nikita Popov	0fc624f029	[IR] Return AAMDNodes from Instruction::getMetadata() (NFC) getMetadata() currently uses a weird API where it populates a structure passed to it, and optionally merges into it. Instead, we can return the AAMDNodes and provide a separate merge() API. This makes usages more compact. Differential Revision: https://reviews.llvm.org/D109852	2021-09-16 21:06:57 +02:00
Arthur Eubanks	d49cb5b303	[SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest This makes some tests in vector-reductions-logical.ll more stable when applying D108837. The cost of branching is higher when vector ops are involved due to potential SLP transformations. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108935	2021-09-16 10:50:36 -07:00
Kazu Hirata	24c8eaec94	[Transforms] Use make_early_inc_range (NFC)	2021-09-15 19:55:24 -07:00
Owen Anderson	68079ef0eb	Teach SimplifyCFG to fold switches into lookup tables in more cases. In particular, it couldn't handle cases where lookup table constant expressions involved bitcasts. This does not seem to come up frequently in C++, but comes up reasonably often in Rust via `#[derive(Debug)]`. Originally reported by pcwalton. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109565	2021-09-15 22:07:08 +00:00
Anna Thomas	b6cb03e6b9	Revert use of getUniqueUndroppableUser in AssumeBundleBuilder Fix build bot failure in rG4ac4e521 caused due to assumeBundleBuilder using new API (getUniqueUndroppableUser). We now continue using the existing API for AssumeBundleBuilder (getSingleUndroppableUser). Sorry for the noise here. Tests-Run: failing testcase passes.	2021-09-15 17:45:09 -04:00
Anna Thomas	4ac4e52189	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also, the API for retrieving undroppable user has been updated accordingly since in both usecases (Attributor and InstCombine), we seem to care about the user, rather than the use. Reviewed-By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-15 20:39:38 +00:00
Markus Lavin	1ac209ed76	[NPM] Added -print-pipeline-passes print params for a few passes. Added '-print-pipeline-passes' printing of parameters for those passes declared with _WITH_PARAMS macro in PassRegistry.def. Note that it only prints the parameters declared inside _WITH_PARAMS as in a few cases there appear to be additional parameters not parsable. The following passes are now covered (i.e. all of those with *_WITH_PARAMS in PassRegistry.def). LoopExtractorPass - loop-extract HWAddressSanitizerPass - hwsan EarlyCSEPass - early-cse EntryExitInstrumenterPass - ee-instrument LowerMatrixIntrinsicsPass - lower-matrix-intrinsics LoopUnrollPass - loop-unroll AddressSanitizerPass - asan MemorySanitizerPass - msan SimplifyCFGPass - simplifycfg LoopVectorizePass - loop-vectorize MergedLoadStoreMotionPass - mldst-motion GVN - gvn StackLifetimePrinterPass - print<stack-lifetime> SimpleLoopUnswitchPass - simple-loop-unswitch Differential Revision: https://reviews.llvm.org/D109310	2021-09-15 08:34:04 +02:00
Kazu Hirata	abca4c012f	[Utils] Use make_early_inc_range (NFC)	2021-09-13 08:57:23 -07:00
Johannes Doerfert	c09fbbdcfb	Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"" This reapplies commit `7dbba3376f`, or, put differently, this reverts commit `d9a8d20827`. The test now requires the amdgpu and nvptx backend explicitly as it won't work without properly.	2021-09-10 15:22:56 -05:00
Johannes Doerfert	d9a8d20827	Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reverts commit `7dbba3376f`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:23:08 -05:00
Johannes Doerfert	7dbba3376f	[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals Not all address spaces support initializers for globals and we can therefore not set them without checking if they are allowed. This patch adds a hook into TTI to check if an AS allows non-undef initializers. We disable it for all but address space 0 by default, NVPTX and AMDGPU targets allow all but address space 3. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109337	2021-09-10 12:08:50 -05:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Kazu Hirata	92c9ff6d5f	[IR, Transforms] Use arg_empty (NFC)	2021-09-09 08:50:10 -07:00
Roman Lebedev	909cba9699	[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125) I can't seem to wrap my head around the proper fix here, we should be fine without this requirement, iff we can form this form, but the naive attempt (https://reviews.llvm.org/D106317) has failed. So just to unblock the release, put up a restriction. Fixes https://bugs.llvm.org/show_bug.cgi?id=51125	2021-09-09 12:28:09 +03:00
Jun Ma	8ba2adcf9e	Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."" Differential Revision: https://reviews.llvm.org/D106056	2021-09-09 16:53:33 +08:00
Andrew Litteken	144cd22bae	[CodeExtractor] Creating exit stubs based off original order branch instructions. Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on the order of out-of-region blocks after splitting any phi nodes, and collecting the blocks to be outlined. This could cause differences in order if there was a difference of exit block phi nodes between the two regions. This patch moves the collection of the output target blocks to be before this occurs, so that the assignment of target block to output value will be the same, regardless of the contents of the output block. Reviewers: paquette, roelofs Differential Revision: https://reviews.llvm.org/D108657	2021-09-08 15:15:15 -07:00
Nikita Popov	6dfdc6bfd2	[SROA] Support opaque pointers Make the following changes in order to support opaque pointers in SROA: * Generate i8 GEPs for opaque pointers. * Explicitly enforce that promotable allocas only have stores of the alloca type -- previously this was implicitly enforced. * Replace a check for pointer element type with load/store type. Differential Revision: https://reviews.llvm.org/D109259	2021-09-08 22:25:44 +02:00
Akira Hatanaka	dea6f71af0	[ObjC][ARC] Use the addresses of the ARC runtime functions instead of integer 0/1 for the operand of bundle "clang.arc.attachedcall" https://reviews.llvm.org/D102996 changes the operand of bundle "clang.arc.attachedcall". This patch makes changes to llvm that are needed to handle the new IR. This should make it easier to understand what the IR is doing and also simplify some of the passes as they no longer have to translate the integer values to the runtime functions. Differential Revision: https://reviews.llvm.org/D103000	2021-09-08 11:58:03 -07:00
Max Kazantsev	29d054bf12	[SimplifyCFG] Preserve knowledge about guarding condition by adding assume This improvement adds "assume" after removal of branch basing on UB in successor block. Consider the following example: ``` pred: x = ... cond = x > 10 br cond, bb, other.succ bb: phi [nullptr, pred], ... // other possible preds load(phi) // UB if we came from pred other.succ: // here we know that x <= 10, but this knowledge is lost // after the branch is turned to unconditional unless we // preserve it with assume. ``` If we remove the branch basing on knowledge about UB in a successor block, then the fact that x <= 10 is other.succ might be lost if this condition is not inferrable from any dominating condition. To preserve this knowledge, we can add assume intrinsic with (possibly inverted) branch condition. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109054 Reviewed By: lebedev.ri	2021-09-08 14:05:17 +07:00
Andy Kaylor	34528c32d2	Copy Elementtype Attribute to IR at Link step Copying IR during linking causes a type mismatch due to the field being missing in IRMover/Valuemapper. Adds the full range of typed attributes including elementtype attribute in the copy functions. Patch by Chenyang Liu Differential Revision: https://reviews.llvm.org/D108796	2021-09-07 11:41:43 -07:00
Kazu Hirata	5648f7170e	[Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC)	2021-09-07 09:19:33 -07:00
Dávid Bolvanský	9c476172b9	[InstCombine] stpcpy(d,s) -> strcpy(d,s) if the result is not used	2021-09-05 12:12:07 +02:00
Philip Reames	fa82a3d016	[runtimeunroll] Support epilogue unrolling with a parent loop This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop. When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop. Differential Revision: https://reviews.llvm.org/D108476	2021-09-02 16:29:20 -07:00
Philip Reames	45c672e20d	[runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info Requested in review comment on D108476	2021-09-02 16:28:46 -07:00
Nikita Popov	c86e1ce73b	[SCEVExpander] Simplify pointer overflow check This is a followup to D104662 to generate slightly nicer code for pointer overflow checks. Bypass expandAddToGEP and instead explicitly generate i8 GEPs. This saves some bitcasts and negates the value in a more obvious way. In particular, this prevents SCEV from looking through the umul.with.overflow, same as in the integer case. The wrapping-pointer-ni.ll test deserves a comment: Previously, this generated a typed GEP which used the umulo argument rather than the multiplication result. This results in more compact IR in that case, but effectively does the multiplication twice, the second one is just hidden in the GEP. Reusing the umulo result seems pretty reasonable to me. Differential Revision: https://reviews.llvm.org/D109093	2021-09-02 20:15:59 +02:00
Philip Reames	c3b3aa277a	Fix a missing MemorySSA update in breakLoopBackedge This is a case I'd missed in 6a8237. The odd bit here is that missing the edge removal update seems to produce MemorySSA which verifies, but is still corrupt in a way which bothers following passes. I wasn't able to reduce a single pass test case, which is why the reported test case is taken as is. Differential Revision: https://reviews.llvm.org/D109068	2021-09-01 16:59:01 -07:00
Philip Reames	e735f2bf37	[SCEVExpander] Prefer pointer expansion for overflow checks We'd special cased this logic to use pointer types for non-integral pointers, but there's no reason we can't do that for all pointer types. Doing it this was has a few advantages: a) The code itself becomes more straight forward, and easier to test. b) We avoid introducing ptrtoint into programs which didn't have them in the source. c) The resulting codegen is easier to analyze and simplify (mostly due to lack of ptrtoint). Note that there are some test diffs, but a) running them through instcombine helps a ton, and b) there's enough missing obvious transforms on both before and after IR that it's clear this isn't performance sensitive. This is mostly motivated by cleaning up mentions of non-integrals to have a clearer idea of what we actually need to support. Differential Revision: https://reviews.llvm.org/D104662	2021-09-01 13:11:25 -07:00
Arthur Eubanks	52e6d70c40	[NFC] Use newly introduced *AtIndex methods Introduced in D108788. These are clearer.	2021-09-01 11:18:41 -07:00
Philip Reames	b604fcb7bc	[runtime] Move prolog/epilog block to a post-simplify strategy The runtime unroller will try to produce a non-loop if the unroll count is 2 and thus the prolog/epilog loop would only run at most one iteration. The old implementation did this by avoiding loop construction entirely. This patches instead constructs the trivial loop and then explicitly breaks the backedge and simplifies. This does result in some additional code churn when triggered, but a) results in better quality code and b) removes a codepath which didn't work properly for multiple exit epilogs. One oddity that I want to draw to reviewer attention is that this somehow changes revisit order. The new order looks equivalent to me, but I don't understand how creating and erasing an extra loop here creates this effect. Differential Revision: https://reviews.llvm.org/D108521	2021-08-31 09:29:36 -07:00
Doug Beck	ed6cff667e	Fix typo s/beloinging/belonging Differential Revision: https://reviews.llvm.org/D107099	2021-08-31 12:01:50 +05:30
Roman Lebedev	795d142d23	[NFCI][IndVars] rewriteLoopExitValues(): don't expand SCEV's until needed Previously, we'd expand ALL the SCEV's eagerly, because we needed to check with `isValidRewrite()`, and discard bad rewrite candidates, but now that we do not do that, we also don't need to always expand. In particular, this avoids expanding potentially-huge SCEV's that we would discard anyways because they are high-cost and we aren't rewriting aggressively.	2021-08-30 12:28:24 +03:00
Roman Lebedev	7b0d59da9a	[IndVars] Drop check for the validity of rewrite `isValidRewrite()` checks that the both the original SCEV, and the rewrite SCEV have the same base pointer. I //believe//, after all the recent SCEV improvements, this invariant is already enforced by SCEV itself. I originally tried changing it into an assert in D108043, but that showed that it triggers on e.g. https://reviews.llvm.org/D108043#2946621, where SCEV manages to forward the store to load, test added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D108655	2021-08-30 12:06:58 +03:00
Nikita Popov	9f7873784d	[SCEVExpander] Reuse removePointerBase() for canonical addrecs ExposePointerBase() in SCEVExpander implements basically the same functionality as removePointerBase() in SCEV, so reuse it. The SCEVExpander code assumes that the pointer operand on adds is the last one -- I'm not sure that always holds. As such this might not be strictly NFC.	2021-08-29 21:12:35 +02:00
Nikita Popov	0886fd5b3a	[SCEVExpander] Remove unnecessary mul/udiv check (NFC) Pointer-typed SCEV expressions can no longer be mul or udiv, so we do not need to specially handle them here.	2021-08-29 20:47:00 +02:00
Nikita Popov	3f162e8e6d	[SCEVExpander] Assert single pointer op in add (NFC) There can only be one pointer operand in an add expression, and we have sorted operands to guarantee that it is the first. As such, the pointer check for other operands is dead code.	2021-08-29 20:30:56 +02:00
Philip Reames	6a82376012	Special case common branch patterns in breakLoopBackedge (try 2) Changes since aec08e: * Adjust placement of a closing brace so that the general case actually runs. Turns out we had no coverage of the switch case. I added one in `eae90fd`. * Drop .llvm.loop.* metadata from the new branch as there is no longer a loop to annotate. Original commit message: This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.	2021-08-27 10:27:16 -07:00
Andrew Litteken	9d2c859ebb	[CodeExtractor] Making the arguments outlined easier to access from the outside The Code Extractor does not provide an easy mechanism for determining the inputs and outputs after extraction has occurred, this patch gives the ability to pass in empty SetVectors to be filled with the inputs and outputs if they need to be analyzed. Added Tests: - InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106991	2021-08-26 09:47:53 -07:00
Vyacheslav Zakharin	2e192ab1f4	[CodeExtractor] Preserve topological order for the return blocks. Differential Revision: https://reviews.llvm.org/D108673	2021-08-25 08:09:01 -07:00
Florian Hahn	90d09eb300	[LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks. Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6. So far it has only been enabled for loops where all non-latch exits are 'de-optimizing' exits (D63923). But peeling of multi-exit loops can be highly beneficial in other cases too, like if all non-latch exiting blocks are unreachable. The motivating case are loops with runtime checks, like the C++ example below. The main issue preventing vectorization is that the invariant accesses to load the bounds of B is conditionally executed in the loop and cannot be hoisted out. If we peel off the first iteration, they become dereferenceable in the loop, because they must execute before the loop is executed, as all non-latch exits are terminated with unreachable. This subsequently allows hoisting the loads and runtime checks out of the loop, allowing vectorization of the loop. int sum(std::vector<int> A, std::vector<int> B, int N) { int cost = 0; for (int i = 0; i < N; ++i) cost += A->at(i) + B->at(i); return cost; } This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64. Note that this requires a follow-up improvement to the peeling cost model to actually peel iterations off loops as above. I will share that shortly. Also, peeling of multi-exits might be beneficial for exit blocks with other terminators, but I would like to keep the scope limited to known high-reward cases for now. I removed the option to disable peeling for multi-deopt exits because the code is more general now. Alternatively, the option could also be generalized, but I am not sure if there's much value in the option? Reviewed By: reames Differential Revision: https://reviews.llvm.org/D108108	2021-08-25 13:26:40 +01:00
Philip Reames	1e07f19bfc	Revert "Special case common branch patterns in breakLoopBackedge" This reverts commit `aec08e8600`. Several problems have been reported with malformed loopinfo after this change, see discussion on https://reviews.llvm.org/rGaec08e86004b.	2021-08-24 08:53:42 -07:00
Philip Reames	d8d84c9df8	[runtimeunroll] Use early return to reduce nesting [nfc]	2021-08-22 11:34:50 -07:00
Philip Reames	aec08e8600	Special case common branch patterns in breakLoopBackedge This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.	2021-08-22 10:42:23 -07:00
Alexander Potapenko	b0391dfc73	[clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute The purpose of __attribute__((disable_sanitizer_instrumentation)) is to prevent all kinds of sanitizer instrumentation applied to a certain function, Objective-C method, or global variable. The no_sanitize(...) attribute drops instrumentation checks, but may still insert code preventing false positive reports. In some cases though (e.g. when building Linux kernel with -fsanitize=kernel-memory or -fsanitize=thread) the users may want to avoid any kind of instrumentation. Differential Revision: https://reviews.llvm.org/D108029	2021-08-20 14:01:06 +02:00
Roman Lebedev	5d4f37e895	[NFCI][SimplifyCFG] Rewrite `createUnreachableSwitchDefault()` The only thing that function should do as per it's semantic, is to ensure that the switch's default is a block consisting only of an `unreachable` terminator. So let's just create such a block and update switch's default to point to it. There should be no need for all this weird dance around predecessors/successors.	2021-08-20 13:28:08 +03:00
Akira Hatanaka	898dc4590c	Refactor inlineRetainOrClaimRVCalls. NFC This is in preparation for committing https://reviews.llvm.org/D103000.	2021-08-19 14:55:45 -07:00
Arthur Eubanks	44a3241f10	[NFC] Replace some attribute methods that use confusing indexes	2021-08-19 14:10:26 -07:00
Philip Reames	17b9cb1817	[runtimeunroll] Support multiple exits to latch exit w/prolog loop This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used. This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch. I roled in the analogous fix here as it seemed obvious, and not worth re-review. As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic. (Alternatively, maybe we should be peeling these cases more aggressively?) Differential Revision: https://reviews.llvm.org/D108262	2021-08-19 11:43:52 -07:00
Philip Reames	447256f22b	[runtimeunroll] Fix reported DT verification error after `94d0914` In `94d0914`, I added support for unrolling of multiple exit loops which have multiple exits reaching the latch. Per reports on the review post commit, I'd missed updating the domtree for one case. This fix addresses that ommission. There's no new test as this is covered by existing tests with expensive verification turned on.	2021-08-19 11:06:17 -07:00
Arthur Eubanks	33d44b762e	[OpaquePtr][Inline] Use byval type instead of pointee type Reviewed By: #opaque-pointers, dblaikie Differential Revision: https://reviews.llvm.org/D105711	2021-08-19 09:56:08 -07:00
Sanjay Patel	ec54e275f5	Revert "[CVP] processSwitch: Remove default case when switch cover all possible values." This reverts commit `9934a5b2ed`. This patch may cause miscompiles because it missed a constraint as shown in the examples from: https://llvm.org/PR51531	2021-08-19 08:43:51 -04:00
Arthur Eubanks	fde0eb1f9a	[NFC] A couple more removeAttribute() cleanups	2021-08-18 11:15:20 -07:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Arthur Eubanks	de0ae9e89e	[NFC] Cleanup more AttributeList::addAttribute()	2021-08-17 21:05:41 -07:00
Arthur Eubanks	ad727ab7d9	[NFC] Migrate some callers away from Function/AttributeLists methods that take an index These methods can be confusing.	2021-08-17 21:05:40 -07:00
Arthur Eubanks	46cf82532c	[NFC] Replace Function handling of attributes with less confusing calls To avoid magic constants and confusing indexes.	2021-08-17 21:05:40 -07:00
Jun Ma	9934a5b2ed	[CVP] processSwitch: Remove default case when switch cover all possible values. Differential Revision: https://reviews.llvm.org/D106056	2021-08-18 10:23:13 +08:00
Philip Reames	94d0914292	[runtimeunroll] Support multiple exits to latch exit w/epilogue loop This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used. I decided to restrict this to the epilogue case. Given the changes ended up being pretty generic, we may be able to unblock the prolog case too, but I want to do that in a separate change to reduce the amount of code we all have to understand at one time. Differential Revision: https://reviews.llvm.org/D107381	2021-08-17 17:52:04 -07:00
Philip Reames	982da7a20c	[SCEVExpander] Stop hoisting IR when reusing phis his is a fix for PR43678, and is an alternate patch to D105723. The basic issue we're running into is that LSR + SCEVExpander are moving the very instruction whose operand we're in the process of expanding. This breaks the subtle and ill-documented invariant which let LSR work. (Full story can be found here: https://reviews.llvm.org/D105723#2878473) Rather than attempting a fix, this change just removes the optimization entirely. The code is entirely untested, and removing it appears to have no impact I can find. This code was added back in 2014 by `1e12f8563d` with a single test which does not seem to actually test the hoisting logic. From a philosophical standpoint, it also seems very strange to have the expander implementing optimizations which should live in a dedicated transform pass. Differential Revision: https://reviews.llvm.org/D106178	2021-08-17 09:38:32 -07:00
Arthur Eubanks	0d822da2bd	[NFC] Remove/replace some confusing attribute getters on Function	2021-08-16 16:12:37 -07:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Nikita Popov	570c9beb8e	[MemorySSA] Remove unnecessary MSSA dependencies LoopLoadElimination, LoopVersioning and LoopVectorize currently fetch MemorySSA when construction LoopAccessAnalysis. However, LoopAccessAnalysis does not actually use MemorySSA and we can pass nullptr instead. This saves one MemorySSA calculation in the default pipeline, and thus improves compile-time. Differential Revision: https://reviews.llvm.org/D108074	2021-08-16 20:40:55 +02:00
Roman Lebedev	febcedf18c	Revert "[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer" https://bugs.llvm.org/show_bug.cgi?id=51490 was filed. This reverts commit `35a8bdc775`.	2021-08-16 14:30:29 +03:00
David Sherwood	9b19b77883	[NFC] Remove unused code in llvm::createSimpleTargetReduction	2021-08-16 09:50:45 +01:00
Roman Lebedev	2eb554a9fe	Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" This is still wrong, as failing bots suggest. This reverts commit `3d9beefc7d`.	2021-08-16 11:07:42 +03:00
Roman Lebedev	3d9beefc7d	Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) ... with test change this time. LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR, and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What i believe we need to do, is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses. ``` $ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll ---------------------------------------- @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: br label %L %L: %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 br i1 %iszero, label %exit, label %L2 %L2: store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp eq i32 %ld, 4294967295 br i1 %cmp, label %L, label %exit %exit: %r = phi i32 [ %ld, %L2 ], [ %ld, %L ] ret i32 %r } => @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: %ld.old = load i32, * @global_pr51125, align 4 %iszero.old = icmp eq i32 %ld.old, 0 br i1 %iszero.old, label %exit, label %L2 %L2: %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ] store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp ne i32 %ld2, 4294967295 %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 %or.cond = select i1 %cmp, i1 1, i1 %iszero br i1 %or.cond, label %exit, label %L2 %exit: %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ] %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ] ret i32 %r } Transformation seems to be correct! ``` Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:16:04 +03:00
Roman Lebedev	60dd0121c9	Revert "[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" Forgot to stage the test change. This reverts commit `78af5cb213`.	2021-08-15 19:15:09 +03:00
Roman Lebedev	78af5cb213	[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR, and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What i believe we need to do, is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses. ``` $ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll ---------------------------------------- @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: br label %L %L: %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 br i1 %iszero, label %exit, label %L2 %L2: store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp eq i32 %ld, 4294967295 br i1 %cmp, label %L, label %exit %exit: %r = phi i32 [ %ld, %L2 ], [ %ld, %L ] ret i32 %r } => @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: %ld.old = load i32, * @global_pr51125, align 4 %iszero.old = icmp eq i32 %ld.old, 0 br i1 %iszero.old, label %exit, label %L2 %L2: %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ] store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp ne i32 %ld2, 4294967295 %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 %or.cond = select i1 %cmp, i1 1, i1 %iszero br i1 %or.cond, label %exit, label %L2 %exit: %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ] %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ] ret i32 %r } Transformation seems to be correct! ``` Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:02:34 +03:00
Roman Lebedev	35a8bdc775	[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer Currently/previously, while SCEV guaranteed that it produces the same value, the way it was produced may be illegal IR, so we have an ugly check that the replacement is valid. But now that the SCEV strictness wrt the pointer/integer types has been improved, i believe this invariant is already upheld by the SCEV itself, natively. I think we should add an assertion, wait for a week, and then, if all is good, rip out all this checking. Or we could just do the latter directly i guess. This reverts commit rL127839. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D108043	2021-08-15 18:59:32 +03:00
Arthur Eubanks	c19d7f8af0	[CallPromotion] Check for inalloca/byval mismatch Previously we would allow promotion even if the byval/inalloca attributes on the call and the callee didn't match. It's ok if the byval/inalloca types aren't the same. For example, LTO importing may rename types. Fixes PR51397. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107998	2021-08-13 16:52:04 -07:00
Arthur Eubanks	a9831cce1e	[NFC] Remove public uses of AttributeList::getAttributes() Use methods that better convey the intent.	2021-08-13 11:38:12 -07:00
Arthur Eubanks	80ea2bb574	[NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes() This is more consistent with similar methods.	2021-08-13 11:16:52 -07:00
Roman Lebedev	c46546bd52	Reland "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological"" The commit originally unearthed a problem, reported as https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff#1019890 Now that the problem has been fixed, and the assertion no longer fires, let's see if there are other cases it fires on. This reverts commit `5c8c24d2de`, relanding commit `f30a7dff8a`.	2021-08-13 15:45:03 +03:00
Roman Lebedev	2702fb1148	[SimplifyCFG] Restart if `removeUndefIntroducingPredecessor()` made changes It might changed the condition of a branch into a constant, so we should restart and constant-fold terminator, instead of continuing with the tautological "conditional" branch. This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff	2021-08-13 15:45:03 +03:00
Roman Lebedev	5c8c24d2de	Revert "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological" The assertion does not hold on a provided reproducer. Reverting until after fixing the problem. This reverts commit `f30a7dff8a`.	2021-08-13 13:16:22 +03:00
Roman Lebedev	f30a7dff8a	[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological We really shouldn't deal with a conditional branch that can be trivially constant-folded into an unconditional branch. Indeed, barring failure to trigger BB reprocessing, that should be true, so let's assert as much, and hope the assertion never fires. If it does, we have a bug to fix.	2021-08-12 20:03:09 +03:00
Roman Lebedev	628f63d3d5	[SimplifyCFG] If FoldTwoEntryPHINode() changed things, restart Mainly, i want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()` doesn't get asked to deal with non-unconditional branches, and if i do that, then said assertion fires on existing tests, and this is what prevents it from firing.	2021-08-12 20:03:09 +03:00
Adrian Prantl	d6b6880172	Streamline the API of salvageDebugInfoImpl (NFC) This patch refactors / simplifies salvageDebugInfoImpl(). The goal here is to simplify the implementation of coro::salvageDebugInfo() in a followup patch. 1. Change the return value to I.getOperand(0). Currently users of salvageDebugInfoImpl() assume that the first operand is I.getOperand(0). This patch makes this information explicit. A nice side-effect of this change is that it allows us to salvage expressions such as add i8 1, %a in the future. 2. Factor out the creation of a DIExpression and return an array of DIExpression operations instead. This change allows users that call salvageDebugInfoImpl() in a loop to avoid the costly creation of temporary DIExpressions and to defer the creation of a DIExpression until the end. This patch does not change any functionality. rdar://80227769 Differential Revision: https://reviews.llvm.org/D107383	2021-08-10 15:21:18 -07:00
Carl Ritson	a1783b54e8	[SimpifyCFG] Remove recursion from FoldCondBranchOnPHI. NFCI. Avoid stack overflow errors on systems with small stack sizes by removing recursion in FoldCondBranchOnPHI. This is a simple change as the recursion was only iteratively calling the function again on the same arguments. Ideally this would be compiled to a tail call, but there is no guarantee. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107803	2021-08-10 19:14:31 +09:00
Michael Liao	b5e470aa2e	[LowerMemIntrinsics] Typo fix.	2021-08-08 22:38:58 -04:00
Momchil Velikov	f171149e0d	[SimpifyCFG] Speculate a store preceded by a local non-escaping load In SimplifyCFG we may simplify the CFG by speculatively executing certain stores, when they are preceded by a store to the same location. This patch allows such speculation also when the stores are similarly preceded by a load. In order for this transformation to be correct we need to ensure that the memory location is writable and the store in the new location does not introduce a data race. Local objects (created by an `alloca` instruction) are always writable, so once we are past a read from a location it is valid to also write to that same location. Seeing just a load does not guarantee absence of a data race (unlike if we see a store) - the load may still be part of a race, just not causing undefined behaviour (cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic). In the original program, a data race might have been prevented by the condition, but once we move the store outside the condition, we must be sure a data race wasn't possible anyway, no matter what the condition evaluates to. One way to be sure that a local object is never concurrently read/written is check that its address never escapes the function. Hence this transformation is restricted to local, non-escaping objects. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D107281	2021-08-05 15:54:42 +01:00
Dawid Jurczak	06206a8cd1	[BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc Additionally with this patch aligned DSE which is the only user of emitCalloc. Differential Revision: https://reviews.llvm.org/D103523	2021-08-05 16:18:38 +02:00
Dawid Jurczak	f8cdde7195	[SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into calloc' transformation FoldMallocMemset can be safely removed because since https://reviews.llvm.org/D103009 such transformation is already performed in DSE. Differential Revision: https://reviews.llvm.org/D103451	2021-08-05 16:08:32 +02:00
Craig Topper	b818da27ab	[SimplifyCFG] Enable switch to lookup table for more types. This transform has been restricted to legal types since https://reviews.llvm.org/rG65df808f6254617b9eee931d00e95d900610b660 in 2012. This is particularly restrictive on RISCV64 which only has i64 as a legal integer type. i32 is a very common type in code generated from C, but we won't form a lookup table with it. This also effects other common types like i8/i16 types on ARM, AArch64, RISCV, etc. This patch proposes to allow power of 2 types larger than 8 bit, if they will fit in the largest legal integer type in DataLayout. These types are common in C code so generally well handled in the backends. We could probably do this for other types like i24 and rely on alignment and padding to allow the backend to use a single wider load. This isn't my main concern right now and it will need more tests. We could also allow larger types up to some limit and let the backend split into multiple loads, but we need to define that limit. It's also not my main concern right now. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107233	2021-08-03 15:35:16 -07:00
Philip Reames	223835f08b	[runtimeunroll] A bit of style cleanup to simplify a following change [NFC] Use for-range, use the idiomatic pattern for non-loop values, etc..	2021-08-03 10:28:46 -07:00
Nikita Popov	c7770574f9	Revert "[unroll] Move multiple exit costing into consumer pass [NFC]" This reverts commit `76940577e4`. This causes Transforms/LoopUnroll/ARM/multi-blocks.ll to fail.	2021-08-02 22:23:34 +02:00
Philip Reames	76940577e4	[unroll] Move multiple exit costing into consumer pass [NFC] This aligns the multiple exit costing with all the other cost decisions. Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.	2021-08-02 12:46:23 -07:00
Philip Reames	9016beaa24	[unrollruntime] Pull out a helper function for readability and eventual reuse [nfc]	2021-08-02 11:47:27 -07:00
Philip Reames	ebc4c4e3b0	[unroll] Add clarifying comment The option to not preserve LCSSA is in fact not tested at all in upstream. I was tempted to just remove the code entirely, but realized I didn't need to for my actual goal.	2021-08-02 10:44:56 -07:00
Shimin Cui	732b05555c	[GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D106589	2021-07-31 18:42:02 -04:00
Kazu Hirata	e76ddfa9ef	[Transforms] Remove HasValueForBlock (NFC) The function seems to be unused for at least one year.	2021-07-30 08:56:49 -07:00
Chris Jackson	0ba8595287	[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR Reapply commit `d675b594f4` that was reverted due to buildbot failures. A simple fix has been applied to remove an assertion. Differential Revision: https://reviews.llvm.org/D105207	2021-07-28 23:04:59 +01:00
Jeroen Dobbelaere	03b8c69d06	[PredicateInfo] Use Intrinsic::getDeclaration now that it handles unnamed types. This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned In D91661#2875179 by @jroelofs. (The first attempt was in D105983) D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls. This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead of looking at the available users of a prototype. Reviewed By: jroelofs Differential Revision: https://reviews.llvm.org/D106147	2021-07-28 19:30:29 +02:00
Jeroen Dobbelaere	dc5570d149	Revert "Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types." This reverts commit `77080a1eb6`. This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the needed function cleanup. A next patch will then just improve on the name mangling.	2021-07-28 19:30:29 +02:00
Chris Jackson	3992896043	Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR" Reverted due to buildbot failures. This reverts commit `d675b594f4`.	2021-07-28 16:44:54 +01:00
Chris Jackson	d675b594f4	[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR Reapply commit `796b84d26f` that was reverted due to reports of crashes. A minor change now guards against getVariableLocationOperand() returning a nullptr. Differential Revision: https://reviews.llvm.org/D106659	2021-07-28 16:28:46 +01:00
Chris Jackson	04b94c7cae	Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR" Crashes were reported on the upstreamm revision: https://reviews.llvm.org/D105207 This reverts commit `796b84d26f`.	2021-07-28 10:05:54 +01:00
Anna Thomas	8ee5759fd5	Strip undef implying attributes when moving calls When hoisting/moving calls to locations, we strip unknown metadata. Such calls are usually marked `speculatable`, i.e. they are guaranteed to not cause undefined behaviour when run anywhere. So, we should strip attributes that can cause immediate undefined behaviour if those attributes are not valid in the context where the call is moved to. This patch introduces such an API and uses it in relevant passes. See updated tests. Fix for PR50744. Reviewed By: nikic, jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D104641	2021-07-27 10:57:05 -04:00
Chris Jackson	796b84d26f	[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR This reapplies commit `76f3ffb2b2` that was reverted due to buildbot failures. - Update lit tests with REQUIRES condition. - Abandon salvage attempt if SCEVUnknown::getValue() returns nullptr. Differential Revision: https://reviews.llvm.org/D105207	2021-07-27 14:22:09 +01:00
Chris Jackson	1930c4410d	[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR This reverts commit `76f3ffb2b2` because of a failure on sanitixer-X86-64-linux-autoconf.	2021-07-27 13:36:56 +01:00
Chris Jackson	76f3ffb2b2	[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR This patch extends salvaging of debuginfo in the Loop Strength Reduction (LSR) pass by translating Scalar Evaluations (SCEV) into DIExpressions. The method is as follows: - Cache dbg.value intrinsics that are salvageable. - Obtain a loop Induction Variable (IV) from ScalarExpressionExpander or the loop header. - Translate the IV SCEV into an expression that recovers the current loop iteration count. Combine this with the dbg.value's location op SCEV to create a DIExpression that salvages the value. Review by: jmorse Differential Revision: https://reviews.llvm.org/D105207	2021-07-27 13:00:36 +01:00
Johannes Doerfert	25a3130d89	[Local] Do not introduce a new `llvm.trap` before `unreachable` This is the second attempt to remove the `llvm.trap` insertion after https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6 reverted the first one. It is not clear what the exact issue was back then and it might already be gone by now, it has been >5 years after all. Replaces D106299. Differential Revision: https://reviews.llvm.org/D106308	2021-07-26 23:33:36 -05:00
Roman Lebedev	1901c98dd8	[SimplifyCFG] SwitchToLookupTable(): don't increase ret count The very next SimplifyCFG pass invocation will tail-merge these two ret's anyways, there is not much point in creating more work for ourselves.	2021-07-26 23:29:55 +03:00
Roman Lebedev	08efc2e68d	[SimplifyCFG] Drop support for simplifying cond branch to two (different) ret's Nowadays, simplifycfg pass already tail-merges all the ret blocks together before doing anything, and it should not increase the count of ret's, so this is dead code.	2021-07-26 23:29:52 +03:00
Roman Lebedev	7c5f104e45	[SimplifyCFG] Drop support for duplicating ret's into uncond predecessors This functionality existed only under a default-off flag, and simplifycfg nowadays prefers to not increase the count of ret's.	2021-07-26 23:29:21 +03:00

... 5 6 7 8 9 ...

6465 Commits