This fold handles a special subset of foldAndOrOfICmpsUsingRanges(),
so use the more generic implementation instead.
The result can differ if a representation using a range comparison
is possible, in which case that is preferred over masking. There is
a canonicalization opportunity here.
This is the de Morgan conjugated variant of the existing fold for
ors. Implement this by switching the range code to always work
on ors and inverting the operands at the start and end. This makes
reasoning easier and makes the extension more obviously correct.
The legacy LoopUnswitch pass is only used in the legacy pass manager
pipeline, which is deprecated.
The NewPM replacement is SimpleLoopUnswitch and I think it is time to
remove the legacy LoopUnswitch code.
Fixes #31000.
Reviewed By: aeubanks, Meinersbur, asbirlea
Differential Revision: https://reviews.llvm.org/D124376
We can express this fold more naturally when working on the constant
range implementation. This change is not entirely NFC, because the
code now also handles cases that don't match the precise pattern
this previously looked for, e.g. we can omit an add on one of the
ranges.
I think this sort comparator was overly complex, and the windows
expensive check bot agreed, failing as it was not giving a strict weak
ordering. Change it to use the comparison of the mask values as unsigned
integers. This should sort the undef elements to the end whilst keeping
X<Y otherwise.
Replace the condition value with the known constant value on the
threaded edge. This happens implicitly with phi threading because
we replace with the incoming value, but not for non-phi threading.
SimplifyCFG implements basic jump threading when a branch is
performed on a phi node with constant operands. However,
InstCombine canonicalizes such phis to the condition value of a
previous branch, if possible. SimplifyCFG does support this as
well, but only in the very limited case where the same condition
is used in a direct predecessor -- notably, this does not include
the common diamond pattern (i.e. two consecutive if/elses on the
same condition).
This patch extends the code to look back a limited number of
blocks to find a branch on the same value, rather than only
looking at the direct predecessor.
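A minimal C sketch of the diamond pattern described above (hypothetical code, for illustration only):
```c
// Two consecutive if/elses on the same condition: after the first branch,
// the value of 'c' is known on each path, so the second branch can be
// threaded instead of being re-tested.
int diamond(int c, int x) {
  int y;
  if (c)
    y = x + 1;
  else
    y = x - 1;
  if (c)        // same condition as above: now threadable
    y *= 2;
  return y;
}
```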
Fixes https://github.com/llvm/llvm-project/issues/54980.
Differential Revision: https://reviews.llvm.org/D124159
Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, provided
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.
This is a more powerful version of a combine that already happens in
InstCombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.
Differential Revision: https://reviews.llvm.org/D123494
Introduced masks where they were not added before and improved
target-dependent cost models to avoid returning incorrect cost results
after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
The structure ArgPart and alias OffsetAndArgPart have been moved
into the anonymous namespace. NFC.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D124617
The condition should be 'ArgParts.size() > MaxElements', so that if we
have exactly 3 elements in the 'ArgParts' vector, the promotion should
be allowed because the 'MaxElements' threshold is not exceeded yet.
The default value for 'MaxElements' has been decreased to 2 in order
to avoid an actual change in argument promotion behavior. However,
this changes byval argument transformation behavior by allowing no
more than 2 arguments to be added to the function instead of the 3
allowed before.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D124178
Remove one of the last remaining uses of ::needsVectorIV, preparing for
its removal. Now that usesScalars is available and based on the
information explicit in VPlan, there is no need to use the pre-computed
needsVectorIV.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123720
Some loop counters ('i', 'e') and variables ('type') were not named
in accordance with the code style, and clang-tidy issues warnings
about the use of such variables. This patch renames the variables
and fixes some typos in the comments within the source file.
Differential Revision: https://reviews.llvm.org/D123662
When using opaque pointers, convert GEPs into offset representation
of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset.
This allows us to recognize equivalent address calculations even if
the GEPs don't use the same source element type.
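An illustrative C-level sketch (hypothetical functions, assuming a 4-byte int): both computations reduce to the same offset form p + 4*i once the GEPs are expressed as offsets, even though their source element types differ.
```c
// Both functions compute the same address; with opaque pointers the GEPs
// differ only in source element type, and the offset representation
// recognizes them as equivalent (assuming sizeof(int) == 4).
int *addr_int(int *p, long i)   { return &p[i]; }
int *addr_char(char *p, long i) { return (int *)&p[4 * i]; }
```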
This fixes an opaque pointer codegen regression seen in rustc.
Differential Revision: https://reviews.llvm.org/D124527
They can already be available, and even if not, DT/LI can be available.
We should not recompute them. Old PM is unchanged because it would
require changing dependencies, and we don't care enough about it.
Differential Revision: https://reviews.llvm.org/D124439
Reviewed By: nikic, aeubanks
isNoopAddrSpaceCast expects SrcAS to be different from DestAS.
If the two address spaces are the same, consider ptrtoint/inttoptr a noop cast.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D123573
Previously all entries in global_ctors had to have the void()* type and
we'd skip evaluating bitcasted functions. With opaque pointers we may
see the function directly.
Fixes #55147.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D124553
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove ThreadSanitizerLegacyPass.
Reviewed By: #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D124209
Introduced masks where they were not added before and improved
target-dependent cost models to avoid returning incorrect cost results
after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
When a block containing llvm.coro.id is cloned during CHR, an invalid
PHI node with token type gets inserted at the beginning of the block containing
llvm.coro.begin. To avoid such cases, we exclude regions with llvm.coro.id.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D124418
IRCE is a function pass that operates on loops. If there are no loops in
the function (as seen through LI), we should avoid computing the
remaining expensive analyses (such as BPI). Reorder the analysis
requests and return early if there are no loops. This is NFC with a
compile-time improvement.
The same will be done in a follow-up patch for the loop vectorizer.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D124478
This relands commit 8f550368b1.
The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.
Differential Revision: https://reviews.llvm.org/D120169
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.
Reviewers: @StephenTozer, @djtodoro
Differential Revision: https://reviews.llvm.org/D120169
First of two patches that extend SCEV-based salvaging to enable
salvaging of dbg.value intrinsics that have multiple location ops
before the Loop Strength Reduction pass.
The existing single-op SCEV-based salvaging can generate variadic
dbg.value intrinsics in order to salvage a dbg.value that has a single
location op. If a dbg.value has multiple location ops before LSR, and
LSR optimises away one or more of the location operands, then currently
no salvaging will be attempted.
Salvaging can now be added, but first this patch cleans up consistency
in both the code and comments, and applies some refactoring to make
application of the new salvaging implementation more straightforward.
- Use SCEVDbgValueBuilder for both types of recovery expressions:
IV-offset based and iteration count based.
- Combine the functions that write the final DIExpression.
- Move some static functions into member functions.
Reviewers: @Orlando
Differential Revision: https://reviews.llvm.org/D120168
Currently, two GEPs will only be combined if the result element
type of one is the same as the source element type of the other.
However, this means we may miss folding opportunities where the
second GEP could be rewritten using a different element type. This
is especially relevant for opaque pointers, where constant GEPs
often use i8 element type.
Address this by converting GEP indices to offsets, adding them,
and then converting them back to indices. The first (inner) GEP
is allowed to have variable indices as well, in which case only
the constant suffix is converted into an offset.
This should address the regression reported in
https://reviews.llvm.org/D123300#3467615.
Differential Revision: https://reviews.llvm.org/D124459
I found this bug when performing a two-stage build of clang with
Function Specialization enabled and tuned aggressively. The crash
appears only on release builds.
Fixes https://github.com/llvm/llvm-project/issues/55000.
Before accessing the contents of the ArgInfo iterator inside
SCCPInstVisitor::markArgInFuncSpecialization, we should be
checking that the iterator is valid.
Differential Revision: https://reviews.llvm.org/D124114
canonicalizeClampLike canonicalizes the ule/ugt comparisons to ult/uge,
respectively. However, it does not update the variable holding the
comparison predicate type after doing this. Later code fails to handle
the non-canonical predicate type (specifically, the swap of
ThresholdLowIncl and ThresholdHighExcl when Pred0 has been canonicalized
from ugt to uge). This leads to the miscompile reported in PR53252. Fix
this by updating the comparison predicate after canonicalizing.
Fixes #53252
Differential Revision: https://reviews.llvm.org/D119690
The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems:
1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block.
2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is a conflict about where to branch if the block already has a terminator.
3. Some API functions only work with blocks that have a terminator. Workarounds have been used to insert a temporary terminator that is removed again.
4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callbacks may behave differently depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy, where some callbacks do create a branch to another "continue" block but, unlike BodyGenCallbackTy, do not receive the target as an argument. This is not addressed in this patch.
With this patch, the callback receives a CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed, but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback.
Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case their callbacks must split the block at CodeGenIP before setting the IRBuilder position, such that the instructions after CodeGenIP are moved to another basic block, and before returning create a new branch instruction to the split block.
Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D118409
We can always replace the undef elements in a vector constant
with regular constants to get rid of the freeze:
https://alive2.llvm.org/ce/z/nfRb4F
The select diffs show that we might do better by adjusting the
logic for a frozen select condition. We may also want to refine
the vector constant replacement to consider forming a splat.
Differential Revision: https://reviews.llvm.org/D123962
Before this patch `Args` was used to pass a broadcast's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This continues the push away from hard-coded knowledge about functions
towards attributes. We'll use this to annotate free(), realloc() and
cousins and obviate the hard-coded list of free functions.
Differential Revision: https://reviews.llvm.org/D123083
This reorganizes the code as a preparation for D123865:
* Use more descriptive names for variables
* Simplify a condition by using an already calculated value
for `MaxPeelCount`
* Remove a duplicate log entry
* Report basic values for loop costs
Differential Revision: https://reviews.llvm.org/D124388
Since the size of most SCCs is 1, the PriorityInlineOrder would not change the inline
order in the SCC inliner.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D123608
At the moment, unfeasible default destinations are not handled properly
in removeNonFeasibleEdges. So far, only unfeasible cases are removed,
but later code expects unreachable blocks to have no predecessors.
This is causing the crash reported in PR49573.
If the default destination is unfeasible it won't be executed. Create
a new unreachable block on demand and use that as default
destination.
Note that at the moment this only is relevant for cases where
resolvedUndefsIn marks the first case as executable. Regular switch
handling has a FIXME/TODO to support determining whether the default
case is feasible or not.
Fixes #48917.
Differential Revision: https://reviews.llvm.org/D113497
Don't check whether an input of a BDV can be pruned if the input
is the BDV itself. The BDV is present in the states map, so if the input
is the BDV itself we'd return false; explicitly check this case instead.
Differential Revision: https://reviews.llvm.org/D123846
We may be able to make the ValueTracking wrapper smarter
in the future (for example, analyze a simple recurrence),
so this will automatically benefit if that happens.
The tryToVectorize() method implements one of the search paths for vectorizable tree roots in the SLP vectorizer,
specifically for binary and comparison operations. The order of probing various scalar pairs
was defined by its implementation: the instruction operands, then climb over one operand if
the instruction is its sole user, and then perform the same actions for the other operand if previous
attempts failed. The problem with this approach is that among these options we can have more than a
single vectorizable tree candidate, and it is not necessarily the one encountered first.
Trying to build vectorizable tree for each possible combination for just evaluation is expensive.
But we already have lookahead heuristics mechanism which we use for finding best pick among
operands of commutative instructions. It calculates cumulative score for candidates in two
consecutive lanes. This patch introduces the use of this heuristic for choosing the best pair among
several combinations. We only try the one that looks most promising for vectorization.
An additional benefit is that we reduce the total number of vectorization trees built for probes
because we skip those that look non-profitable early.
Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo)
Differential Revision: https://reviews.llvm.org/D124309
This AliasPtr is always being created from an Int64 even for targets
where 32 bit is the proper type, e.g. "thumbv7-none-linux-android16".
This causes the assert in the `get` func to fail as we're getting a 32
bit value from the APInt.
Fix this by simply always getting the type from the value instead.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D123272
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove AddressSanitizerLegacyPass...,
ModuleAddressSanitizerLegacyPass, and ASanGlobalsMetadataWrapperPass.
MemorySanitizerLegacyPass was removed in D123894.
AddressSanitizerLegacyPass was removed in D124216.
Reviewed By: #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D124337
This fixes a series of mis-compiles by SimpleLoopUnswitch.
My measurements showed no performance regression with -O3 on AArch64
in SPEC2006, SPEC2017 and a set of internal benchmarks.
Fixes #50387, #50430
Depends on D124251.
Reviewed By: nikic, aqjune
Differential Revision: https://reviews.llvm.org/D124252
Logic in this pass assumes that all users of loop instructions are
either in the same loop or are LCSSA Phis. In fact, there can also
be users in unreachable blocks that currently break assertions.
Such users don't need to go to the next round of simplifications.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D124368
This patch adds a function foldSelectWithFCmpToFabs and does more combining for
fneg-of-fabs.
With 'nsz':
fold (X < +/-0.0) ? X : -X or (X <= +/-0.0) ? X : -X to -fabs(x)
fold (X > +/-0.0) ? X : -X or (X >= +/-0.0) ? X : -X to fabs(x)
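An illustrative C-level view of the two select patterns (hypothetical functions; at the IR level the fold additionally requires the 'nsz' fast-math flag):
```c
// Select of X and -X based on a compare against zero:
double neg_abs(double x) { return (x < 0.0) ? x : -x; } /* -> -fabs(x) */
double pos_abs(double x) { return (x > 0.0) ? x : -x; } /* ->  fabs(x) */
```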
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D123830
Minor refactoring to reduce size of functional change D124309:
look-ahead scoring routines pulled out of VLOperands and formed
new LookAheadHeuristics helper class.
Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo)
Differential Revision: https://reviews.llvm.org/D124313
MisExpect diagnostics should not prevent compilation from succeeding, and the
assertion is insufficient to prevent division by zero in release builds.
This patch addresses that by replacing the assert with an early return.
Additionally, it disables MisExpect diagnostics when using sample profiling,
since this is the only known case where this error has manifested.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D124302
We only need to insert a Freeze instruction if any of the conditions
may be poison. Similar checks are already done in the other places where
SimpleLoopUnswitch creates Freeze instructions.
Reviewed By: aeubanks, efriedma
Differential Revision: https://reviews.llvm.org/D124259
Folds are supposed to always be added in conjugated pairs for 'and'
and 'or'. Merge the two functions to make folds for which this is
currently not the case more obvious.
1d90e53044 switched this code to store
the predicates and operands in variables, but retained a
swapOperands() call here. Thus the commuted cases were no longer
folded. Additionally, as the change was not reported, the next
InstCombine iteration would not pick it up either.
Reapplying without changes, after a fix to a dependent patch.
-----
Rather than creating a PHI node and then using the PHI threading
code, directly handle this case in
FoldCondBranchOnValueKnownInPredecessor().
This change is supposed to be NFC-ish, but may cause changes due
to different transform order.
Reapply with SmallMapVector instead of SmallDenseMap, which should
address the non-determinism issue.
-----
This general threading transform can be performed whenever we know
a constant value for the condition in a predecessor, which would
currently just be the case of a phi node with constant arguments.
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove AddressSanitizerLegacyPass,
ModuleAddressSanitizerLegacyPass, and ASanGlobalsMetadataWrapperPass.
MemorySanitizerLegacyPass was removed in D123894.
Reviewed By: #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D124216
This emits an `st_size` that represents the actual usable size of an object before the redzone is added.
Reviewed By: vitalybuka, MaskRay, hctim
Differential Revision: https://reviews.llvm.org/D123010
The first attempt at this missed a check to make sure the offset
constant was in range and caused many bot failures.
That was missed in the Alive2 proof because an overshift creates
poison rather than the assert from APInt. Here's an alternate
attempt at a proof using count-trailing-zeros:
https://alive2.llvm.org/ce/z/pnXQYR
Original commit message:
This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99
This reverts commit 3df86e799e.
This reverts commit 8988254667.
`[SimplifyCFG] Handle branch on same condition in pred more directly`
caused non-determinism when compiling opt with a bootstrapped clang.
I have to revert the dependent commit as well.
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove GCOVProfilerLegacyPass.
I have checked many LLVM users and only llvm-hs[1] uses the legacy gcov pass.
[1]: https://github.com/llvm-hs/llvm-hs/issues/392
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123829
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove MemorySanitizerLegacyPass.
Differential Revision: https://reviews.llvm.org/D123894
Debugify in OriginalDebugInfo mode does (DebugInfo) collect-before-pass & check-after-pass
for each instruction, which is pretty expensive. When used to analyze DebugInfo losses
in large projects (like LLVM), this raises the build time unacceptably.
This patch introduces a limit on the number of processed functions per compile unit.
By default, the limit is set to UINT_MAX (practically unlimited), and by using the introduced
option -debugify-func-limit the limit can be set to any positive integer.
Differential revision: https://reviews.llvm.org/D115714
Rather than creating a PHI node and then using the PHI threading
code, directly handle this case in
FoldCondBranchOnValueKnownInPredecessor().
This change is supposed to be NFC-ish, but may cause changes due
to different transform order.
This general threading transform can be performed whenever we know
a constant value for the condition in a predecessor, which would
currently just be the case of a phi node with constant arguments.
The legacy passes are deprecated now and will be removed in the near
future. This patch removes the legacy passes in coroutines.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D123918
This patch extends the scope of VPlan to also include the exit (aka
middle) block.
For now, the exit block remains empty, but handling of exit values will
subsequently be moved to VPlan, by adding recipes to model exit values
in the exit block.
As a first step, this will allow fixing #51366.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123457
We're making a recursive call here and everything in the function
assumes we're looking at scalars. This would be violated if we
looked through a bitcast from vectors.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D124015
This is not expected to have a functional difference as discussed in the
post-commit comments for 8a9c70fc01. All of the motivating tests for
the older fold still optimize as expected because other code can infer
the 'nuw'.
BlockIsSimpleEnoughToThreadThrough() already checks that the phi
(and all other instructions) are not used outside the block, so
this one-use check is not necessary for legality. I also don't
see any reason why it would be necessary for profitability (in
fact, those extra uses will be replaced with constants, which
should be generally profitable).
test/Transforms/InstCombine/pr39177.ll failed in a -DLLVM_USE_SANITIZER=Undefined build.
```
lib/Transforms/Utils/BuildLibCalls.cpp:1217:17: runtime error: reference binding to null pointer of type 'llvm::Function'
```
`Function &F = *M->getFunction(Name);`
This reverts commit 0f8c626723.
The patch adds SPIRV-specific MC layer implementation, SPIRV object
file support and SPIRVInstPrinter.
Differential Revision: https://reviews.llvm.org/D116462
Authors: Aleksandr Bezzubikov, Lewis Crawford, Ilia Diachkov,
Michal Paszkowski, Andrey Tretyakov, Konrad Trifunovic
Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Ilia Diachkov <iliya.diyachkov@intel.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
A new set of overloaded functions named getOrInsertLibFunc() is now supposed
to be used instead of getOrInsertFunction() when building a libcall from
within an LLVM optimizer. The idea is that this new function also makes
sure that any mandatory argument attributes are added to the function
prototype (after calling getOrInsertFunction()).
inferLibFuncAttributes() is renamed to inferNonMandatoryLibFuncAttrs() as it
only adds attributes that are not necessary for correctness but merely
help with later optimizations.
Generally, the front end is responsible for building a correct function
prototype with the needed argument attributes. If the middle end however is
the one creating the call, e.g. when replacing one libcall with another, it
then must take this responsibility.
This continues the work of properly handling argument extension if required
by the target ABI when building a lib call. getOrInsertLibFunc() now does
this for all libcalls currently built by any LLVM optimizer. It is expected
that when in the future a new optimization builds a new libcall with an
integer argument it is to be added to getOrInsertLibFunc() with the proper
handling. Note that not all targets have it in their ABI to sign/zero extend
integer arguments to the full register width, but this will be done
selectively as determined by getExtAttrForI32Param().
Review: Eli Friedman, Nikita Popov, Dávid Bolvanský
Differential Revision: https://reviews.llvm.org/D123198
With 'nuw' we can convert the increment of the shift amount
into a pre-shift (constant fold) of the shifted constant:
https://alive2.llvm.org/ce/z/FkTyR2
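An illustrative C-level sketch (hypothetical values; the IR-level fold needs 'nuw' on the shl, and the shift amounts are assumed to stay in range here):
```c
// Shifting the constant by (x + 1) is the same as pre-shifting the
// constant by 1 and then shifting by x: 0x0F000000 << 1 == 0x1E000000.
unsigned before(unsigned x) { return 0x0F000000u << (x + 1); }
unsigned after(unsigned x)  { return 0x1E000000u << x; }
```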
Fixes issue #41976
This patch moves SCEV expansion of steps used by
VPWidenIntOrFpInductionRecipes to the pre-header using
VPExpandSCEVRecipe. This ensures that those steps are expanded while the
CFG is in a valid state. Previously, SCEV expansion may happen during
vector body code-generation, during which the CFG may be invalid,
causing issues with SCEV expansion.
Depends on D122095.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D122096
This change could reduce the number of times we call `declaresCoroEarlyIntrinsics`,
and it is helpful for future changes.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D123925
This reverts commit af0285122f.
The test "libomp::loop_dispatch.c" on builder
openmp-gcc-x86_64-linux-debian fails from time to time.
See #54969. This patch is unrelated.
The description was ambiguous about the behavior
when both select arms are constant or both arms
are not constant. I don't think there's any
evidence to support either way, but this matches
the code with a more specified description.
We can extend this to deal with vector constants
with undef/poison elements. Currently, those don't
get folded anywhere.
The OMPScheduleType enum stores the constants from libomp's internal sched_type in kmp.h and is used by several kmp API functions. The enum values have an internal structure, namely each scheduling algorithm exists in four variants: unordered, ordered, nomerge unordered, and nomerge ordered.
This patch (basically a followup to D114940) splits the "ordered" and "nomerge" bits into separate flags, as was already done for the "monotonic" and "nonmonotonic" bits, so we can apply bit-flag operations on them. It also now contains all possible combinations according to kmp's sched_type. Deriving the OMPScheduleType enum from clause parameters has been moved from MLIR's OpenMPToLLVMIRTranslation.cpp to OpenMPIRBuilder to make it available for clang as well. Since the primary purpose of the flag is the binary interface to libomp, it has been made more private to LLVMFrontend. The primary interface for generating a worksharing-loop using OpenMPIRBuilder becomes `applyWorkshareLoop`, which derives the OMPScheduleType automatically and calls the appropriate emitter function.
While this is mostly an NFC refactor, it still applies the following functional changes:
* The logic from OpenMPToLLVMIRTranslation to derive the OMPScheduleType also applies to clang. Most notably, it now applies the nonmonotonic flag for non-static schedules by default.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was previously not applied if the simd modifier was used. I assume this was a bug, since the effect was due to `loop.schedule_modifier()` returning `mlir::omp::ScheduleModifier::none` instead of `llvm::Optional::None`.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was set even if ordered was specified, in breach of what the comment before it, citing the OpenMP specification, says. I assume this was an oversight.
The ordered flag with parameter was not considered in this patch. Changes will need to be made (e.g. adding/modifying function parameters) when support for it is added. The lengthy names of the enum values can be discussed, for the moment this is avoiding reusing previously existing enum value names such as `StaticChunked` to avoid confusion.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D123403
Until now we would only accept a broadcast load pattern if it is only used
by a single vector of instructions.
This patch relaxes this, and allows for the broadcast to have more than one
user vector, as long as all of its uses are internal to the SLP graph and
vectorized.
Differential Revision: https://reviews.llvm.org/D121940
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove the (Thin)LTO pipelines.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D123882
This patch renames the mergefunc-sanity to mergefunc-verify and renames the related functions to use more
inclusive language
Reviewed By: cebowleratibm
Differential Revision: https://reviews.llvm.org/D114374
When we run the CGSCC pass we should only invest time on the SCC. We can
initialize AAs with information from the module slice but we should not
update those AAs. We make an exception for the call sites of the SCC as
they are helpful in providing information for the SCC.
Minor modifications to pointer privatization allow us to perform it even
in the CGSCC pass, similar to ArgumentPromotion.
Issue: https://github.com/llvm/llvm-project/issues/54430
For incoming values of phi nodes added to an outlined function to accommodate different exit paths in the function, when a value is a constant that is passed into the outlined function as an argument, we find the corresponding value in the first extracted function used to fill the overall outlined function. When this value is an argument, the corresponding value used will be the old value, prior to outlining. This patch maintains a mapping from these values to arguments, and uses this mapping to update the added phi node accordingly.
Reviewers: paquette
Recommit of d6eb480afb
Differential Revision: https://reviews.llvm.org/D122206
The previous patch introduced the offloading binary format so we can
store some metadata along with the binary image. This patch introduces
using this inside the linker wrapper and Clang instead of the previous
method that embedded the metadata in the section name.
Differential Revision: https://reviews.llvm.org/D122683
Instead of lengthy constructors we can now set the members of a
read-only struct before the Attributor is created. Should make it
clearer what is configurable and also help introducing new options in
the future. This actually added IsModulePass and avoids deduction
through the Function set size. No functional change was intended.
With opaque pointers, the stored value and address can be the same.
Previously the code in VPWidenMemoryInstructionRecipe::onlyFirstLaneDemanded
incorrectly considered stores with matching stored-value and pointer operands as
only demanding the first lane, causing a crash.
Legacy PM for optimization pipeline was deprecated in 13.0.0 and Clang dropped
legacy PM support in D123609. This change removes legacy PM passes for PGO so
that downstream projects won't be able to use it. It seems appropriate to start
removing such "add-on" features like instrumentations, before we remove more
stuff after 15.x is branched.
I have checked many LLVM users and only ldc[1] uses the legacy PGO pass.
[1]: https://github.com/ldc-developers/ldc/issues/3961
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D123834
This can cause crashes by accidentally optimizing out checks for
extern_weak_func != nullptr, when replaced with a known-not-null wrapper.
This solution isn't perfect (only avoids replacement on specific patterns)
but should address common cases.
Internal reference: b/185245029
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D123701
Issue: https://github.com/llvm/llvm-project/issues/54430
For incoming values of phi nodes added to an outlined function to accommodate different exit paths in the function, when a value is a constant that is passed into the outlined function as an argument, we find the corresponding value in the first extracted function used to fill the overall outlined function. When this value is an argument, the corresponding value used will be the old value, prior to outlining. This patch maintains a mapping from these values to arguments, and uses this mapping to update the added phi node accordingly.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D122206
Issue: https://github.com/llvm/llvm-project/issues/54431
PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D122207
This addresses an existing TODO by keeping a mapping of external IR
Value * definitions wrapped in VPValues for use in a VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123700
The NewDefault was used to simplify the updating of PHI nodes, but it
causes some inefficiency for targets that will run the structurizer later. For
example, for a simple two-case switch, the extra NewDefault is causing
unstructured CFG like:
      O
     / \
    O   O
   / \ / \
  C1  ND  C2
   \  |  /
    \ | /
      D
The change is to avoid the ND (NewDefault) block; that is, we will get a
structured CFG for the above example like:
      O
     / \
    /   \
   O     O
  / \   / \
 C1  \ /  C2
  \-> D <-/
The IR change introduced by this patch should be trivial for other targets,
so I am doing this unconditionally.
Fall-through among the cases will also cause an unstructured CFG, but it needs
more work and will be addressed in a separate change.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D123607
This reverts commit e810d55809.
The commit did not take into account the fact that the strdup'ed string could be
modified. Checking whether such a modification happens would make the function very
costly, and without a test case in mind it's not worth the effort.
Updated LowerGuardIntrinsic and LowerWidenableCondition to check for
users of the respective intrinsic, instead of checking for guards and
widenable conditions by traversing the entire function.
This is NFC and should save some compile time.
This reverts the revert commit 1ddc719680.
This version of the patch sets the initial available value to poison,
which resolves an issue with the SSAUpdater breaking LCSSA form.
C11 specifies memchr() as follows:
> The memchr function locates the first occurrence of c (converted
> to an unsigned char) in the initial n characters (each interpreted
> as unsigned char) of the object pointed to by s. The implementation
> shall behave as if it reads the characters sequentially and stops
> as soon as a matching character is found.
In particular, it is well-defined to specify a memchr size larger
than the underlying object, as long as the character is found before
the end of the object.
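A small C example of the guarantee this relies on (illustrative only):
```c
#include <assert.h>
#include <string.h>

int main(void) {
  char s[3] = "ax";
  /* The size (100) exceeds the 3-byte object, but this is well-defined:
     'x' is found at s[1], so memchr never reads past it. */
  void *p = memchr(s, 'x', 100);
  assert(p == &s[1]);
  return 0;
}
```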
Differential Revision: https://reviews.llvm.org/D123665
This should be "NFC" as written, but it will make D122485 smaller
and give us more flexibility to experiment with optimization level
vs. compile-time.
Differential Revision: https://reviews.llvm.org/D123625
Currently the SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with the same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from a different basic block than the root node,
instructions with too many/too few uses.
This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, profitable
gathers), and too many possibly reduced values are marked as extra arguments.
The patch improves this process by introducing a slightly extended analysis
stage. During this stage, we still try to select 3 classes of
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will be vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.
The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.
Differential Revision: https://reviews.llvm.org/D114171
We need to explicitly query the shadow here, because it is lazily
initialized for byval arguments. Without opaque pointers this used to
mostly work out, because there would be a bitcast to `i8*` present, and
that would query, and copy in case of byval, the argument shadow.
Reviewed By: vitalybuka, eugenis
Differential Revision: https://reviews.llvm.org/D123602
This renames functions for more general usage (and current capitalization style)
before a proposed logic change in D122485.
Differential Revision: https://reviews.llvm.org/D123614
This diff extends foldSelectInstWithICmp to handle the case icmp(X) ? f(X) : C
when f(X) is guaranteed to be equal to C for all X in the exact range of the inverse predicate.
This addresses the issue https://github.com/llvm/llvm-project/issues/54089.
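A hedged C sketch of such a case (hypothetical functions, not taken from the patch): when the condition is false, x lies in [0, 3], where x & ~3u already equals the constant 0, so the select collapses to the true arm.
```c
unsigned before(unsigned x) { return (x > 3u) ? (x & ~3u) : 0u; }
unsigned after(unsigned x)  { return x & ~3u; } /* same result for all x */
```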
Differential revision: https://reviews.llvm.org/D123159
Test plan: make check-all
The test is already simplified, and I'm not sure how
to write a test to exercise the new clause. But it
protects the 2-bit pattern from miscompiling as noted
in D123453.
https://alive2.llvm.org/ce/z/QPyVfv
(If we managed to fall into the mul transform, it
would wrongly create a zero on this pattern.)
IMO when the user provides an unroll pragma, the compiler should always respect it.
It is not clear to me why the loop unroll pass currently ensures that the
unrolled loop size is limited by PragmaUnrollThreshold.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D119148
When only a store is sunk, there is no need to create a load in the
pre-header, as the result of the load will never get used.
The dead load can introduce UB if the function is marked as
writeonly.
Fixes #51248.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123473
After D121624 models the pre-header in VPlan, VPExpandSCEVRecipes can be
placed there. This ensures SCEV expansion happens before modifying the
CFG during VPlan execution, when CFG is incomplete.
Depends on D121624.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D122095
This patch extends the scope of VPlan to also model the pre-header.
The pre-header can be used to place recipes that should be code-gen'd
outside the loop, like SCEV expansion.
Depends on D121623.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121624
This fixes the code to actually use the location of the instruction, if
available. Previously, SetInsertPoint would overwrite the insert point
set from the instruction.
And thread DSE's ephemeral values to EarliestEscapeInfo.
This allows more precise analysis in DSEState::isReadClobber() via BatchAA.
Followup to D123162.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123342
Loop Strength Reduce sometimes optimizes away all uses of an induction variable
from a loop but leaves the IV increments. When the only remaining use of the IV
is the PHI in the exit block, this patch will call rewriteLoopExitValues to
replace the exit block PHI with the final value of the IV to skip the updates
in each loop iteration.
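A minimal C sketch of the situation (hypothetical code): after LSR removes all in-loop uses of the counter, only the exit value remains and can be rewritten as the closed-form final IV value.
```c
int count_up_to(int n) {
  int i = 0;
  for (int j = 0; j < n; ++j)
    ++i;              /* IV increment with no other in-loop use of i */
  return i;           /* exit value; can become (n > 0) ? n : 0 */
}
```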
Differential Revision: https://reviews.llvm.org/D118808
This makes MemorySSA in LoopSink required, and removes the AST-based
implementation, as well as the related support code in LICM.
Differential Revision: https://reviews.llvm.org/D123288
It actually implements support for seeing through loads, using alias analysis to
refine the result.
This is rather limited, but I didn't want to rely on more than the available
analysis at that point (to be gentle with compilation time), and it does seem to
catch common scenarios, as showcased by the included tests.
Differential Revision: https://reviews.llvm.org/D122431
Currently, the utility supports lowering of non-atomic memory transfer routines only. This patch adds support for the atomic version of memcpy. This may be useful for targets not supporting atomic memcpy.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D118443
Similar to the problem in 0bb25b4603, bitcasts that are inserted must
dominate all uses. When rewriting "values" with "new values" that have
the updated address space, we may replace the "new value" with a bitcast
if one of the original users is an addresspace cast. This bitcast must
be inserted before ALL users, not only before the addresspace cast.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122964
By adding a parameter to the function FoldOpIntoSelect, we can fold more Ops into Select.
For this example, we tend to fold the division instruction,
so we no longer care whether the SelectInst has one use.
This patch solves the TODO left in InstCombine/div.ll.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122967
This is part of being able to get rid of two more columns in
MemoryBuiltins.cpp's large table. We'll have two more changes before
we can finish the job.
Differential Revision: https://reviews.llvm.org/D119582
Sometimes we can infer an align from an allocalign but the function
already promised it'd be more-aligned than the allocalign and there's an
existing align that we shouldn't reduce. Make sure we handle that
correctly.
Differential Revision: https://reviews.llvm.org/D121642
This is an lshr equivalent of D122340 - if we don't demand any of the additional sign bits introduced by the ashr, the lshr can be treated as an ashr and we can remove the shift entirely if we only demand already known sign bits.
Another step towards PR21929
https://alive2.llvm.org/ce/z/6f3kjq
Differential Revision: https://reviews.llvm.org/D123118
LoopSink with the legacy pass manager still uses AST, because we
can't compute MemorySSA conditionally. I think now that the legacy
pass manager will be removed soon(TM) we don't need to care about
compile-time impact here anymore. Additionally, since MemorySSA is
no longer eagerly optimized, the impact is actually not that high
anymore (~0.2% geomean regression on CTMark).
This just makes legacy PM and new PM behavior line up -- as a
followup I'll drop these options entirely and make MemorySSA use
mandatory.
Differential Revision: https://reviews.llvm.org/D123216
Motivated by pr43326 (https://bugs.llvm.org/show_bug.cgi?id=43326), where a slightly
modified case is as follows.
void f(int e[10][10][10], int f[10][10][10]) {
  for (int a = 0; a < 10; a++)
    for (int b = 0; b < 10; b++)
      for (int c = 0; c < 10; c++)
        f[c][b][a] = e[c][b][a];
}
The ideal optimal access pattern after running interchange is supposed to be the following
void f(int e[10][10][10], int f[10][10][10]) {
  for (int c = 0; c < 10; c++)
    for (int b = 0; b < 10; b++)
      for (int a = 0; a < 10; a++)
        f[c][b][a] = e[c][b][a];
}
Currently loop interchange is limited to picking up the innermost loop and finding an order
that is locally optimal for it. However, the pass failed to produce the globally optimal
loop access order. For more complex examples what we get could be quite far from the
globally optimal ordering.
What is proposed in this patch is to do a "bubble-sort" fashion when doing interchange.
By comparing neighbors in `LoopList` in each iteration, we would be able to move each loop
onto a most appropriate place, hence this is an approach that tries to achieve the
globally optimal ordering.
The motivating example above is added as a test case.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D120386
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.
Differential Revision: https://reviews.llvm.org/D123117
By specification, the source and destination of llvm.memcpy.* must either be equal or non-overlapping. This property is hard or impossible to figure out once lowered. This patch explicitly marks loads from the source and stores to the destination as not aliasing if the source and destination are known to be not equal.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D118441
The Attributor, as many other parts in LLVM, uses pointer equivalence
for `llvm::Value`s. This only works as long as `llvm::Value`s are
dynamically unique, or, to be exact, we will never end up with the same
`llvm::Value` representing two dynamic instances. We already provided a
helper to check the former, namely `AA::isDynamicallyUnique`, however we
could not check the latter. In this patch we move the logic into a
separate AA which helps with the growing complexity and use cases. We
also extend the interface to answer the second question rather than the
first. So we do not determine dynamically uniqueness but if we might end
up with the `llvm::Value` describing a different dynamic instance. Note
that the latter is very much tied to the Attributor capabilities to look
through memory, recursion, etc. so we need to update the logic as we go.
We look through loads in the "generic value traversal" and we
consequently don't need to look through them again in AAValueSimplify*.
The test changes stem from the fact that we allowed any simplified
value, incl. non-dynamically unique ones, as long as the underlying
memory was an alloca. This doesn't seem to make sense as allocas do not
protect against dynamically non-unique values. We need to make the
unique check better rather than excluding allocas. That in mind, we can
remove a lot of code by simply relying on the generic value traversal
load look through.
To soften the blow some minor adjustments have been made that allow more
simplification through the now used scheme and some tests have been
given a `norecurse` for now.
With D106397 we ensured that `AAReachability` will not answer queries for
potentially recursive functions. This was necessary as we did not treat
recursion explicitly otherwise. Now that we have
`AA::isPotentiallyReachable` we can make `AAReachability` a purely
intra-procedural AA which does not care about recursion.
`AA::isPotentiallyReachable`, however, does already deal with "going
back" the call graph and can now do so for potentially recursive
functions.
Add statistics to count overall devirtualized targets as well as the
various types of devirtualizations applied at callsites.
Differential Revision: https://reviews.llvm.org/D123152
If we ignore droppable users everything only used in llvm.assume (among
other things) is going to be deleted as dead. This is not helpful.
Instead we want to only delete things we actually don't need anymore. A
follow up will deal with loads in a smarter way.
Partially inlining a libcall that has the musttail attribute
leads to broken LLVM IR, triggering an assertion in the IR verifier.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D123116
When creating induction resume values, SCEV queries may rely on
LoopInfo. Make sure vector.body gets added to the loop of the pre-header
during skeleton construction.
%vector.body will be moved to the vector preheader during VPlan
execution.
Fixes #54745.
getExtAttrForI32Param() is the method to be used for determining the type of
extension attribute (if any) that is to be added for a signed/unsigned
argument.
Previously, the SExt attribute was always added to the i32 ldexp* argument as
it was expected to be ignored by targets not needing it. This patch now
changes this so that it is only added for the targets that need it in the
first place.
The putchar() argument is now also extended as required by the target (SystemZ in
the test), to fix the issue below. Many more libcalls will be handled
similarly in a following patch.
Fixes https://github.com/llvm/llvm-project/issues/54532.
Differential Revision: https://reviews.llvm.org/D123030
Review: Eli Friedman
When simplifying values we might end up with an instruction from a
different scope or just one that does not dominate the use. If the
instruction can be reproduced without side-effect (incl. UB) we can
now do that. For now this is mostly used for speculatable (intrinsic)
calls but as we learn to make things like arguments or loads available
this will become more powerful.
This will also allow us to remove dead stores more easily in a follow
up.
Update VPInterleavedAccessInfo to use the generic getVectorLoopRegion
helper instead of relying on the entry block being the top-most vector
loop region.
If both the character and string are known, but the length
potentially isn't, we can optimize the memchr() call to a select
of either the known position of the character or null.
Split off from https://reviews.llvm.org/D122836.
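A hedged C illustration (hypothetical functions): the string and character are known, only the length n is not, so the call becomes a select on n.
```c
#include <stddef.h>
#include <string.h>

static const char s[] = "hello"; /* first 'l' is at index 2 */

const char *find_before(size_t n) { return memchr(s, 'l', n); }
const char *find_after(size_t n)  { return (n > 2) ? s + 2 : NULL; }
```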
Handle the simple constant char case before the bitmask optimization.
This will allow extending the code to handle a non-constant size
argument in a followup change.
Split out from https://reviews.llvm.org/D122836.
If the memchr() size is 1, then we can convert the call into a
single-byte comparison. This works even if both the string and the
character are unknown.
Split off from https://reviews.llvm.org/D122836.
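A hedged C illustration of the size-1 case (hypothetical functions):
```c
#include <stddef.h>
#include <string.h>

void *find_before(const void *s, int c) { return memchr(s, c, 1); }
void *find_after(const void *s, int c) {
  /* memchr compares the first byte against (unsigned char)c. */
  return (*(const unsigned char *)s == (unsigned char)c) ? (void *)s : NULL;
}
```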
As discussed on https://github.com/llvm/llvm-project/issues/54682,
MemorySSA currently has a bug when computing the clobber of calls
that access loop-varying locations. I think a "proper" fix for this
on the MemorySSA side might be non-trivial, but we can easily work
around this in MemCpyOpt:
Currently, MemCpyOpt uses a location-less getClobberingMemoryAccess()
call to find a clobber on either the src or dest location, and then
refines it for the src and dest clobber. This was intended as an
optimization, as the location-less API is cached, while the
location-affected APIs are not.
However, I don't think this really makes a difference in practice,
because I don't think anything will use the cached clobbers on
those calls later anyway. On CTMark, this patch seems to be very
mildly positive actually.
So I think this is a reasonable way to avoid the problem for now,
though MemorySSA should also get a fix.
Differential Revision: https://reviews.llvm.org/D122911
The range calculation in walkForwards() assumes that the ranges of
the operands have already been calculated. With the used visit
order, this is not necessarily the case when there are multiple
roots. (There is nothing guaranteeing that instructions are visited
in topological order.)
Fix this by queuing instructions for reprocessing if the operand
ranges haven't been calculated yet.
Fixes https://github.com/llvm/llvm-project/issues/54669.
Differential Revision: https://reviews.llvm.org/D122817
I didn't dig into this very much because it appears to be totally valid
(especially once these properties can come from attributes instead
of only from hard-coded library functions) for TLI to not be defined,
and nothing broke when I added this check, including with all my other
patches applied.
Differential Revision: https://reviews.llvm.org/D122917
The search for the clobbering call is fairly expensive if uses are not optimized at construction. Defer the clobber walk to the point in the implementation we need it; there are a bunch of bailouts before that point. (e.g. If the source pointer is not an alloca, we can't do callslotopt.)
On a test case which involves a bunch of copies from argument pointers, this switches memcpyopt from > 1/2 second to < 10ms.
This is a hacky fix for:
https://github.com/llvm/llvm-project/issues/54558
As discussed there, codegen regressed when we opened up this transform
to allow extra uses ( 61580d0949 ), and it's not clear how to
undo the transforms at the later stage of compilation.
As noted in the code comments, there's a set of remaining folds that
are still limited to one-use, so we can try harder to refine and
expand the limitations on these folds, but it's likely to be an
up-and-down battle as we find and overcome similar regressions.
Differential Revision: https://reviews.llvm.org/D122909
This is a retry of 9397bdc67e - that was reverted until
we had a clang warning in place to alert users about a
possible mistake in source. The warning was added with
ab982eace6.
This is noted as a missing clang warning in #54222,
but it is also a missing optimization opportunity.
Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDq
https://alive2.llvm.org/ce/z/pE6LRt
I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
These two are equivalent,
and I *think* the `and` form is more-ish canonical.
General proof: https://alive2.llvm.org/ce/z/RrF5s6
If constant on the (outer) `xor` is an `undef`,
the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2
However, if the constant on the (inner) `or` is an `undef`,
we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7
I guess, producing a zero `and`-mask is optimal in that case.
alive-tv is happy about the entirety of `xor-of-or.ll`.
This refactor makes it easier to extend the logic to collect information
from blocks in the future, without even further increasing the size of
eliminateConstraints.
This was exposed by 14e3650f. The recommit of 14e3650f will hit the
problematic code path requiring the workaround.
Also includes a test case that crashes without the workaround.
During skeleton construction for the epilogue vector loop, generic
helpers use getOrCreateTripCount, which will re-expand the trip count
computation. Instead, re-use the TripCount created during main loop
vectorization.
If the frame pointer is an argument of the original function (which
happens with opaque pointers), then we currently first replace the
argument with undef, which will prevent later replacement of the
old frame pointer with the new one.
Fix this by replacing arguments with some dummy instructions first,
and then replacing those with undef later. This gives us a chance
to replace the frame pointer before it becomes undef.
Fixes https://github.com/llvm/llvm-project/issues/54523.
Differential Revision: https://reviews.llvm.org/D122375
This isn't expected to reduce compilation times as 'max-iters' is set to
one by default, but it helps with recursive functions that require higher
iteration counts.
Differential Revision: https://reviews.llvm.org/D122819
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
Instead of first creating a lambda for calculating the range,
then collecting the ranges for the operands, and then calling the
lambda on those ranges, we can first calculate the operand ranges
and then calculate the result directly in the switch.
This reverts the revert commit 2760cdc9c6.
This version pulls in the code to create the vector loop object in VPlan
from D121624.
This is needed because otherwise existing LoopInfo verification will
fail, as a loop block doesn't have in-loop successors now that we
do not replace the branch.
Now that we do not add new loops during skeleton construction, there's
also no need to verify LI there.
According to the current design, if a floating point operation is
represented by a constrained intrinsic somewhere in a function, all
floating point operations in the function must be represented by
constrained intrinsics. This imposes additional requirements on the
inlining mechanism. If a non-strictfp function is inlined into a strictfp function,
all ordinary FP operations must be replaced with their constrained
counterparts.
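For a rough idea of what that replacement looks like when emitting IR, a hedged sketch using IRBuilder's constrained-FP mode (helper name is hypothetical):
```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Recreate an ordinary fadd as its constrained counterpart by putting the
// builder into FP-constrained mode, roughly what inlining a non-strictfp
// callee into a strictfp caller requires for every FP operation.
static Value *emitConstrainedFAdd(Instruction *InsertPt, Value *LHS, Value *RHS) {
  IRBuilder<> B(InsertPt);
  B.setIsFPConstrained(true);
  B.setDefaultConstrainedExcept(fp::ebStrict);
  B.setDefaultConstrainedRounding(RoundingMode::Dynamic);
  // In constrained mode, CreateFAdd emits llvm.experimental.constrained.fadd.
  return B.CreateFAdd(LHS, RHS);
}
```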
Inlining a strictfp function into a non-strictfp one is not implemented, as it
would require replacement of all FP operations in the host function,
which now is undesirable due to expected performance loss.
Differential Revision: https://reviews.llvm.org/D69798
This fixes a TODO in constantArgPropagation() to make it feature complete.
However, I do find myself in agreement with the review comments in
https://reviews.llvm.org/D106426. I don't think we should pursue
specializing such recursive functions as the code size increase becomes
linear to 'max-iters'. Compiling the modified test just with -O3 (no
function specialization) generates the same code.
Differential Revision: https://reviews.llvm.org/D122755
The only remaining use was to get the exit block of the loop. Instead of
relying on the loop, use the successor of VectorHeaderBB
(LoopMiddleBlock) directly to set VPTransformState::CFG::ExitB
Depends on D121621.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121623
Allow receiving memcpy/memset/memmove instrumentation by using __asan or
__hwasan prefixed versions for AddressSanitizer and HWAddressSanitizer
respectively when compiling in kernel mode, by passing params
-asan-kernel-mem-intrinsic-prefix or -hwasan-kernel-mem-intrinsic-prefix.
By default the kernel-specialized versions of both passes drop the
prefixes for calls generated by memintrinsics. This assumes that all
locations that can lower the intrinsics to libcalls can safely be
instrumented. This unfortunately is not the case when implicit calls to
memintrinsics are inserted by the compiler in no_sanitize functions [1].
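A hedged C++ illustration of how such an implicit call can arise (example code, not taken from the kernel report):
```
// Even inside a no_sanitize function the compiler may lower an aggregate
// copy to an implicit call to memcpy; if memcpy itself is instrumented,
// that call is still checked despite the no_sanitize attribute.
struct Big { char bytes[256]; };

__attribute__((no_sanitize("address")))
void copy_big(Big *dst, const Big *src) {
  *dst = *src; // may be lowered by the compiler to a call to memcpy
}
```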
To solve the issue, normal memcpy/memset/memmove need to be
uninstrumented, and instrumented code should instead use the prefixed
versions. This also aligns with ASan behaviour in user space.
[1] https://lore.kernel.org/lkml/Yj2yYFloadFobRPx@lakrids/
Reviewed By: glider
Differential Revision: https://reviews.llvm.org/D122724
When MaximizeVectorBandwidth is enabled, we can end up (via calls to
collectUniformsAndScalars/setCostBasedWideningDecision through
calculateRegisterUsage) making widening decisions before we have decided
whether to fold the tail by masking. These decisions will be wrong if we
later decided to fold the tail, for example when the trip count is very
low. It will use incorrect costs for loads that should get masked, using
standard memory operation costs instead.
This still at the moment uses the EmulatedMaskMemRefHack costs (a bit
unfortunately), but the old costs without this change were 1, leading to
too optimistic vectorization.
This slightly changes the way that the MaximizeVectorBandwidth option
works to make it easier to test, always honouring the option if it is
set.
Differential Revision: https://reviews.llvm.org/D120215
According to the LLVM debug info update guide: https://llvm.org/docs/HowToUpdateDebugInfo.html,
"Hoisting identical instructions which appear in several successor
blocks into a predecessor block. In this case there is no single
merged instruction. The rule for dropping locations applies".
Thanks to Yuanbo Li for reporting this.
Reviewed By: dblaikie
Reviewers: sebpop, tejohnson, dblaikie
Differential Revision: https://reviews.llvm.org/D122730
Factor in the TBAA of adjacent stores instead of just the head store
when merging stores into a memset. We were seeing GVN remove a load that
had a TBAA that matched the 2nd store because GVN determined it didn't
match the TBAA of the memset. The memset had the TBAA of only the first
store.
i.e. Loading the field pi_ of shared_count after memset to create an
array of shared_ptr
```
template<class T>
class shared_ptr {
  T *p;
  shared_count refcount;
};
class shared_count {
  sp_counted_base *pi_;
};
```
Differential Revision: https://reviews.llvm.org/D122205
Instead of looking up the vector loop using the header, keep track of
the current vector loop in VPTransformState. This removes the
requirement for the vector header block being part of the loop up front.
A follow-up patch will move the code to generate the Loop object for the
vector loop to VPRegionBlock.
Depends on D121619.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121621
Avoids merge errors when opaque pointers are loaded into different types.
Reviewed by: jcranmer-intel, hiraditya
Differential Revision: https://reviews.llvm.org/D122521
Now that all dependencies on creating the latch block up-front have been
removed, there is no need to create it early.
Depends on D121618.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121619
In some cases, like in the added test case, we can reach
selectInterleaveCount with loops that actually have a cost of 0.
Unfortunately a loop cost of 0 is also used to communicate that the cost
has not been computed yet. To resolve the crash, bail out if the cost
remains zero after computing it.
This seems like the best option, as there are multiple code paths that
return a cost of 0 to force a computation in selectInterleaveCount.
Computing the cost at multiple places up front there would unnecessarily
complicate the logic.
Fixes #54413.
DXIL is wrapped in a container format defined by the DirectX 11
specification. Codebases differ in calling this format either DXBC or
DXILContainer.
Since eventually we want to add support for DXBC as a target
architecture and the format is used by DXBC and DXIL, I've termed it
DXContainer here.
Most of the changes in this patch are just adding cases to switch
statements to address warnings.
Reviewed By: pete
Differential Revision: https://reviews.llvm.org/D122062
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.
When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.
This change removes the requirement to create the latch block up-front
for VPWidenInductionPHIRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.
Depends on D121617.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121618
According to the definition of canonical form, a formula is canonical
if, when the scale register does not contain an addrec for loop L, none
of the base registers contain an addrec for this loop either.
The critical word here is "contains".
The current checker of canonical form does not check the "containing"
property but the "is" property: it checks whether a register is an
addrec rather than whether it contains one.
Fix the checker and the canonicalizing utility to follow the definition.
Without this fix, in the attached test the base formula
reg((-1 * {0,+,8}<nuw><nsw><%bb2>)<nsw>) + 1*reg((8 * (%arg /u 8))<nuw>)
is considered canonical even though a base contains an addrec,
while the modified formula we want to insert,
reg({0,+,8}<nuw><nsw><%bb2>) + 1*reg((-8 * (%arg /u 8))),
is considered not canonical.
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D122457
We have the same code repeated in both callers; sink it into the callee.
The motivation here isn't just code style; we can also defer the relatively expensive aliasing checks until the cheap structural preconditions have been validated. (e.g. don't bother with aliasing checks if src is not an alloca.) This helps compile time significantly.
If we vectorize e.g. a store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well.
This is purely for test output quality and readability. It should have no effect in any sane pipeline.
Differential Revision: https://reviews.llvm.org/D122493
Inline assembly is scary but we need to support it for the OpenMP GPU
device runtime. The new assumption expresses the fact that it may not
have call semantics, that is, it will not call another function but
simply perform an operation or side-effect. This is important for
reachability in the presence of inline assembly.
Differential Revision: https://reviews.llvm.org/D109986
Before we gave up if a call through bitcast had parameter attributes.
Interestingly, we allowed attributes for the return value already. We
now handle both the same way, namely, we drop the ones that are
incompatible with the new type and keep the rest. This cannot cause
"more UB" than initially present.
Differential Revision: https://reviews.llvm.org/D119967
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.
When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.
This change removes the requirement to create the latch block up-front
for VPWidenIntOrFpInductionRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.
As an alternative, the increment could be modeled as separate recipe,
but that would require more work and a bit of redundant code, as we need
to create the step-vector during VPWidenIntOrFpInductionRecipe::execute
anyways, to create the values for different parts.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121617
This was reusing a cast to GlobalVariable to check for an
Instruction, which means we'll try to dereference a null pointer
if it's not actually a GlobalVariable. We should be casting
MTI->getSource() instead.
I don't think this problem is really specific to opaque pointers,
but it certainly makes it a lot easier to reproduce.
Fixes https://github.com/llvm/llvm-project/issues/54572.
The current implementation of Function Specialization does not allow
specializing more than one arguments per function call, which is a
limitation I am lifting with this patch.
My main challenge was to choose the most suitable ADT for storing the
specializations. We need an associative container for binding all the
actual arguments of a specialization to the function call. We also
need a consistent iteration order across executions. Lastly we want
to be able to sort the entries by Gain and reject the least profitable
ones.
MapVector fits the bill but not quite; erasing elements is expensive
and using stable_sort messes up the indices to the underlying vector.
I am therefore using the underlying vector directly after calculating
the Gain.
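A rough sketch of that approach with hypothetical names (SpecializationInfo, keepMostProfitable), sorting the underlying vector by Gain and dropping the least profitable tail:
```
#include "llvm/ADT/SmallVector.h"
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Keep candidate specializations in a plain vector; once each Gain is known,
// sort by Gain and trim the least profitable entries. Erasing from a
// MapVector or stable_sort'ing it would disturb the index-based mapping.
struct SpecializationInfo {
  llvm::SmallVector<unsigned, 4> ArgNos; // formal arguments being specialized
  int64_t Gain = 0;
};

static void keepMostProfitable(llvm::SmallVector<SpecializationInfo, 8> &Specs,
                               std::size_t MaxSpecs) {
  std::stable_sort(Specs.begin(), Specs.end(),
                   [](const SpecializationInfo &A, const SpecializationInfo &B) {
                     return A.Gain > B.Gain; // highest gain first
                   });
  if (Specs.size() > MaxSpecs)
    Specs.resize(MaxSpecs);
}
```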
Differential Revision: https://reviews.llvm.org/D119880
This simplifies the implementation of eraseInstruction by moving the odd-replace-users-with-undef handling back to the only caller which uses it. This handling was not obviously correct, so add the asserts which make it clear why this is safe to do at all. The result is simpler code and stronger assertions.
The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed over the weekend and have had several days to bake. The last was fixed this morning after being noticed in manual review of test changes yesterday. See the review thread for links to each change.
Original commit message follows:
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.
This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:
Before this patch:
704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated
After this patch:
102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated
I do want to highlight that there is a small difference in the number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore the difference.
The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.
For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.
Differential Revision: https://reviews.llvm.org/D118538
After writing the commit message for 4b1bace28, realized that the mentioned optimization was rather straight forward. We already have the code for scanning a block during region initialization, we can simply keep track if we've seen a stacksave or stackrestore. If we haven't, none of these dependencies are relevant and we can avoid the relatively expensive scans entirely.
This is an extension of commit b7806c to handle one last case noticed in test changes for D118538. Again, this is thought to be a latent bug in the existing code, though this time I have not managed to reduce tests for the original algorithm.
The prior attempt had failed to account for this case:
```
%a = alloca i8
stacksave
stackrestore
store i8 0, i8* %a
```
If we allow '%a' to reorder into the stacksave/restore region, then the alloca will be deallocated before the use. We will have taken a well defined program, and introduced a use-after-free bug.
There's also an inverse case where the alloca originally follows the stackrestore, and we need to prevent the reordering it above the restore.
Compile time wise, we potentially do an extra scan of the block for each alloca seen in a bundle. This is significantly more expensive than the stacksave rooted version and is why I'd tried to avoid this in the initial patch. There is room to optimize this (by essentially caching a "has stacksave" bit per block), but I'm leaving that to future work if it actually shows up in practice. Since allocas in bundles should be rare in practice, I suspect we can defer the complexity for a long while.
CoverageMappingModuleGen generates a coverage mapping record
even for unused functions with internal linkage, e.g.
static int foo() { return 100; }
Clang frontend eliminates such functions, but InstrProfiling pass
still pulls in profile runtime since there is a coverage record.
Fuchsia uses runtime counter relocation, and pulling in profile
runtime for unused functions causes a linker error:
undefined hidden symbol: __llvm_profile_counter_bias.
Since 389dc94d4b, we do not hook profile runtime for the binaries
that none of its translation units have been instrumented in Fuchsia.
This patch extends that for the instrumented binaries that
consist of only unused functions.
Differential Revision: https://reviews.llvm.org/D122336
Update all places that currently assume the entry block to the plan is
also the vector loop header to use getVectorLoopRegion instead.
getVectorLoopRegion will keep doing the right thing when the pre-header
is modeled explicitly (and becomes the new entry block in the plan).
Probe-based profiling leads to better performance when combined with profi and ext-tsp block layout. I'm turning them on by default.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D122442
We already do this for SelectionDAG, but we're missing it here.
Noticed while re-triaging PR21929
Differential Revision: https://reviews.llvm.org/D122340
We can not bitcast pointers across different address spaces. This was
previously fixed in D89577 but then in D93229 an enhancement was added
which peeks further through the pointer operand, opening up the
possibility that address-space violations could be introduced.
Instead of bailing as the previous fix did, simply insert an
addrspacecast cast instruction.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D121787
Most intrinsics, especially "default" ones, will not call back into the
IR module. `nocallback` encodes this nicely. As it was not used before,
this patch also makes use of `nocallback` in the Attributor which
results in many more `norecurse` deductions.
Tablegen part is mechanical, test updates by script.
Differential Revision: https://reviews.llvm.org/D118680
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.
Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121615
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.
We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.
Fixes #54364
Differential Revision: https://reviews.llvm.org/D122390
For MachO, lower `@llvm.global_dtors` into `@llvm.global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
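For reference, a hedged source-level sketch of what the lowering amounts to (names hypothetical):
```
// Instead of an entry in the deprecated __mod_term_func section, a
// constructor registers the destructor with __cxa_atexit so it runs at exit.
extern "C" int __cxa_atexit(void (*func)(void *), void *arg, void *dso_handle);
extern "C" void *__dso_handle;

static void my_global_dtor(void *) { /* body of the @llvm.global_dtors entry */ }

__attribute__((constructor)) static void register_my_global_dtor() {
  __cxa_atexit(my_global_dtor, nullptr, &__dso_handle);
}
```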
Differential Revision: https://reviews.llvm.org/D121736
There is potential for endless recursion if we try to determine the
underlying objects of a load, just to end up with the load as underlying
object. A proper solution will require us to pass a visited set around.
This will happen as we cleanup genericValueTraversal soon.
When --disable-sample-loader-inlining is true, skip the inline transformation, but merge profiles of inlined instances into their outlined versions.
Differential Revision: https://reviews.llvm.org/D121862
CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and
CGSCC passes don't run on unreachable functions. Normally GlobalDCE will
come along and delete unreachable functions, but we don't run GlobalDCE
under -O0, so an unreachable function with coroutine intrinsics may
never have CoroSplit run on it.
This patch adds GlobalDCE when coroutine intrinsics are present. It
also now runs all coroutine passes conditionally, only when coroutine intrinsics
are present. This should also solve the -O0 regression reported in
D105877 due to LazyCallGraph construction.
Fixes https://github.com/llvm/llvm-project/issues/54117
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D122275
EarlyCSE currently optimizes all MemoryUses upfront. However,
EarlyCSE only actually queries the clobbering memory access for
a subset of uses, namely those where a CSE candidate has already
been identified. Delaying use optimization to the clobber query
improves compile-time in practice.
This change is not NFC because EarlyCSE has a limit on the number
of clobber queries (EarlyCSEMssaOptCap), in which case it falls
back to the defining access. The defining access for uses will now
no longer coincide with the optimized access.
If there are performance regressions from this change, we should
be able to address them by raising this limit.
Differential Revision: https://reviews.llvm.org/D121987
The first attempt at this missed a validity check.
This version includes a test of the narrow source
type for modulo-16-bits.
Original commit message:
This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.
Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo
This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.
Differential Revision: https://reviews.llvm.org/D122166
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers. As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed. This is a bit undesirable for such an "obviously" useless bit of code.
As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write. Doing that would subsume this code in a more general way, but is also a more involved change. For the moment, I went with the easiest fix.
createInductionResumeValues uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.
Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRf
https://alive2.llvm.org/ce/z/ZnaMLf
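A quick way to sanity-check the identities at the source level for a byte-multiple shift (C == 8), using the __builtin_bswap32 builtin; illustrative only:
```
#include <cassert>
#include <cstdint>

void check(uint32_t x) {
  // bswap (shl X, 8) == lshr (bswap X), 8
  assert(__builtin_bswap32(x << 8) == (__builtin_bswap32(x) >> 8));
  // bswap (lshr X, 8) == shl (bswap X), 8
  assert(__builtin_bswap32(x >> 8) == (__builtin_bswap32(x) << 8));
}
```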
Differential Revision: https://reviews.llvm.org/D122010
Before this patch the DebugifyLevel option was used for
the synthetic mode, so after this, it will be used in
the original mode as well.
Differential Revision: https://reviews.llvm.org/D115623
Rather than iterating over users and comparing operands, iterate
over uses and check operand number. Otherwise, we'll end up
promoting a store twice if it has two equal operands.
This can only happen with opaque pointers, as otherwise both
operands differ by a level of indirection, so a bitcast would have
to be involved.
Fixes https://github.com/llvm/llvm-project/issues/54495.
This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.
Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo
This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.
Differential Revision: https://reviews.llvm.org/D122166
Before we start addressing the issue with having
a lot of false positives when using debugify in
the original mode, we have made a few patches that
should speed up the execution of the testing
utility Passes.
For example, when testing a large project
(let's say LLVM project itself), we can face
a lot of potential DI issues. Usually, we use
-verify-each-debuginfo-preserve (that is very
similar to -debugify-each) -- it collects
DI metadata before each Pass, and after the Pass
it checks if the Pass preserved the DI metadata.
However, we can speed up this process, since we
don't need to collect DI metadata before each
Pass -- we could use the DI metadata that are
collected after the previous Pass from
the pipeline as an input for the next Pass.
This patch speeds up the utility by ~2x.
Differential Revision: https://reviews.llvm.org/D115622
The CoroSplit pass would check the existence of coroutine intrinsic
before starting work. It is not necessary and wasteful since it would
iterate over the Module.
This patch also removes the constraint on the inline size of the
SmallVector for the possible coroutines in the module. The original
value was 4, which is a relatively low threshold given how coroutines
are actually used in practice.
When outlining a phi node, if the incoming branch is a block contained in the region and the branch from that block is not outlined, we create broken code. The fix is to recognize when that branch from the included incoming block is not contained, and ignore the region.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D121311
There are a bunch of code improvements in the patch: marking everything
that can be const as const and fixing some typos in comments.
The patch also removes the shadowing parameter TTI from the
rewriteWithNewAddressSpaces method; the TTI parameter is not required
because the class already has a field of the same name.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D121671
completeLoopSkeleton uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
Since the IROutliner is performing an optimization, it should not outline from functions explicitly marked with optnone. This adds an extra check and test to make sure this does not occur.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D121567
The semantics of an inalloca alloca instruction require that it not be reordered with a preceding stacksave intrinsic call. Unfortunately, there's no def/use edge or memory dependence edge. (The memory point is slightly subtle, but in general a new allocation can't alias with a call which executes strictly before it comes into existence.)
I'd tried to tackle this same case previously in 689babdf6, but the fix chosen there turned out to be incomplete. As such, this change contains a fully revert of the first fix attempt.
This was noticed when investigating problems which surfaced with D118538, but this is definitely an existing bug. This time around, I managed to reduce a couple of additional cases, including one which was being actively miscompiled even without the new scheduling change. (See test diffs)
Compile time wise, we only spend extra time when seeing a stacksave (rare), and even then we walk the block at most once per schedule window extension. Likely a non-issue.
This fixes an active miscompile visible in the test changes. The basic problem is that the scheduling dependency graph didn't have any edges for control dependence within a single basic block. The result is that we could (and in some rare cases *did*) perform reorderings within a block which could introduce new undefined behavior along paths which didn't previously contain any.
Impact wise, we have two major cases where control is not guaranteed to reach a later instruction in the block: may throw calls, and calls containing infinite loops.
* The former case was mostly covered by the memory dependencies, and to trigger require a function which can throw, but not write to memory. In theory, such a case is possible, but not likely in practice.
* The latter case is likely more of an issue in practice. After this code was first written, we changed the IR semantics to allow well defined infinite loops without satisfying mustprogress. Even for C/C++ - which do imply mustprogress - recent changes to how we treat atomics (e.g. an atomic read does not always imply a write) could expose this issue. I'm a bit shocked we don't seem to have a bug report which hit this in real code actually.
Compile time wise, this results in a single extra scan of the scheduling window in the common case. Since we stop scanning at the next instruction which isn't guaranteed to execute, no matter what order we traverse instructions in, we scan the block once. The exception to this is that when we extend the scheduling window downwards, we invalidate all dependencies, and thus rescan. So the potentially expensive case is when we have a call in a big schedule window which is frequently extended. We could optimize this case (by caching the last instruction not guaranteed to transfer execution, scanning only the extended window, and starting there), but I decided to leave the complexity until it mattered. That same case is already degenerate with memory dependences, which is more expensive than the control dependence scan.
We could also consider combining the memory dependence and control dependence sets to reduce memory usage, but since it complicates the code slightly and makes debugging a bit harder, I went with the simplest scheme for now.
This was noticed while trying to understand the failures reported against D118538, but is not otherwise related to that change.
Update functions that previously took a loop pointer but only to get the
pre-header. Instead, pass the block directly. This removes the
requirement for the loop object to be created up-front.
With debug information enabled (-g) Clang will wrap the actual target
region into a new function which is called from the "kernel". The problem
is that the "kernel" is now basically a wrapper without all the things
we expect. More importantly, if we end up asking for an AAKernelInfo
for the "target region function" we might try to turn it into SPMD mode.
That used to cause an assertion as that function doesn't have an
appropriately named `_exec_mode` global. While the global is going away
soon we still need to make sure to properly handle this case, e.g.,
perform optimizations reliably.
Differential Revision: https://reviews.llvm.org/D122043
Generalize D99629 for ELF. A default visibility non-local symbol is preemptible
in a -shared link. `isInterposable` is an insufficient condition.
Moreover, a non-preemptible alias may be referenced in a sub constant expression
which intends to lower to a PC-relative relocation. Replacing the alias with a
preemptible aliasee may introduce a linker error.
Respect dso_preemptable and suppress optimization to fix the above issues. With
the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic`
compile.
```
int aliasee;
extern int alias __attribute__((alias("aliasee"), visibility("hidden")));
void foo() { alias = 345; } // intended to access the local copy
```
While here, refine the condition for the alias as well.
For some binary formats like COFF, `isInterposable` is a sufficient condition.
But I think canonicalization for the changed case has little advantage, so I
don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or
`getPICLevel/getPIELevel` complexity.
For instrumentations, it's recommended not to create aliases that refer to
globals that have weak linkage or are preemptible. However, the following is
supported and the IR needs to handle such cases.
```
int aliasee __attribute__((weak));
extern int alias __attribute__((alias("aliasee")));
```
There are other places where GlobalAlias isInterposable usage may need to be
fixed.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D107249
I keep thinking this assumption is probably exploitable for a bug in the existing implementation, but all of my attempts at writing a test case have failed. So for the moment, just document this very subtle assumption.
This patch fixes:
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3917:13: error:
unused function 'needToScheduleSingleInstruction'
[-Werror,-Wunused-function]
[SCCP] do not clean up dead blocks that have their address taken
Fixes a crash observed in IPSCCP.
Because the SCCPSolver has already internalized BlockAddresses as
Constants or ConstantExprs, we don't want to try to update their Values
in the ValueLatticeElement. Instead, continue to propagate these
BlockAddress Constants, continue converting BasicBlocks to unreachable,
but don't delete the "dead" BasicBlocks which happen to have their
address taken. Leave replacing the BlockAddresses to another pass.
Fixes: https://github.com/llvm/llvm-project/issues/54238
Fixes: https://github.com/llvm/llvm-project/issues/54251
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121744
This reverts commit 1cfa986d68. See https://github.com/llvm/llvm-project/issues/54256 for why I'm discontinuing the project.
Separately, it turns out that while this patch does correctly preserve MSSA, it's correct only at the end of the pass, not between vectorization attempts. Even if we decide to resurrect this, we'll need to fix that before reapplying.
Quote from the LLVM Language Reference
If ptr is a stack-allocated object and it points to the first byte of the
object, the object is initially marked as dead. ptr is conservatively
considered as a non-stack-allocated object if the stack coloring algorithm
that is used in the optimization pipeline cannot conclude that ptr is a
stack-allocated object.
By replacing the alloca pointer with the tagged address before this change,
we confused the stack coloring algorithm.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D121835
Failed on buildbot:
/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/llc: error: : error: unable to get target for 'aarch64-unknown-linux-android29', see --version and --triple.
FileCheck error: '<stdin>' is empty.
FileCheck command line: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-project/llvm/test/Instrumentation/HWAddressSanitizer/stack-coloring.ll --check-prefix=COLOR
This reverts commit 208b923e74.
This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.
This allows reproducing #54023 using opt.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D121944
Quote from the LLVM Language Reference
If ptr is a stack-allocated object and it points to the first byte of the
object, the object is initially marked as dead. ptr is conservatively
considered as a non-stack-allocated object if the stack coloring algorithm
that is used in the optimization pipeline cannot conclude that ptr is a
stack-allocated object.
By replacing the alloca pointer with the tagged address before this change,
we confused the stack coloring algorithm.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D121835
The way the pass is actually used in the optimization pipeline,
TLI will be available, but this is not the case when running just
-lower-constant-intrinsics in tests, which ends up being quite
confusing.
Require TLI unconditionally, as we usually do.
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.
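A minimal usage sketch (helper name hypothetical), relying only on the ensureOptimizedUses() entry point described above:
```
#include "llvm/Analysis/MemorySSA.h"
using namespace llvm;

// A pass that benefits from eagerly optimized uses (e.g. because it walks
// def-use chains or will query most uses anyway) can request that once,
// right after acquiring MemorySSA.
static void prepareMSSA(MemorySSA &MSSA) {
  MSSA.ensureOptimizedUses(); // optimizes every MemoryUse exactly once
}
```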
This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.
Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).
Differential Revision: https://reviews.llvm.org/D121381
LoopSimplifyCFG may process loops that are not in
loop-simplify/canonical form. For loops not in canonical form, exit
blocks may be reachable from non-loop blocks and we cannot consider them
dead merely because they are not reachable from the loop itself.
Unfortunately the smallest test I could come up with requires running
multiple passes:
-passes='loop-mssa(loop-instsimplify,loop-simplifycfg,simple-loop-unswitch)'
The reason is that loops are canonicalized at the beginning of loop
pipelines, so a later transform has to break canonical form in a way
that breaks LoopSimplifyCFG's dead-exit analysis.
Alternatively we could try to require all loop passes to maintain
canonical form. That in turn would also require additional verification.
Fixes #54023, #49931.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121925
This patch tries to sink instructions when they are only used in a successor block.
This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.
In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610
Reviewed By: nikic, reames
Differential Revision: https://reviews.llvm.org/D121585
Splat loads are inexpensive in X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated for splat loads.
Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
    %ld = load double, double* %gep
    %ins1 = insertelement <2 x double> poison, double %ld, i32 0
    %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1
- But it can also be a load, an insert and a shuffle:
    %ld = load double, double* %gep
    %ins = insertelement <2 x double> poison, double %ld, i32 0
    %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer
Because of this some of the lit tests contain more IR instructions.
Differential Revision: https://reviews.llvm.org/D121354
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Differential Revision: https://reviews.llvm.org/D115907
Failures in `InlineFunction()` are caught after D121722, but `emitInlinedIntoBasedOnCost()` should only be called when inlining is successful. This also removes an unnecessary call to `shouldInline()` which always returned `InlineCost::getAlways()`.
Reviewed By: kyulee, nikic
Differential Revision: https://reviews.llvm.org/D121946
We often failed in the assertion, non-deterministically with a large IR:
```
Assertion `notDifferentParent(LocA.Ptr, LocB.Ptr) && "BasicAliasAnalysis doesn't support interprocedural queries."
```
Looking at the comment in https://reviews.llvm.org/D87806, it appears it's actually a module pass for new PM while the legacy PM still works as a function pass.
The fix is to align the same behavior in between new PM and old PM, which initializes ObjCARCContract for each function.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D121949
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants,
arguments, phis, or instructions from other blocks, or their users
are phis or are in other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if the operands do
not need to be scheduled) or at the end of the block (if the users are
outside of the block).
It may save some compile time and scheduling resources.
Differential Revision: https://reviews.llvm.org/D121121
For MachO, lower `@llvm.global_dtors` into `@llvm.global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
When we build clang without asserts we should still check the result of
`InlineFunction()` to be sure there wasn't an error. Otherwise we could
incorrectly merge attributes in the next line.
This also removes a redundant call to `getCaller()`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121722
An analysis may just be interested in checking if an instruction is
atomic but system scoped or single-thread scoped, like ThreadSanitizer's
isAtomic(). Unfortunately Instruction::isAtomic() can only answer the
"atomic" part of the question, but to also check scope becomes rather
verbose.
To simplify and reduce redundancy, introduce a common helper
getAtomicSyncScopeID() which returns the scope of an atomic operation.
Start using it in ThreadSanitizer.
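A small usage sketch (isRelevantAtomic is hypothetical; the helper itself is the one introduced here):
```
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Treat an instruction as interesting if it is atomic and not restricted to
// single-thread scope, which previously required checking each atomic
// instruction kind separately.
static bool isRelevantAtomic(const Instruction *I) {
  if (auto SSID = getAtomicSyncScopeID(I)) // empty if I is not atomic
    return *SSID != SyncScope::SingleThread;
  return false;
}
```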
NFCI.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D121910
This uses the existing VPlan helpers to check whether there are scalar
uses of a phi recipe. It remove one of the few remaining dependencies on
the cost model from VPlan code generation.
Depends on D121612.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121613
VPInterleaveRecipe only uses the first lane of the address. Add
onlyFirstLaneUsed implementation. This is needed for a follow-up patch.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121612
Reapply with an explicit check for multi-edges, as the expected
behavior of multi-edge dominance is unclear (D120811).
-----
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
This patch ensures scalars (except for uniforms) are no
longer collected (prior to LVP planning phase) for
scalable vectorization.
This is to avoid the chances of generating scalarized
instructions later (during LVP execute phase) as they
are not supported for scalable vectorization.
Relevant test has also been added.
Differential Revision: https://reviews.llvm.org/D121452
When a load extends past the extent of the alloca, SROA will
restrict the slice size to extend to the end of the alloca only.
However, presplitting was asserting that the load size and the
slice size match exactly, which does not hold in this case.
Relax the assertion to only require that the load size is greater
or equal than the slice size.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants,
arguments, phis, or instructions from other blocks, or their users
are phis or are in other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if the operands do
not need to be scheduled) or at the end of the block (if the users are
outside of the block).
It may save some compile time and scheduling resources.
Differential Revision: https://reviews.llvm.org/D121121
This patch adds initial argmemonly inference, by checking the underlying
objects of locations returned by MemoryLocation.
I think this should cover most cases, except function calls to other
argmemonly functions.
I'm not sure if there's a reason why we don't infer those yet.
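Roughly, the check boils down to something like the following sketch (helper name hypothetical):
```
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"
using namespace llvm;

// A location counts towards argmemonly if its underlying object is one of
// the function's own arguments.
static bool locationIsArgMem(const MemoryLocation &Loc) {
  const Value *UO = getUnderlyingObject(Loc.Ptr);
  return isa<Argument>(UO);
}
```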
Additional argmemonly can improve codegen in some cases. It also makes
it easier to come up with a C reproducer for 7662d1687b (already fixed,
but I'm trying to see if C/C++ fuzzing could help to uncover similar
issues.)
Compile-time impact:
NewPM-O3: +0.01%
NewPM-ReleaseThinLTO: +0.03%
NewPM-ReleaseLTO+g: +0.05%
https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121415
This code queries TTI on a single function, which is considered to
be representative. This is a bit odd, but probably fine in practice.
However, I think we should at least avoid querying declarations,
which e.g. will generally lack target attributes, and for which
we don't seem to ever query TTI in other places.
This initial patch adds code to preserve MemorySSA through a run of SLP vectorizer. The eventual plan is to use MemorySSA to accelerate SLP's memory dependence checking, but we're a ways from that. In particular, this patch is correct, but really slow. It's being landed so that we can work incrementally in tree, not because it's expected to be useful to anyone just yet.
The broader effort is being tracked in https://github.com/llvm/llvm-project/issues/54256. It's worth noting explicitly that this may not work out, and if not, we will be reverting all of the MSSA support in SLP at some point in the next few weeks.
Differential Revision: https://reviews.llvm.org/D117926
Update FunctionAttrs to use FunctionModRefBehavior instead
MemoryAccessKind.
This allows for adding support for inferring argmemonly and others,
see D121415.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121460
This can be viewed as swapping the select arms:
https://alive2.llvm.org/ce/z/jUvFMJ
...so we don't have the 'nsz' problem with the more general fold.
This unlocks other folds for the motivating fabs example.
This was discussed in issue #38828.
Hardcode the function type as ParallelTask, which is the guaranteed
pointee type of this runtime function argument (if pointee types
exist). The elimination of the callee bitcast is left for InstCombine.
Differential Revision: https://reviews.llvm.org/D120885
Update places still referencing LoopVectorBody to use the vector loop to
get the vector loop header. This is needed to move vector loop
code-generation to VPlan completely, which in turn is needed to model
pre-header & exit blocks in VPlan as well.
For MachO, lower `@llvm.global_dtors` into `@llvm.global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121327
When matching PHINodes when merging functions, the IROutliner only checks that an incoming value exists in a phi node in the overall function. It doesn't check the length, the order, or that the incoming block also matches. In the given example, we see that both phi nodes have the same incoming values, but from different blocks.
The fix is to enforce a stricter match of the incoming value, and the incoming block as well, when matching the created phi nodes.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D121310
If we have a logical and/or in select form and the true/false operand
is an fcmp with poison generating FMF, we won't be able to fold it
to an and/or instruction. This prevents us from optimizing the case
where it is a logical operation of two fcmps with identical operands.
This patch adds explicit checks for this case that doesn't rely on
converting to and/or to do the optimization. It reuses the existing
foldLogicOfFCmps, but adds a new flag to disable the other combine
that is inside that function.
FMF flags from the two FCmps are intersected using the logic added in
D121243. The FIXME has been updated to indicate that we can only use
a union for the non-select form.
This allows us to optimize cases like this from compare-fp-3.c in the
gcc torture suite with fast math.
```
void
test1 (float x, float y)
{
  if ((x==y) && (x!=y))
    link_error0();
}
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D121323
2 of the 3 callsites of IRMover::move() pass empty lambda functions. Just
make this parameter llvm::unique_function.
Came about via discussion in D120781. Probably worth making this change
regardless of the resolution of D120781.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D121630
The IR Outliner is supposed to extract the outputs contained in an external phi node and place them into a phi node contained within the outlined function. However, when the output values of two outlined functions with two different output sets are contained within the same phi node, they are counted as the same exit path when first analyzed. In reality, these create two different phi nodes, creating an inconsistency, resulting in a mismatch in the expected number of output paths and a crash. This fixes that counting when analyzing the outputs by also analyzing the incoming blocks rather than just the incoming values.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D121313
Extend -wholeprogramdevirt-check to support both the existing
trapping mode on an incorrect devirtualization, as well as a new
mode to fall back to an indirect call on a mismatch. The new mode is
useful in cases where we want to enable
devirtualization but cannot fully guarantee whole program visibility
(e.g. in the case where LTO has been disabled for a small set of objects
that could potentially override virtual methods without having a symbol
reference to anything in the base class including the vtable).
Remove !prof and !callees metadata (which are used by indirect call
promotion) from both the new direct call and the fallback indirect call
(so that we don't perform another round of promotion on the latter).
Also remove it from the direct call in the non-fallback cases, which
was an oversight, although it didn't seem to cause any issues. Add tests
for the metadata removal covering the various cases.
Differential Revision: https://reviews.llvm.org/D121419
When there are two external phi nodes for two different outlined regions, when compressing the created phi nodes between the two regions, the matching for the second phi node in the second region matches the first phi node created for the first region rather than the second phi node created for the first region. This adds an extra output path where there should not be one.
The fix is to ignore phi nodes that have already been matched for each region.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D121312
The insertion point for the builder used during VPlan code generation is
set during code generation. Setting the insert point here is dead code
and can be removed.
Context: I needed this for https://github.com/google/iree/pull/8474 .
I found that TSan instrumentation expects vector sizes to be <= 16,
and in my project (IREE) we have tests with higher vector sizes.
That left some test functions uninstrumented, resulting in crashes as
instrumented code called into them.
Differential Revision: https://reviews.llvm.org/D121182
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.
It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.
This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.
The code to generate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.
Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.
Depends on D120827.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D120828
Before we used the capture tracker to follow pointer uses, now we do it
explicitly ourselves through the Attributor API. There are multiple
benefits: For one, the boilerplate is cut down by a lot. The class,
potential copies vector, etc. is all not needed anymore. We also do
avoid explicitly looking through memory here, something that was
duplicated and should only live in the `checkForAllUses` helper. More
importantly, as we do simplifications we need to make sure all parties
are in sync when they reason about uses. The old way did not allow us to
do this but the new one does as every use visiting AA goes through
`checkForAllUses` now.
As replacements will become more complex it is better to have a single
AA responsible for replacing a use. Before this patch AAValueSimplify*
and AAValueSimplifyReturned could both try to replace the returned
value. The latter was marginally better for the old pass manager
when a function was already carrying a `returned` attribute and when
the context of the return instruction was important. The second
shortcoming was resolved by looking for return attributes in the
AAValueSimplifyCallSiteReturned initialization. The old PM impact is
not concerning.
This is yet another step towards the removal of AAReturnedValues, the
very first AA we should now try to eliminate due to the overlapping
logic with value simplification.
When we look through memory for a store we used to allow any other use
of the memory that is reachable. This is generally OK but we need to
make sure to actually let the user look at these properly. For now,
we simply require loads (via exact reloads).
There was some ad-hoc handling of liveness and manifest to avoid
breaking CGSCC guarantees. Things always slipped through though.
This cleanup will:
1) Prevent us from manifesting any "information" outside the CGSCC.
This might be too conservative, but we need to opt in to annotations
rather than try to avoid some problematic ones.
2) Avoid running any liveness analysis outside the CGSCC. We did have
some AAIsDeadFunction handling to this end but we need this for all
AAIsDead classes. The reason is that AAIsDead information is only
correct if we actually manifest it, since we don't (see point 1) we
cannot actually derive/use it at all. We are currently trying to
avoid running any AA updates outside the CGSCC but that seems to
impact things quite a bit.
3) Assert, don't check, that our modifications (during cleanup) modify
only CGSCC functions.
In an attempt to remove the memory leak we introduced a double free.
The problem was that we allowed a plain copy of the state and it was
actually used. The use was useless, so it is gone now. The copy
constructor is gone as well. The move constructor ensures the Accesses
pointers are owned by a single state, I hope.
Reported by: https://lab.llvm.org/buildbot/#/builders/16/builds/25820
This patch adds a helper to check if a recipe only uses scalars of a
given operand. This is similar to onlyFirstLaneUsed, which was
introduced earlier.
By default, usesScalars falls back on onlyFirstLaneUsed.
Will be used by D120828.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D120827
X (any pred) -X --> X (any pred) 0.0
This works with all FP values and preserves FMF.
Alive2 examples:
https://alive2.llvm.org/ce/z/dj6jhp
This can also create one of the patterns that we match as "fabs"
as shown in one of the test diffs.
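A minimal before/after sketch of the fold, hand-written here with `olt` standing in for an arbitrary predicate:
```
define i1 @src(float %x) {
  %neg = fneg float %x
  %cmp = fcmp olt float %x, %neg
  ret i1 %cmp
}

define i1 @tgt(float %x) {
  %cmp = fcmp olt float %x, 0.000000e+00
  ret i1 %cmp
}
```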
This reverts commit 9397bdc67e.
This optimization is likely to surprise programmers as seen
in post-commit comments, so we should add a clang warning
first (that is proposed in D121306).
If there are no ctors, then this can have an arbitrary zero-sized
value. The current code checks for null, but it could also be
undef or poison.
Replacing the specific null check with a check for
non-ConstantArray.
As discussed on Issue #32161 this fold can be generalized a lot more than it currently is, but this patch at least adds vector support.
Differential Revision: https://reviews.llvm.org/D121358
As far as I can tell, these names are only intended to be
informative, so just use a generic "PointerType" for opaque pointers.
The code in solveDIType() also treats pointers as basic types (and
does not try to encode the pointed-to type further), so I believe
this should be fine.
Differential Revision: https://reviews.llvm.org/D121280
We are already doing this in the split functions while we clone. This just
handles the original function.
I also updated the coroutine split test to validate that we are always referring
to the msg in the context object instead of in a shadow copy.
rdar://83957028
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D121324
This patch is motivated by pr48057 where an output dependency is not detected
since loop interchange did not check a store instruction with itself.
Fixed that deficiency.
Reviewed By: bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D118102
The existing handling produced a crash for the test case (attached with the patch).
Now the function transferSRADebugInfo is modified to
- Ignore the current variable if it starts after the current Fragment.
- Ignore the current variable if it ends before the current Fragment.
- Generate (!DIExpression()) if the current variable completely fits the
current Fragment.
- Otherwise (as earlier), generate the DW_OP_LLVM_fragment in IR if the current
Fragment partially defines the current variable.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D121107
As a result of adding multiblock outlining, it became possible to outline the entirety of a basic block, along with branches that only pointed to the basic blocks contained in the outlined section. This means that there are no exit paths, and no return statement. There was a previous assertion from the older version of the outliner that explicitly made sure there was a return statement. This removes that assertion.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D120868
This patch intersects the fast math flags from the two fcmps instead
of dropping them.
I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed
to check out. With the other flags it told me "Couldn't prove the
correctness of the transformation". Not sure if I should just preserve
nnan and ninf?
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D121243
Single value phis won't be modeled in VPlan. If the phi only gets used
outside the loop, the current code misses the fact that the incoming
value is not dead. Update the code to also look through such phis to
check for outside users.
Fixes #54266
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used to print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
MinBaseDistance may be odr-used by std::max, leading to an undefined symbol linker error:
```
ld.lld: error: undefined symbol: (anonymous namespace)::MinCostMaxFlow::MinBaseDistance
>>> referenced by SampleProfileInference.cpp:744 (/home/ray/llvm-project/llvm/lib/Transforms/Utils/SampleProfileInference.cpp:744)
>>> lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/SampleProfileInference.cpp.o:((anonymous namespace)::FlowAdjuster::jumpDistance(llvm::FlowJump*) const)
```
Since llvm-project is still using C++14, work around it with a cast.
After moving the CanAdd check in c60cdb44f7 and using it for
the assume cases as well, the passed in block may not have a branch
instruction as terminator. This can trigger the assertion. Given the new
use case, it doesn't add value any longer and can be removed.
Fixes https://github.com/llvm/llvm-project/issues/54281
This is noted as a missing clang warning in #54222
(and we should still make that enhancement).
Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDq
https://alive2.llvm.org/ce/z/pE6LRt
I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
Introduce a new attribute "function-inline-cost-multiplier" which
multiplies the inline cost of a call site (or all calls to a callee) by
the multiplier.
When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee. If so, set the "function-inline-cost-multiplier" attribute of
the new call site to double the original call site's attribute value.
This does not happen when the original call site is intra-SCC.
This is an alternative to D120584, which marks the call sites as
noinline.
Hopefully fixes PR45253.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D121084
If the user disables de-globalization we do not seed the AAHeapToShared
and AAHeapToStack, but we could still end up with them through in-flight
lookups. With this patch we disable AAHeapToShared completely if the
user disabled de-globalization. Heap-2-stack is still run though.
Differential Revision: https://reviews.llvm.org/D121059
Currently, when instrumenting indirect calls, this uses
CallBase::getCalledFunction to determine whether a given callsite is
eligible.
However, that returns null if:
this is an indirect function invocation or the function signature
does not match the call signature.
So, we end up instrumenting direct calls where the callee is a bitcast
ConstantExpr, even though we presumably don't need to.
Use isIndirectCall to ignore those funky direct calls.
Differential Revision: https://reviews.llvm.org/D119594
The analysis passes output the function name encapsulated in `'` quotes,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D121105
When decomposing constraints for unsigned conditions, we can use
negative values by zero-extending them, as long as they are less than
the maximum constraint value.
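As a hand-written illustration (not taken from the patch), a negative constant in an unsigned condition is simply treated as its zero-extended value:
```
define i1 @ult_negative_constant(i8 %x) {
  ; -10 as an unsigned 8-bit value is 246, so this decomposes to x u< 246
  %c = icmp ult i8 %x, -10
  ret i1 %c
}
```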
Fixes https://github.com/llvm/llvm-project/issues/54224
Add missing CanAdd check before adding a condition from an assume
to the successor blocks. When adding information from assume to
successor blocks we need to perform the same CanAdd as we do for adding
a condition from a branch.
Fixes https://github.com/llvm/llvm-project/issues/54217
Dropping this restriction seems to work fine (there are no assertion
failures), so it appears that either the updater got smarter or the
problematic cases are restricted elsewhere.
If doing this still causes issues, then the place to address it
would probably be 8f5bdaf481/llvm/lib/Transforms/IPO/Attributor.cpp (L1856-L1859),
which already prevents replacement outside the SCC, so I'm not
quite sure what this check is intended to avoid.
Differential Revision: https://reviews.llvm.org/D120987
Only determine the frame layout based on dereferenceable and align
attributes, and remove the type-based fallback, which is incompatible
with opaque pointers. The dereferenceable attribute is required,
while the align attribute uses default alignment of 1 (commonly,
align 1 attributes do not get placed, relying on default alignment).
The CoroSplit pass producing the resume function adds the necessary
attributes in 7daed35911/llvm/lib/Transforms/Coroutines/CoroSplit.cpp (L840),
and their presence is checked in coro-debug.ll at least.
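For illustration, a hand-written sketch of the attributes the frame layout is now derived from (the constants are made up):
```
; dereferenceable gives the frame size, align gives the frame alignment
define void @coro.resume(ptr dereferenceable(64) align 8 %frame.ptr) {
entry:
  ret void
}
```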
Differential Revision: https://reviews.llvm.org/D120988
With opaque pointers, after splitRetconCoroutine() the FramePtr
may be an Argument rather than an Instruction. With typed pointers,
this currently doesn't happen because the FramePtr would be a
bitcast instruction.
Fix this by making FramePtr a Value and adding a helper for the
"after FramePtr" insertion point, which would be the start of the
function in the Argument case.
Differential Revision: https://reviews.llvm.org/D120994
This patch extends ConstraintElimination to also remove dead variables
when removing a constraint. When a constraint is removed because it is
out of scope, all new variables added for this constraint can also be
removed.
This keeps the total size of the systems much smaller, because it
reduces the number of variables drastically.
It also fixes a bug where variables were removed incorrectly.
Fixes https://github.com/llvm/llvm-project/issues/54228
This check is not compatible with opaque pointers. We can avoid
it by adjusting the getPointerAlignment() implementation to avoid
creating unnecessary ptrtoint expressions for bitcasted pointers.
The code already uses OnlyIfReduced to not create an expression
if it does not simplify, and this makes sure that folding a
bitcast and ptrtoint into a ptrtoint doesn't count as a
simplification.
Differential Revision: https://reviews.llvm.org/D120904
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
We already look through memory to determine where a value that is stored
might pop up again (potential copies). This patch introduces the other
direction with similar logic. If a value is loaded, we can follow all
the accesses to the pointer (or better object) and try to determine what
value might have been stored.
Both `undef` and `nullptr` are maximally aligned. This is especially
important as we often see `undef` until a proper value has been
identified during simplification.
With D106397 we used CFG reasoning to filter out writes that will not
interfere with a given load instruction. With this patch we use the
same logic (modulo the reversal in reachability check order) for store
instructions. As an example, we can now prove stores to shared memory
are dead if all the loads of the shared memory are not reachable from
them.
Heap-2-stack and heap-2-shared can replace an allocation call with
something else. To avoid us deriving information from the allocator
implementation we register a simplification callback now that will
force us to stop at the call site. We probably should create the
replacement memory eagerly and return that instead though.
While we can use range information when we derive dereferenceability we
must make sure to pick the right end of the range. Before we always went
with the minimal offset, which is not correct if we want to combine
the base dereferenceability with some offset. In that case it's the
maximum that gives the correct result.
This simply makes the function argument of the
`Attributor::checkForAllInstructions` helper explicit so one can iterate
over instructions in other functions.
The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp parallel regions (writing with MLIR for ease)
```
omp.parallel {
  %a = ...
  omp.parallel {
    use(%a)
  }
}
```
As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this
allocation to the inner parallel. For example, we would want to get something like the following:
```
omp.parallel {
  %a = ...
  %tmp = alloc
  store %tmp[] = %a
  kmpc_fork(outlined, %tmp)
}
```
However, in practice, this is not what currently occurs in the context of nested parallel regions. Specifically to the OpenMPIRBuilder,
the entirety of the function (at the LLVM level) is currently inlined with blocks marking the corresponding start and end of each
region.
```
entry:
...
parallel1:
%a = ...
...
parallel2:
use(%a)
...
endparallel2:
...
endparallel1:
...
```
When the allocation is inserted, it is presently inserted into the parent of the entire function (e.g. entry) rather than the parent
allocation scope of the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1.
This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example.
This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121061
Skip phi nodes in the preheader. They may not be considered loop
invariant by the assertion below.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D121010
There seems to be one more uncaught problem, SROA may now end up trying
to re-re-repromote the just-promoted shadow alloca, and do that endlessly.
This reverts commit adc0984d81.
This is inspired by the original variant of D109749 by Graham Hunter,
but is a more general version.
Roughly, instead of promoting the alloca, we call it
a shadow/backing alloca, go through all its slices,
clone(!) instructions that operated on it,
but make them operate on the cloned alloca,
and promote cloned alloca instead.
This keeps the shadow/backing alloca, and all the original instructions
around, which results in said shadow/backing alloca being
a perfect mirror/representation of the promoted alloca's content,
so calls that take the alloca as arguments (non-capturingly!)
can be supported.
For now, we require that the calls also don't modify the alloca's content,
but that is only to simplify the initial implementation,
and that will be supported in a follow-up.
Overall, this leads to *smaller* codesize:
https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=size-total
and is roughly neutral compile-time wise:
https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=instructions
This relands commit 703240c71f,
that was reverted by commit 7405581f7c,
because the assertion `isa<LoadInst>(OrigInstr)` didn't hold in practice,
as the newly added test `@select_of_ptrs` shows:
If the pointers into alloca are used by select's/PHI's, then even if
we manage to fracture the alloca, some sub-alloca's will likely remain.
And if there are any non-capturing calls, then we will also decide to
keep the original backing alloca around, and we suddenly ~doubled
the alloca size, and the amount of memory traffic.
I'm not sure if this is a problem or we could live with it,
but let's leave that for later...
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D113520
This will let us start moving away from hard-coded attributes in
MemoryBuiltins.cpp and put the knowledge about various attribute
functions in the compilers that emit those calls where it probably
belongs.
Differential Revision: https://reviews.llvm.org/D117921
The custom state machine had a check for surplus threads that filtered
the main thread if the kernel was executed by a single warp only. We
now first check for the main thread, then for surplus threads, avoiding
filtering the former out.
Fixes #54214.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D121011
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator new can be inlined but the body of
operator delete can't, as in this example from jyknight:
#include <stdlib.h>
#include <stdio.h>
int allocs = 0;
void *operator new(size_t n) {
  allocs++;
  void *mem = malloc(n);
  if (!mem) abort();
  return mem;
}
__attribute__((noinline)) void operator delete(void *mem) noexcept {
  allocs--;
  free(mem);
}
void deleteit(int*i) { delete i; }
int main() {
  int*i = new int;
  deleteit(i);
  if (allocs != 0)
    printf("MEMORY LEAK! allocs: %d\n", allocs);
}
This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.
Differential Revision: https://reviews.llvm.org/D117356
This check is not relevant for correctness, it can only avoid
walking some recursive uses if the cast is to a non-function
pointer type. As this distinction will no longer be possible
with opaque pointers and all users will have to be walked
anyway, I'm dropping the check in advance.
Recommit without changes over 53abe3ff66,
which addressed the cause of the reported crash.
-----
With opaque pointers, the zero-offset load will generally not use
a GEP. Allow a direct load without GEP, which is treated the same
way as a zero-offset GEP.
Use the overload that supports moving into an empty block. I don't
think that this situation can occur right now, but it can happen
with the change from e7fb1c15cb,
and the test is derived from the issue reported there.
Per discussion on
https://reviews.llvm.org/D59709#inline-1148734, this seems like the
right course of action. `canBeOmittedFromSymbolTable()` subsumes and
generalizes the previous logic. In addition to handling `linkonce_odr`
`unnamed_addr` globals, we now also internalize `linkonce_odr` +
`local_unnamed_addr` constants.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D120173
This builds on @fhahn's D112313, and caches the liveOnEntry node as an optimized access. D112313 tried to only cache a known clobber. This change adds caching the fact that no clobber exists. It still does not cache may-clobber results.
Differential Revision: https://reviews.llvm.org/D120842
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.
I think InstCombine is currently the only user of both.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120754
We will check a bit later that the constant is in fact a function,
so the separate check for a function pointer type is largely
redundant. Also simplify the cast stripping with
stripPointerCasts().
`ArgInfo` is reduced to only contain a pair of {formal,actual} values.
The specialized function `Fn` and the `Partial` flag are redundant in
this structure. The `Gain` is moved to a new struct `SpecializationInfo`.
The value mappings created by cloneCandidateFunction() are being used
by rewriteCallSites() for matching the formal arguments of recursive
functions.
The list of specializations is passed by reference to calculateGains()
instead of being returned by value.
The `IsPartial` flag is removed from isArgumentInteresting() and
getPossibleConstants() as it's no longer used anywhere in the code.
Differential Revision: https://reviews.llvm.org/D120753
To make this actually trigger, we also need to check whether the
function types differ, which is a hidden cast under opaque pointers.
The transform is somewhat less relevant there because it is
primarily about pointer bitcasts, but it can also happen with other
bit- or pointer-castable types.
Byval handling is easier with opaque pointers because there is no
need to adjust the byval type, we only need to make sure that it's
still a pointer.
The logic for handling this was fixed in
8d7f118ab2, but the check for byval
on the callee was retained. This resulted in a weird situation
where the transform would work depending on whether the byval
was only on the call or on both the call and the function.
There is a general WalkerStepLimit adjustment higher up in the
loop, and I don't see any reason why this particular case would
need additional adjustment. Furthermore, this could underflow.
Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch.
Original commit message follows:
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.
This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:
Before this patch:
704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated
After this patch:
102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated
I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.
The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.
For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.
Differential Revision: https://reviews.llvm.org/D118538
While a collection of allocas is technically vectorizable - by forming a wider alloca - this was not a transform SLP actually knows how to do. Instead, we were forming a bundle with missing dependencies, and then relying on the scheduling code to preserve program order if multiple instructions were schedulable at once. I haven't been able to write a test case, but I'm 99% sure this was wrong in some edge case.
The unknown op case was flowing down the shufflevector path. This did result in some splat handling being lost with this change, but the same lack of splat handling is visible in a whole bunch of simple examples for the gather path. I didn't consider this interesting to fix given how narrow the splat of allocas case is.
The verify call was taking 50% of the compile time in our internal LLVM
fork when trying to unroll many loops.
Differential Revision: https://reviews.llvm.org/D113028
Instead of relying on underlying instructions, this patch updates
VPScalarIVStepsRecipe to only store the required type information.
This removes access to unrelated information, as well as avoiding issues
with the same underlying instruction being shared by multiple recipes.
This change should only change the debug output and not cause any
codegen changes, hence NFCI.
This is a recommit without changes. I originally reverted this
due to a significant code-size regression on tramp3d-v4, however
further investigation showed that in the tramp3d-v4 case this
change enables additional optimizations (in particular more
jump threading), which happens to reduce the size of a function
just enough to be eligible for inlining at hot callsites, which
results in the code size increase. As such, this was just bad
luck.
-----
This one-use limitation is artificial, we do not increase
instruction count if we perform the fold with multiple uses. The
motivating case is shown in @sub_eq_zero_select, where the one-use
limitation causes us to miss a subsequent select fold.
I believe the backend is pretty good about reusing flag-producing
subs for cmps with same operands, so I think doing this is fine.
Differential Revision: https://reviews.llvm.org/D120337
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
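A hand-written sketch of the idea (not taken from the patch):
```
define void @example(i32 %x) {
entry:
  switch i32 %x, label %default [
    i32 0, label %bb.zero
    i32 1, label %bb.one
  ]

bb.zero:                                          ; %x is known to be 0 here
  ret void

bb.one:                                           ; %x is known to be 1 here
  ret void

default:
  ret void
}
```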
The priority-based inliner currently uses block count combined with callee entry count to drive callsite inlining. This doesn't work well with LTO where postlink inlining is driven by prelink-annotated block count which could be based on the merge of all context profiles. I'm fixing it by using callee profile entry count only which should be context-sensitive.
I'm seeing 0.2% perf improvement for one of our internal large benchmarks with probe-based non-CS profile.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D120784
Previously there was a debug flag to print the module after
optimizations. Sometimes we wanted to print the module before
optimizations so this is being split into two flags.
`-openmp-opt-print-module` is now `-openmp-opt-print-module-after`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120768
Instead of passing an ICmpInst * and a bool, just pass the predicate
from the caller.
I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120609
Currently, adding the attribute no_sanitize("bounds") does not disable
-fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang
frontend handles fsanitize=array-bounds which can already be disabled by
no_sanitize("bounds"). However, instrumentation added by the
BoundsChecking pass in the middle-end cannot be disabled by the
attribute.
The fix is very similar to D102772 that added the ability to selectively
disable sanitizer pass on certain functions.
In this patch, if no_sanitize("bounds") is provided, an additional
function attribute (NoSanitizeBounds) is attached to IR to let the
BoundsChecking pass know we want to disable local-bounds checking. In
order to support this feature, the IR is extended (similar to D102772)
to make Clang able to preserve the information and let BoundsChecking
pass know bounds checking is disabled for certain functions.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D119816
This transform can still be applied if there are more than two
phi inputs, as long as phi inputs with the same value are dominated
by the same idom edge.
Treat the icmp and sub symmetrically, and require that one of them
has one use, not the icmp in particular. This could be further
relaxed in the abs (but not nabs) case to not check one-use at
all.
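For reference, a hand-written sketch of the abs pattern being matched, where the select is fed by both the icmp and the sub:
```
define i32 @abs_pattern(i32 %x) {
  %neg = sub i32 0, %x
  %cmp = icmp slt i32 %x, 0
  %abs = select i1 %cmp, i32 %neg, i32 %x
  ret i32 %abs
}
```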
This should no longer be necessary now that we canonicalize to
intrinsics. This may not be entirely NFC in practice if worklist
order gets inverted and we perform demanded bits simplification
of a select user before the select is canonicalized.
A function is basically dead when:
* it has no uses
* it has only self-referencing uses (it's recursive)
Differential Revision: https://reviews.llvm.org/D119878
This is a clean-up patch. The functional pass was rolled into the module pass in D112732.
Reviewed By: vitalybuka, aeubanks
Differential Revision: https://reviews.llvm.org/D120674
extractvalue (any_mul_with_overflow X, -1), 0 --> -X
There are similar other potential transforms that we could do as
noted by the last TODO in the test diffs.
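A hand-written before/after sketch, using the signed intrinsic (the unsigned variant behaves the same for this pattern):
```
declare { i32, i1 } @llvm.smul.with.overflow.i32(i32, i32)

define i32 @src(i32 %x) {
  %mul = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 -1)
  %val = extractvalue { i32, i1 } %mul, 0
  ret i32 %val
}

define i32 @tgt(i32 %x) {
  %neg = sub i32 0, %x
  ret i32 %neg
}
```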
Fixes #54053
Currently bottom-to-top reordering analysis counts orders of the
operands and then adds natural order counts for the operand users. It is
very conservative, since the user nodes themselves may require
reordering. This patch improves bottom-to-top analysis by checking for the
user nodes if they require/allow the reordering. If the user node must
be reordered, has reused scalars, is an alternate op vectorization node,
is a non-ordered gather node or may allow reordering because of the
reordered operands, such a node is considered as one that allows
reordering and is not counted as a node with the natural order.
Differential Revision: https://reviews.llvm.org/D120492
This reverts the revert commit ff93260bf6.
The underlying issue causing the PPC bot failures has been fixed in
cbaac14734 and a corresponding test case has been added in
ad2cad1c52.
Original message:
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.
In the first patch, it only handles the case where there is no vector
induction variable needed.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D115953
Exit values of vector inductions are generated completely independently of
the induction recipes. Consider them for removal, if they are not used
in the loop.
This fixes a crash exposed by 49b23f451c.
Only call it for intrinsic min/max. The moved implementation is
unchanged apart from the one-use check: It is now hardcoded to
one-use, without the two-use special case for SPF.
SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes. AA internally uses pointerMayBeCaptured to prove some noalias results. In a local profile, we were spending about 4% of total O2 time in capture tracking. By using the BatchAA interface - which caches capture results - this drops to 2%.
Note that there is no invalidation of BatchAA here. This assumes that no transformation done by SLP invalidates alias or capture results. This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.
In the first patch, it only handles the case where there is no vector
induction variable needed.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D115953
This can be used to explicitly model VPValues that depend on SCEV
expansion, like the step for inductions.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D116288
This patch adds a new transform to remove dead recipes. For now, it only
removes dead recipes in the header, to keep the number of tests that require
updating manageable. Future patches will extend this to remove dead
recipes across the whole plan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D118051
Currently, SLP can insert a "shuffle" instruction between a PHI and a landing pad instruction. The problem is demonstrated by the LIT test. The solution is to adjust the insertion point once we are done with PHI generation.
Differential Revision: https://reviews.llvm.org/D120552
With opaque pointers, the zero-offset load will generally not use
a GEP. Allow a direct load without GEP, which is treated the same
way as a zero-offset GEP.
We already have a check for !InstQueries.empty(), so move the for-range over InstQueries inside to avoid the AAReachability uninitialized variable static analysis warnings.
Now that we canonicalize SPF min/max to intrinsics, there's no
need to canonicalize the structure of the SPF min/max itself
anymore. This is conceptually NFC, but in practice does slightly
impact results due to folding order differences.
SCEVs ExprValueMap currently tracks not only which IR Values
correspond to a given SCEV expression, but additionally stores that
it may be expanded in the form X+Offset. In theory, this allows
reusing existing IR Values in more cases.
In practice, this doesn't seem to be particularly useful (the test
changes are rather underwhelming) and adds a good bit of complexity.
Per https://github.com/llvm/llvm-project/issues/53905, we have an
invalidation issue with these offseted expressions.
Differential Revision: https://reviews.llvm.org/D120311
Expand `TruncInstCombine` to handle loops by adding `phi` nodes
to expression graph.
Reviewed by: RKSimon, lebedev.ri
(recommit of fixed f84d732f, reverted by 8ad6d5e after sanitizer breakage)
Differential Revision: https://reviews.llvm.org/D109817
The min/max intrinsic cost is currently too low because in the cost calculation
we subtract the cost of the vector compare as we will not emit it.
For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE
which returns 3, the worst case cost.
I think we should be passing VecPred instead, since we know the predicates of
the compare instr.
I think this is related to commit b3b993a7ad which introduced the predicate
argument to getCmpSelInstrCost().
https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83
Differential Revision: https://reviews.llvm.org/D120439
Summary:
We use a section to embed offloading code into the host for later
linking. This is normally unique to the translation unit as it is thrown
away during linking. However, if the user performs a relocatable link
the sections will be merged and we won't be able to access the files
stored inside. This patch changes the section variables to have external
linkage and a name defined by the section name, so if two sections are
combined during linking we get an error.
Now that integer min/max intrinsics have good support in both
InstCombine and other passes, start canonicalizing SPF min/max
to intrinsic min/max.
Once this sticks, we can stop matching SPF min/max in various
places, and can remove hacks we have for preventing infinite loops
and breaking of SPF canonicalization.
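For illustration, a hand-written sketch of the two forms (value names are made up):
```
declare i32 @llvm.smin.i32(i32, i32)

; select-pattern-flavor (SPF) form
define i32 @spf_form(i32 %a, i32 %b) {
  %cmp = icmp slt i32 %a, %b
  %min = select i1 %cmp, i32 %a, i32 %b
  ret i32 %min
}

; canonical intrinsic form
define i32 @canonical_form(i32 %a, i32 %b) {
  %min = call i32 @llvm.smin.i32(i32 %a, i32 %b)
  ret i32 %min
}
```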
Differential Revision: https://reviews.llvm.org/D98152
The `SplitIndirectBrCriticalEdges` function was originally designed for
`CodeGenPrepare` and skipped splitting of edges when the destination
block didn't contain any `PHI` instructions. This only makes sense when
reducing COPYs like `CodeGenPrepare`. In the case of
`PGOInstrumentation` or `GCOVProfiling` it would result in missed
counters and wrong results in functions with computed goto.
Differential Revision: https://reviews.llvm.org/D120096
This change uses instruction's comesBefore method to simplify the code significantly. There's little compile time concern here because getSpillCost already calls comesBefore on every basic block which contains a vectorization candidate. The only additional times we'll build basic block ordering is when we can't schedule a vector candidate anywhere in the containing block.
Differential Revision: https://reviews.llvm.org/D120364
Prior to this change, LLVM would attempt to optimize an
aligned_alloc(33, ...) call to the stack. This flunked an assertion when
trying to emit the alloca, which crashed LLVM. Avoid that with extra
checks.
Differential Revision: https://reviews.llvm.org/D119604
This cap was first added in 848c1aa45 (back in 2015). Per the original commit message, the purpose was to avoid a compile time explosion in long basic blocks. The algorithmic problem in scheduling has now been fixed in 0539a26d.
In the meantime, the code has rotten fairly badly. Some intermediate refactoring caused the size to only be incremented if *both* iterators advance in the window search. This causes the size to be badly undercounted when near one end of a basic block. We no longer have any test which exercises the logic in an intentional way; there's one test which differs with this change, but the changes appear fairly orthogonal to the purpose of the test file.
Unfortunately, we no longer have the original motivating example, so it's possible that it also hits some other issue. I tested locally with a large example, but even at its worst, that one doesn't demonstrate anything too extreme even without the algorithmic fix. It's clearly faster with, but only by ~20% which doesn't seem in line with the original commit message. If regressions with this patch are seen, please file a bug and I'll try to fix any other algorithmic problems which fall out.
Rather than queuing up actions, have one function that does the
log2() fold in the obvious way, but with a flag that allows us
to check whether the fold will succeed without actually performing
it.
What we're really doing here is converting Op0 udiv Op1 into
Op0 lshr log2(Op1), so phrase it in that way. Actually pushing
the lshr into the log2(Op1) expression should be seen as a separate
transform.
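For illustration, a hand-written sketch of the kind of fold this covers, where log2(Op1) is visible as a shift amount:
```
define i32 @src(i32 %x, i32 %n) {
  %y = shl i32 1, %n
  %d = udiv i32 %x, %y
  ret i32 %d
}

define i32 @tgt(i32 %x, i32 %n) {
  %d = lshr i32 %x, %n
  ret i32 %d
}
```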
This one-use limitation is artificial, we do not increase
instruction count if we perform the fold with multiple uses. The
motivating case is shown in @sub_eq_zero_select, where the one-use
limitation causes us to miss a subsequent select fold.
I believe the backend is pretty good about reusing flag-producing
subs for cmps with same operands, so I think doing this is fine.
Differential Revision: https://reviews.llvm.org/D120337
In places where `MaxNumPromotions` is used to allocated an array, bail out early to prevent allocating an array of length 0.
Differential Revision: https://reviews.llvm.org/D120295
A call to getInsertIndex() in getTreeCost() is returning None,
which causes an assert because a non-constant index value for
insertelement was not expected. This case occurs when the
insertelement index value is defined with a PHI.
Differential Revision: https://reviews.llvm.org/D120223
The "Correlated Value Propagation" pass was missing a case when handling select instructions. It was only handling the "false" constant value, while in NVPTX the select may have the condition (and thus the branches) inverted, for example:
```
loop:
%phi = phi i32* [ %sel, %loop ], [ %x, %entry ]
%f = tail call i32* @f(i32* %phi)
%cmp1 = icmp ne i32* %f, %y
%sel = select i1 %cmp1, i32* %f, i32* null
%cmp2 = icmp eq i32* %sel, null
br i1 %cmp2, label %return, label %loop
```
But the select condition can be inverted:
```
%cmp1 = icmp eq i32* %f, %y
%sel = select i1 %cmp1, i32* null, i32* %f
```
The fix is to enhance "Correlated Value Propagation" to handle both branches of the select instruction.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D119643
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.
This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:
Before this patch:
704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated
After this patch:
102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated
I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.
The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.
For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.
Differential Revision: https://reviews.llvm.org/D118538
D118623 added code to fold not-of-compare into a compare
with the inverted predicate, if the compare had no other
uses. This relies on accurate use lists in the IR but it
was run before setPhiValues, when some phi inputs are still
stored in a data structure on the side, instead of being
real uses in the IR. The effect was that a phi that should
be using the original compare result would now get an
inverted result instead.
Fix this by moving simplifyConditions after setPhiValues.
Differential Revision: https://reviews.llvm.org/D120312
This patch is the first in a series of patches to upstream the support for Apple's DriverKit. Once complete, it will allow targeting DriverKit platform with Clang similarly to AppleClang.
This code was originally authored by JF Bastien.
Differential Revision: https://reviews.llvm.org/D118046
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.
Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D117580
Constants cannot be cyclic, but they can be tree-like. Keep a
visited set to ensure we do not degenerate to exponential run-time.
This fixes the problem reported in https://reviews.llvm.org/D117223#3335482,
though I haven't been able to construct a concise test case for
the issue. This requires a combination of dead constants and the
kind of constant expression tree that textual IR cannot represent
(because the textual representation, unlike the in-memory
representation, is also exponential in size).
Currently writtenBetween can miss clobbers of Loc between End and Start,
if End is a MemoryUse.
To guarantee we see all write clobbers of Loc between Start and End
for MemoryUses, restrict to Start and End being in the same block
and check all accesses between them.
This fixes 2 mis-compiles illustrated in
llvm/test/Transforms/MemCpyOpt/memcpy-byval-forwarding-clobbers.ll
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D119929
This patch simplifies constraint handling by removing the
ConstraintListTy wrapper struct and moving the Preconditions directly
into ConstraintTy. This reduces the amount of memory needed for managing
constraints.
The only use case for ConstraintListTy was adding 2 constraints to model
ICMP_EQ conditions. But this can be handled by adding an IsEq flag. When
adding an equality constraint, we need to add the constraint and the
inverted constraint.
One of the optimizations performed in OpenMPOpt pushes globalized
variables to static shared memory. This is preferable to keeping the
runtime call in all cases, however if too many variables are pushed to
shared memory the kernel will crash. Since this is an optimization and
not something the user specified explicitly, there should be an option
to limit this optimization in those cases. This patch introduces the
`-openmp-opt-shared-limit=` option to limit the amount of bytes that
will be placed in shared memory from HeapToShared.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120079
If the alternate cmp instruction is a swapped predicate of the main cmp
instruction, we need to generate the alternate instruction, not the one with
the swapped predicate. Also, the lane with the alternate opcode should
be selected only if the corresponding operands are not compatible.
Correctness confirmed:
https://alive2.llvm.org/ce/z/94BG66
Differential Revision: https://reviews.llvm.org/D119855
For ASan this will effectively serve as a synonym for
__attribute__((no_sanitize("address"))).
Adding the disable_sanitizer_instrumentation to functions will drop the
sanitize_XXX attributes on the IR level.
This is the third reland of https://reviews.llvm.org/D114421.
Now that TSan test is fixed (https://reviews.llvm.org/D120050) there
should be no deadlocks.
Differential Revision: https://reviews.llvm.org/D120055
When we scan vtables for a particular vload in ScanVTableLoad and an entry in
one possible vtable is invalid (null or non-fptr), we bail in a wrong way -- we
completely stop the scanning of vtables and this results in dropped dependencies
and incorrectly removed vfuncs from vtables. Let's fix that by correcting the
bailing logic to keep iterating and only skip the invalid entries.
Differential Revision: https://reviews.llvm.org/D120006
LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249, LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents an instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses from discarding this additional information.
This PR modifies LICM to accept a "speculative" parameter which allows LICM to be set to perform information-losing speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D119965
This will bail out on target specific intrinsics. If those are deemed
important enough for EarlyCSE to handle, we can augment MemIntrinsicInfo
with an access type for TargetTransformInfo::getTgtMemIntrinsic() to
handle.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D120077
This patch adds the '__kmpc_get_hardware_num_threads_in_block'
OpenMP RTL function to the externalization RAII struct. This was getting
optimized out and then being replaced with an undefined value once added
back in, causing bugs for complex reductions.
Fixes #53909.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120076
With
668c5c688b
we introduced an ordering issue revealed by the reverse iteration
buildbot. Depending on the order of the map that tracks the AAIsDead AAs
we ended up with slightly different attributes. This is not totally
unexpected and can happen. We should however be deterministic in our
orderings to avoid such issues.
The assertion verifying that a newly computed value matches what is
already cached used stripPointerCasts() to strip bitcasts, however the
values can be not only pointers, but also vectors of pointers. That is
problematic because stripPointerCasts() doesn't handle vectors of
pointers. This patch introduces an ad-hoc utility function to strip all
bitcasts regardless of the value type.
Reviewed By: skatkov, reames
Differential Revision: https://reviews.llvm.org/D119994
Removing dead constants should not count as making a change to the
module. This means that RemoveUnusedGlobalValue simplifies to just
calling removeDeadConstantUsers, so inline it.
Differential Revision: https://reviews.llvm.org/D120052
A generalization like this was suggested in D119754.
This is the inverse direction of D119851,
and we get all of the folds there plus the one that was missed.
There is precedence for this kind of transform in instcombine
with "or" instructions (but strangely only with that one opcode AFAICT).
Similar justification as in the other patch:
The line between instcombine and reassociate for these kinds of folds
is blurry. This doesn't appear to have much cost and gives us the
expected wins from repeated folds as seen in the last set of test diffs.
Differential Revision: https://reviews.llvm.org/D119955
This code could be generalized to be type-independent, but for now
just ensure that the same type constraints are enforced with opaque
pointers as with typed pointers.
When we move an allocation from the heap to the stack we need to
allocate it in the alloca AS and then cast the result. This also
means the alloca is now inserted right before the allocation call rather
than after it.
Fixes https://github.com/llvm/llvm-project/issues/53858
When we use liveness for edges during the `genericValueTraversal` we
need to make sure to use the AAIsDead of the correct function. This
patch adds the proper logic and some simple caching scheme. We also
add an assertion to the `isEdgeDead` call to make sure future misuse
is detected earlier.
Fixes https://github.com/llvm/llvm-project/issues/53872
`UsedAssumedInformation` is a return argument utilized to determine what
information is known. Most APIs used it already but
`genericValueTraversal` did not. This adds it to `genericValueTraversal`
and replaces `AllCallSitesKnown` of `checkForAllCallSites` with the
commonly used `UsedAssumedInformation`.
This was supposed to be a NFC commit, then the test change appeared.
Turns out, we had one user of `AllCallSitesKnown` (AANoReturn) and the
way we set `AllCallSitesKnown` was wrong as we ignored the fact some
call sites were optimistically assumed dead. Included a dedicated test
for this as well now.
Fixes https://github.com/llvm/llvm-project/issues/53884
By convention, memcpy/memmove intrinsics are always used with i8
pointers (though this is not enforced), so in practice this code
was always using an i8 type. Make that explicit.
Of course, i8 is not a very profitable choice, and this code could
be more performant by picking an appropriate larger type. But that
would require additional test coverage and correctness review, and
certainly shouldn't be a decision based on the pointer element type.
We only need to do propagation on use instructions of the original
value, rather than the replacing const value which might have lots
of irrelevant uses. This is done by caching uses before replacing.
Differential Revision: https://reviews.llvm.org/D119815
Integer min/max operations are associative:
max (max X, C0), C1 --> max X, (max C0, C1) --> max X, NewC
https://alive2.llvm.org/ce/z/wW5HVM
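A hand-written sketch with made-up constants:
```
declare i32 @llvm.smax.i32(i32, i32)

define i32 @src(i32 %x) {
  %m1 = call i32 @llvm.smax.i32(i32 %x, i32 13)
  %m2 = call i32 @llvm.smax.i32(i32 %m1, i32 42)
  ret i32 %m2
}

define i32 @tgt(i32 %x) {
  ; NewC = max(13, 42) = 42
  %m = call i32 @llvm.smax.i32(i32 %x, i32 42)
  ret i32 %m
}
```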
This would avoid a regression when we canonicalize to min/max intrinsics
(see D98152 ).
Differential Revision: https://reviews.llvm.org/D119754
In particular, this breaks vectorization of insertelements where some of the
intermediate (i.e. not last) insertelements are used externally.
Fixes PR52275
Fixes#51617
Differential Revision: https://reviews.llvm.org/D119679
Do not merge a context that is already duplicated into the base profile.
Also fixing a typo caused by previous refactoring.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119735
There was a fixme in the code pertaining to attributing functions as
noreturn. By using reachability, if none of the blocks that are
reachable from the entry return, then the function is noreturn.
Previously, the code only checked if any blocks returned. If they're
unreachable, then they don't matter.
This improves codegen for the Linux kernel.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1563
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D119571
This introduces a new "ptrauth" operand bundle to be used in
call/invoke. At the IR level, it's semantically equivalent to an
@llvm.ptrauth.auth followed by an indirect call, but it provides
additional hardening by preventing the intermediate raw pointer from
being exposed.
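Roughly, at the IR level (a sketch with a hypothetical key and discriminator; not exact frontend output):
```
  ; expanded form: the raw, authenticated pointer is visible as an IR value
  %signed.int = ptrtoint ptr %fp to i64
  %raw.int    = call i64 @llvm.ptrauth.auth(i64 %signed.int, i32 0, i64 %disc)
  %raw        = inttoptr i64 %raw.int to ptr
  call void %raw()

  ; bundle form: authentication is folded into the call, so no raw pointer escapes
  call void %fp() [ "ptrauth"(i32 0, i64 %disc) ]
```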
This mostly adds the IR definition, verifier checks, and support in
a couple of general helper functions. Clang IRGen and backend support
will come separately.
Note that we'll eventually want to support this bundle in indirectbr as
well, for similar reasons. indirectbr currently doesn't support bundles
at all, and the IR data structures need to be updated to allow that.
Differential Revision: https://reviews.llvm.org/D113685
If the function types differ, the call arguments don't necessarily
correspond to the function arguments. It's likely not worthwhile to
handle this more precisely, but at least we shouldn't crash.
While this might be marginally more precise, we generally don't
bother with this in InstCombine, and let the IRBuilder assign the
debug location. I don't see why this one fold, out of the thousands
done in InstCombine, should be treated specially.
This ensures that if we have a dbg.addr in a coroutine funclet that is on one of
our function arguments, that the dbg.addr is not mapped to undef and also that
later it isn't hoisted to the front of the basic block. Instead it remains at
its original cloned location.
rdar://83957028
Differential Revision: https://reviews.llvm.org/D119576
This is the first step in unifying some of the logic between hwasan and
MTE stack tagging. This only moves around code; changes to converge
different implementations of the same logic follow later.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D118947
If we assume `llvm.amdgcn.s.barrier` is aligned we may remove it and
cause OpenMP GPU applications on the AMD GPU to be stuck or wrongly
synchronized.
Reported by Carlo Bertolli.
```
always_inline foo() { }
bar () {
noinline foo();
}
```
We should prefer call site attribute over attribute on decl. This is fix for AlwaysInliner, similar fix is needed for normal Inliner (follow up).
Related to https://reviews.llvm.org/D119061
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D119553
If a cast is needed when replacing uses with newly created values, the
cast must be inserted after the instruction that defines the new value.
Fixes: SWDEV-321215
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D119524
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.
The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implicitarg_ptr does not result in a load of any byte in the hostcall
pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
If the source value we could infer an address space from went through a
ptrtoint/inttoptr pair, this would fail since bitcast can't change the
address space.
Fixes issue 53665.
Rather than checking that the type is the same (which is always
the case, given how these are part of the same phi) check that the
source element type is the same. With opaque pointers, this is no
longer implied.
To avoid incorrectly merging GEPs with different source types
under opaque pointers.
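For example (a sketch), with opaque pointers both GEPs below have identical operand and result types, so only the source element type distinguishes them:
```
  %g1 = getelementptr i32, ptr %p, i64 1   ; advances by 4 bytes
  %g2 = getelementptr i64, ptr %p, i64 1   ; advances by 8 bytes
```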
To avoid increasing the Expression structure size, this reuses the
existing type member. The code does not rely on this to be the
expression result type, it's only used as a disambiguator.
In addition to the self-recursion check, also check whether there
is more than one node in the SCC, which implies that there is a
larger cycle. I believe checking SCC structure (rather than
something like norecurse) is the right thing to do here, because
this is specifically about preventing infinite loops over the SCC.
Fixes https://github.com/llvm/llvm-project/issues/42028.
Differential Revision: https://reviews.llvm.org/D119418
The last use of this file was removed in late 2013 in ea56494625;
it was in PathProfiling.cpp, which had an overview comment
of the overall approach.
Similar functionality lives in the slightly more cryptically named
CFGMST.h in this same directory. A similar overview comment is
in PGOInstrumentation.cpp.
No behavior change.
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization, due to the change from checking whether predicates exist to checking whether the predicate is possibly false.
Minor efficiency fix. There is no reason to perform the same set lookup
repeatedly in the inner loop as it is invariant there.
Differential Revision: https://reviews.llvm.org/D119474
New users might want to check bins without a load or store instruction
at hand. Since we use those instructions only to find the offset and
size of the access anyway, we can expose an offset and size interface
to the outside world as well.
This commit mainly moves code around and exposes a class (OffsetAndSize)
as well as a method forallInterferingAccesses in AAPointerInfo.
Differential Revision: https://reviews.llvm.org/D119249
The oversight caused us to ignore call sites that are effectively dead
when we computed reachability (or, more precisely, the call edges of a
function). The problem is that loads in the readonly callee might depend
on stores prior to the call. If we do not track the call edge, we
mistakenly assume that the store before the call cannot reach the load.
The problem is nicely visible in:
`llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll`
Caused by D118673.
Fixes https://github.com/llvm/llvm-project/issues/53726
When we privatize a pointer (~argument promotion) we introduce new
private allocas as replacement. These need to be placed in the alloca
address space as later passes cannot properly deal with them otherwise.
Fixes https://github.com/llvm/llvm-project/issues/53725
Replace the matchSelectPattern pattern match with the more general m_SMin so that it can handle smin intrinsics as well as the icmp+select pattern.
Noticed while reviewing regressions from D98152.
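Both forms below should now be recognized (a minimal sketch with hypothetical values):
```
  ; intrinsic form
  %m1 = call i32 @llvm.smin.i32(i32 %a, i32 %b)
  ; icmp+select form
  %c  = icmp slt i32 %a, %b
  %m2 = select i1 %c, i32 %a, i32 %b
```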
For those curious, the whole reason for tracking the predicate set separately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.
This reverts commit 77a0da926c as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!
Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
With typed pointers the pointer operand type checks the address space
and the load/store type. With opaque pointers we have to check the
load/store type separately.
Move out the induction step creation from emitTransformedIndex to the
callers. In some places (e.g. widenIntOrFpInduction) the step is already
created. Passing the step in ensures the steps are kept in sync.
This rewrites ArgPromotion to be based on offsets rather than GEP
structure. We inspect all loads at constant offsets and remember
which types are loaded at which offsets. Then we promote based on
those types.
This generalizes ArgPromotion to work with bitcasted loads, and
is compatible with opaque pointers.
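A sketch of the kind of case this now handles (hypothetical IR; the promoted argument names are illustrative only):
```
  ; before: the callee loads an i32 at offset 0 and an i64 at offset 8
  define internal i64 @callee(ptr %p) {
    %v0  = load i32, ptr %p
    %gep = getelementptr i8, ptr %p, i64 8
    %v8  = load i64, ptr %gep
    %ext = zext i32 %v0 to i64
    %sum = add i64 %v8, %ext
    ret i64 %sum
  }

  ; after promotion: the loaded values are passed directly
  define internal i64 @callee(i32 %p.0.val, i64 %p.8.val) {
    %ext = zext i32 %p.0.val to i64
    %sum = add i64 %p.8.val, %ext
    ret i64 %sum
  }
```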
This patch also fixes incorrect handling of alignment during
argument promotion. Previously, the implementation only checked
that the pointer is dereferenceable, but was happy to speculate
overaligned loads. (I would have fixed this separately in advance,
but I found this hard to do with the previous implementation
approach).
Differential Revision: https://reviews.llvm.org/D118685
This makes the function independent of shared state in ILV (ensures no
new dependencies on things like the cost model are introduced) and allows
for use directly in recipe's ::execute functions.
As long as *all* the invokes in the set are indirect,
we can merge them, but don't merge direct invokes into the set,
even though it would be legal to do so.
This patch replaces the function we emit the remark on when we run into
the fix-point limit. Previously we got a function to emit a remark on
from the worklist's associated function. However, the worklist may not
always have an associated function in the case of global variables.
Replace this with the function set, and if there are no functions don't
emit the remark.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D119248
PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent A <pred> B.
This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
Alloca promotion can only deal with cases where the load/store
types match the alloca type (it explicitly does not support
bitcasted load/stores).
With opaque pointers this is no longer enforced through the pointer
type, so add an explicit check.
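For example (a minimal sketch), under opaque pointers nothing in the pointer type reveals the mismatch below, so it must be checked against the load/store types directly:
```
  %a = alloca i64
  store i64 0, ptr %a
  %v = load i32, ptr %a   ; load type differs from the alloca type: not promotable here
```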
If the original invokes had uses, the uses must have been in PHI's,
but that immediately results in the incoming values being incompatible.
But we'll replace uses of the original invokes with the use of the
merged invoke, so as long as the incoming values become compatible
after that, we can merge.
Even if the invokes have a normal destination, iff it's the same block,
we can merge them. For now, require that there are no PHI nodes,
and the returned values of invokes aren't used.
Before walking all the callers, check whether we have a
dereferenceable attribute directly on the argument.
Also make it clearer that the code currently does not treat
alignment correctly.
The helper `Attributor::checkForAllReturnedValuesAndReturnInsts`
simplifies the returned value optimistically. In `AAUndefinedBehavior`
we cannot use such optimistic values when deducing UB. As a result, we
assumed UB for the return value of a function because we initially
(=optimistically) thought the function return is `undef`. While we later
adjusted this properly, the `AAUndefinedBehavior` was under the
impression the return value is "known" (=fixed) and could never change.
To correct this we use `Attributor::checkForAllInstructions` and then
manually perform simplification of the return value, only allowing
known values to be used. This actually matches the other UB deductions.
Fixes#53647
The vectorizer will choose at times to "vectorize" loops with a scalar
factor (VF=1) with interleaving (IC > 1). This can occasionally produce
better code than the unroller (notably for reductions, where it can
produce independent reduction chains that are combined after the loop).
At times this is not very beneficial though, for example when runtime
checks are needed or when the scalar code requires predication.
This addresses the second point, preventing the vectorizer from
interleaving when the scalar loop will require predication. This
prevents it from making a bit of a mess that is worse than the original
and better left for the unroller to unroll if beneficial. It helps
reverse some of the regressions from D118090.
Differential Revision: https://reviews.llvm.org/D118566
This is a follow-up suggested in D119060.
Instead of checking each of the bottom 2 bits individually,
we can check them together and handle the possibility that
we demand both together.
https://alive2.llvm.org/ce/z/C2ihC2
Differential Revision: https://reviews.llvm.org/D119139
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevents scalarized vectorization of masked memory operations:
```
// TODO: Cost model for emulated masked load/store is completely
// broken. This hack guides the cost model to use an artificially
// high enough value to practically disable vectorization with such
// operations, except where previously deployed legality hack allowed
// using very low cost values. This is to avoid regressions coming simply
// from moving "masked load/store" check from legality to cost model.
// Masked Load/Gather emulation was previously never allowed.
// Limited number of Masked Store/Scatter emulation was allowed.
```
While I don't really understand what specifically `is completely broken`
was referring to, I believe that at least on X86 with AVX2-or-later
this is no longer true (or at least, I would like to know what is still broken).
So I would like to follow suit after D111460 and likewise disable that hack for AVX2+.
But since this was added for X86 specifically, let's instead just completely remove this hack.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114779
Enabled loop interchange support for floating point reductions
if it is allowed to reorder floating point operations.
Previously, when we encountered a floating point PHI node in the
outer loop exit block, we bailed out since we could not detect
floating point reductions in the early days. Now we remove this
limitation since we are able to detect floating point reductions.
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D117450
In scalarizeInstruction(), isUniformAfterVectorization is used to detect
cases where it is sufficient to always access the first lane. This
should map directly to checking whether the operand is a uniform replicate
recipe.
Differential Revision: https://reviews.llvm.org/D116654
I'm seeing ext-tsp helps CSSPGO for our internal large benchmarks, so I'm turning it on for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile count quality.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119048
As per LangRef's definition of `noreturn` attribute:
```
noreturn
This function attribute indicates that the function never returns
normally, hence through a return instruction.
This produces undefined behavior at runtime if the function
ever does dynamically return. Annotated functions may still
raise an exception, i.e., nounwind is not implied.
```
So if we `invoke` a `noreturn` function, and the normal destination
of an invoke is not an `unreachable`, point it at the new `unreachable`
block.
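A rough sketch of the rewrite (hypothetical labels):
```
  ; before
  invoke void @noreturn_fn() to label %normal unwind label %lpad

  ; after: the normal destination points at a newly created block
  invoke void @noreturn_fn() to label %noret.dest unwind label %lpad

  noret.dest:
    unreachable
```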
The change/fix from the original commit is that we now actually create
the new block, and don't just repurpose the original block,
because said normal destination block could have other users.
This reverts commit db1176ce66,
relanding commit 598833c987.
As per LangRef's definition of `noreturn` attribute:
```
noreturn
This function attribute indicates that the function never returns
normally, hence through a return instruction.
This produces undefined behavior at runtime if the function
ever does dynamically return. Annotated functions may still
raise an exception, i.e., nounwind is not implied.
```
Changes the remark to emit on the function call that captures the globalized
variable instead of the globalized variable itself. The user should be able to
see which variable it was in the argument list of the function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106980
We can simply compute the value of this field on demand. Doing so clarifies the behavior when one of the instructions within a bundle doesn't have valid dependencies. I vaguely think this could change behavior slightly, but none of the test cases are affected, and my attempts to write one by hand have failed.
This also minorly reduces memory usage, but that's a secondary value at best.
While nowadays SimplifyCFG knows how to hoist code from then-else blocks,
sink code from unconditional predecessors, and even promote the latter
by tail-merging `ret`/`resume` function terminators, that isn't everything.
While I (& others) have been trying to deal with merging/sinking `unreachable`,
perhaps the more impactful remaining problem is merging the `throw`
calls.
If we start at the `landingpad`, all the predecessors are unwind edges of `invoke`s,
and in some cases some of the `invoke`s are mergeable.
```
/// This is a weird mix of hoisting and sinking. Visually, it goes from:
/// [...] [...]
/// | |
/// [invoke0] [invoke1]
/// / \ / \
/// [cont0] [landingpad] [cont1]
/// to:
/// [...] [...]
/// \ /
/// [invoke]
/// / \
/// [cont] [landingpad]
```
This simplifies the IR/CFG, at the cost of debug info and extra PHI nodes.
Note that we don't require *all* the `invoke`s of the `landingpad`
to be mergeable; they can form more than a single set, and we gracefully handle that.
For now, I completely disallowed normal destinations, PHI nodes and indirect invokes,
but that can be supported.
Out of all the CTMark projects, only 7zip is C++, so there isn't much impact:
https://llvm-compile-time-tracker.com/compare.php?from=ba8eb31bd9542828f6424e15a3014f80f14522c8&to=722fc871c84f14157d45c2159bc9c8c7e2825785&stat=size-total
... but there it currently causes size-total decrease.
Differential Revision: https://reviews.llvm.org/D117805
This patch adds initial support for signed conditions. To do so,
ConstraintElimination maintains two separate systems, one with facts
from signed and one for unsigned conditions.
To start with this means information from signed and unsigned conditions
is kept completely separate. When it is safe to do so, information from
signed conditions may be also transferred to the unsigned system and
vice versa. That's left for follow-ups.
In the initial version, de-composition of signed values just handles
constants and otherwise just uses the value, without trying to
decompose the operation. Again this can be extended in follow-up
changes.
The main benefit of this limited signed support is proving >=s 0
pre-conditions added in D118799. But even this initial version also
fixes PR53273.
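As an illustrative sketch (hypothetical IR) of the kind of fact the signed system can derive:
```
  define i1 @src(i32 %x, i32 %n) {
  entry:
    %nonneg = icmp sge i32 %x, 0
    %lt     = icmp slt i32 %x, %n
    %both   = and i1 %nonneg, %lt
    br i1 %both, label %then, label %else

  then:                         ; here %x >=s 0 and %x <s %n are known
    %pos = icmp sgt i32 %n, 0   ; provably true, since %n >s %x >=s 0
    ret i1 %pos

  else:
    ret i1 false
  }
```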
Depends on D118799.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118806
With this patch pre-conditions can be added to a list of constraints.
Constraints with pre-conditions can only be used if all pre-conditions
are satisfied when the constraint is used.
The pre-conditions at the moment are specified as a list of
(Predicate, Value *, Value *) tuples. This allows checking them easily,
like any other condition, using the existing infrastructure.
This then is used to limit GEP decomposition to cases where we can
prove that offsets are signed positive.
This fixes a couple of incorrect transforms where GEP offsets were
assumed to be signed positive, but they were not.
Note that this effectively disables GEP decomposition, as there's no
support for reasoning about signed predicates. D118806 adds initial
signed support.
Fixes PR49624.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118799
This header is very large (3M lines once expanded) and was included in locations
where DWARF-specific information was not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of that patch, downstream users may need to manually include some extra
files:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, code may be relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h; this hidden
dependency now needs to be made explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
This makes the statepoint methods in IRBuilder accept a
FunctionCallee, which carries both the callee and function type.
This is used to add the elementtype attribute to the statepoint call.
RS4GC requires an additional tweak to actually preserve that attribute
-- previously the attributes on the call were completely overwritten.
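At the IR level, the callee operand of the statepoint then carries the attribute, roughly like (a sketch modeled on the LangRef statepoint form):
```
  %tok = call token (i64, i32, ptr, i32, i32, ...)
         @llvm.experimental.gc.statepoint.p0(i64 2882400000, i32 0,
                ptr elementtype(void ()) @callee, i32 0, i32 0, i32 0, i32 0)
```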
Differential Revision: https://reviews.llvm.org/D118886
This adds the assertion that all items in the ready list are in-fact scheduleable entities ready to be scheduled. This involves changing the ReadyInsts structure to be a set, and fixing a couple places where we left nodes on the list when they were no longer ready.
Finding the re-materialization chain for a derived pointer does not depend on
the call site. To avoid repeating this work for each call site, it can be extracted into
a separate routine.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D118676
The idea here is to have a verify routine we can call during scheduling to ensure broken invariants are reported. The intent is to help in debugging scheduling bugs.
At the moment, only the most basic properties are checked, as adding several invariants I thought held reported failures.
When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot
be any smaller, the vectorizer should try to estimate how many lanes are
executed at runtime and allow a suitable fixed-width VF to be chosen. It
can use VScaleForTuning to figure out what a suitable fixed-width VF could
be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8,
it could still choose an epilogue VF up to VF=4.
This was a bit tricky to test, so this patch also introduces a wrapper
function to get 'VScaleForTuning' by also considering vscale_range.
If min and max are equal, then that will be the vscale we compile for.
It makes little sense to tune for a different width if the code
will not be portable for other widths.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D118709
The compiler adds an estimation for the external uses during operand
reordering analysis, which makes it tend to prefer duplicates in the
lanes rather than a diamond/shuffled match in the graph. This changes the sizes of
the vector operands and may prevent some vectorization. We don't need
this kind of estimation for the analysis phase, because we just need to
choose the most compatible instruction; it does not matter if it has an
external user or is used in a non-matching lane. Instead, we count the number
of unique instructions in the lane and see if the reassociation changes
the number of unique scalars to a power of 2 or not. If we have a power
of 2 unique scalars in the lane, it is considered more profitable
than having a non-power-of-2 number of unique scalars.
Metric: SLP.NumVectorInstructions
test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test 70.00 86.00 22.9%
test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 346.00 353.00 2.0%
test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 346.00 353.00 2.0%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 235.00 239.00 1.7%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 235.00 239.00 1.7%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00 1.3%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00 1.2%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00 1.1%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00 1.1%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00 0.9%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00 0.3%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00 0.3%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00 0.2%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00 0.1%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00 0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 783.00 782.00 -0.1%
test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 69.00 68.00 -1.4%
test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 207.00 192.00 -7.2%
test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 207.00 192.00 -7.2%
test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 89.00 80.00 -10.1%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 89.00 80.00 -10.1%
test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 260.00 215.00 -17.3%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 256.00 211.00 -17.6%
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty much the same.
SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by
one <4 x> load.
External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and
not inlined anymore.
External/SPEC/CINT2017rate/541.leela_r - same
External/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in the
multi-block tree; the result is pretty much the same.
External/SPEC/CINT2017speed/631.deepsjeng_s - same.
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same
as before.
MultiSource/Benchmarks/MiBench/consumer-jpeg - same.
Differential Revision: https://reviews.llvm.org/D116688
Added support for alternate ops vectorization of cmp instructions.
It allows vectorizing either cmp instructions with the same/swapped
predicate but different (swapped) operand kinds, or cmp instructions
with different predicates and compatible operand kinds.
Differential Revision: https://reviews.llvm.org/D115955
Unfortunately, it seems we really do need to take the long route;
start from the "merge" block, find (all the) "dispatch" blocks,
and deal with each "dispatch" block separately, instead of simply
starting from each "dispatch" block as would logically make sense;
otherwise we run into a number of other missing folds around
`switch` formation, missing sinking/hoisting, and phase ordering.
This reverts commit 85628ce75b.
This reverts commit c5fff90953.
This reverts commit 34a98e1046.
This reverts commit 1e353f0922.
Assertion added in f50821cff0 confirms that the DT is indeed nonnull.
Change it to a reference instead of a pointer to make this explicit in
FusionCandidate.
Suggested in D118472.
Added support for alternate ops vectorization of cmp instructions.
It allows vectorizing either cmp instructions with the same/swapped
predicate but different (swapped) operand kinds, or cmp instructions
with different predicates and compatible operand kinds.
Differential Revision: https://reviews.llvm.org/D115955
This is a fix for a use-after-free found by the address sanitizer when
compiling GCC: https://github.com/llvm/llvm-project/issues/52821
The Function Specialization pass may remove instructions, cached
inside the PredicateBase class, which are later being dereferenced
from the SCCPInstVisitor class. To prevent the dangling references
I am lazily deleting the dead instructions after the Solver has run.
Differential Revision: https://reviews.llvm.org/D118591
The current `FoldTwoEntryPHINode()` is not quite designed correctly.
It starts from the merge point, and then tries to detect
the 'divergence' point.
Because of that, it is limited to the simple two-predecessor case,
where the PHI completely goes away. But that is rather pessimistic,
and it doesn't make much sense from the cost model side of things.
For example if there is some other unrelated predecessor of
the merge point, we could split the merge point so that
the then/else blocks first branch to an empty block
and then to the merge point, and then we'd be able to speculate
the then/else code.
But if we'd instead simply start at the divergence point,
and look for the merge point, then we'll just natively support this case.
There's also the fact that `SpeculativelyExecuteBB()` already does
just that, but only if there is a single block to speculate,
and with a much more restrictive cost model.
But that also means we have code duplication.
Now, sadly, while this is as much NFCI as possible,
there is just no way to cleanly migrate to
the proper implementation. The results *are* going to be different
somewhat because of various phase ordering effects and SimplifyCFG
block iteration strategy.
After adding another value kind in 8a12cae862, Value * pointers do not
have enough available empty bits to store the kind (e.g. on ARM).
To address this, the patch replaces the PointerIntPair with separate
value and kind fields.
This patch extends the available-value logic to detect loads
of pointer-selects that can be replaced by a value select.
For example, consider the code below:
loop:
%sel.phi = phi i32* [ %start, %ph ], [ %sel, %loop ]
%l = load %ptr
%l.sel = load %sel.phi
%sel = select cond, %ptr, %sel.phi
...
exit:
%res = load %sel
use(%res)
The load of the pointer phi can be replaced by a load of the start value
outside the loop and a new phi/select chain based on the loaded values,
as illustrated below
%l.start = load %start
loop:
%sel.phi.prom = phi i32 [ %l.start, %ph ], [ %sel.prom, %loop ]
%l = load %ptr
%sel.prom = select cond, %l, %sel.phi.prom
...
exit:
use(%sel.prom)
This is a first step towards allowing vectorizing loops using common libc++
library functions, like std::min_element (https://clang.godbolt.org/z/6czGzzqbs):
#include <vector>
#include <algorithm>
int foo(const std::vector<int> &V) {
return *std::min_element(V.begin(), V.end());
}
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118143
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden header dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest changes below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer includes llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k fewer lines to process is not that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
For some reason we limited the epilogue VF to be fixed-width, but there
is not necessarily a reason for doing so. If the main VF=vscale x 16, the
epilogue VF could be either fixed-width, or a scalable VF up to vscale x 8.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D118688
The code paths analyzed (all constructor invocations of fusion
candidate) pass in a non-null DT.
Adding this assert as requested in D118472 before converting this to a
reference argument.
Cleanup code in peelLoop API. We already have usage of DT without guarding
against a null DT, so this change constant folds the remaining null DT
checks.
Also make the argument a reference so that it is clear the argument is
a nonnull DT.
Extracted from D118472.
setjmp can return twice, but PostDominatorTree is unaware of this. As
such, it overestimates postdominance, leaving some cases (see attached
compiler-rt) where memory does not get untagged on return. This causes
false positives later in the program execution.
This is a crude workaround to unblock use-after-scope for now; in the
longer term, PostDominatorTree should be made aware of returns_twice
functions, as this may cause problems elsewhere.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D118647
We have an InstCombine rule to remove identical consecutive fences.
We can extend this to remove weaker fences when we have a consecutive
stronger fence.
As stated in the LangRef, a fence with a stronger ordering also implies
ordering weaker than itself: "A fence which has seq_cst ordering, in addition to
having both acquire and release semantics specified above, participates in the
global program order of other seq_cst operations and/or fences."
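A minimal sketch of the intended fold:
```
  ; before
  fence release
  fence seq_cst

  ; after: the seq_cst fence already implies release semantics
  fence seq_cst
```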
Reviewed-By: reames
Differential Revision: https://reviews.llvm.org/D118607
Generalize D99629 for ELF. A default visibility non-local symbol is preemptible
in a -shared link. `isInterposable` is an insufficient condition.
Moreover, a non-preemptible alias may be referenced in a sub constant expression
which intends to lower to a PC-relative relocation. Replacing the alias with a
preemptible aliasee may introduce a linker error.
Respect dso_preemptable and suppress the optimization to fix the above issues. With
the change, `alias = 345` will not be rewritten to use the aliasee in a `-fpic`
compile.
```
int aliasee;
extern int alias __attribute__((alias("aliasee"), visibility("hidden")));
void foo() { alias = 345; } // intended to access the local copy
```
While here, refine the condition for the alias as well.
For some binary formats like COFF, `isInterposable` is a sufficient condition.
But I think canonicalization for the changed case has little advantage, so I
don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or
`getPICLevel/getPIELevel` complexity.
For instrumentations, it's recommended not to create aliases that refer to
globals that have weak linkage or are preemptible. However, the following is
supported and the IR needs to handle such cases.
```
int aliasee __attribute__((weak));
extern int alias __attribute__((alias("aliasee")));
```
There are other places where GlobalAlias isInterposable usage may need to be
fixed.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D107249
Added support for alternate ops vectorization of cmp instructions.
It allows vectorizing either cmp instructions with the same/swapped
predicate but different (swapped) operand kinds, or cmp instructions
with different predicates and compatible operand kinds.
Differential Revision: https://reviews.llvm.org/D115955