llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Lebedev	576a45f20d	[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 x : i1 C` pattern https://alive2.llvm.org/ce/z/2Q7Du_	2022-02-10 17:42:55 +03:00
Roman Lebedev	9766a0cca0	[SCEV] Recognize `cond ? i1 0 : i1 y` as `umin_seq ~cond, y` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/N6XwV-	2022-02-10 17:42:55 +03:00
Roman Lebedev	418604fd90	[SCEV] Recognize `cond ? i1 x : i1 1` as `~umin_seq cond, ~x` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/aqe9GK	2022-02-10 17:42:55 +03:00
Roman Lebedev	49d9acc242	[SCEV] Recognize logical `or` as `not umin_seq (not, not)` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/MUfbTL	2022-02-10 17:42:55 +03:00
Roman Lebedev	16bc24e7be	[SCEV] Recognize logical `and` as `umin_seq` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/59KuZZ	2022-02-10 17:42:55 +03:00
Roman Lebedev	1c69444863	[SCEV] `createNodeForSelectOrPHI()`: try constant-folding even if not an Instruction We'd catch the tautological select pattern later anyways due to constant folding, so that leaves PHI-like select, but it does not appear to fire there.	2022-02-10 17:42:55 +03:00
Roman Lebedev	97930f85af	[NFC][SCEV] Prepare `createNodeForSelectOrPHI()` for gaining additional strategy Currently `createNodeForSelectOrPHI()` takes an Instruction, and only works on the Cond that is an ICmpInst, but that can be relaxed somewhat. For now, simply rename the existing function, and add a thin wrapper on top that still does the same thing as it used to.	2022-02-10 17:42:55 +03:00
Roman Lebedev	73990ff8a7	[SCEV] Recognize binary `xor` as bit-wise `add` https://alive2.llvm.org/ce/z/ULuZxB We could transparently handle wider bitwidths by effectively casting iN to <N x i1> and performing the `add` bit/element-wise, but the expression would be rather large, so let's not do that for now.	2022-02-10 17:42:55 +03:00
Roman Lebedev	503541fa93	[SCEV] Recognize binary `and` as bit-wise `umin` https://alive2.llvm.org/ce/z/aKAr94 We could transparently handle wider bitwidths by effectively casting iN to <N x i1> and performing the `umin` bit/element-wise, but the expression would be rather large, so let's not do that for now.	2022-02-10 17:42:54 +03:00
Roman Lebedev	e7e0834f07	[SCEV] Recognize binary `or` as bit-wise `umax` https://alive2.llvm.org/ce/z/SMEaoc We could transparently handle wider bitwidths by effectively casting iN to <N x i1> and performing the `umax` bit/element-wise, but the expression would be rather large, so let's not do that for now.	2022-02-10 17:42:54 +03:00
David Sherwood	1badfbb4fc	Fix incorrect TypeSize->uint64_t cast in InductionDescriptor::isInductionPHI The code was relying upon the implicit conversion of TypeSize to uint64_t and assuming the type in question was always fixed. However, I discovered an issue when running the canon-freeze pass with some IR loops that contain scalable vector types. I've changed the code to bail out if the size is unknown at compile time, since we cannot compute whether the step is a multiple of the type size or not. I added a test here: Transforms/CanonicalizeFreezeInLoops/phis.ll Differential Revision: https://reviews.llvm.org/D118696	2022-02-10 09:39:12 +00:00
Philip Reames	d334fec140	[SCEV] Make SCEVUnionPredicate externally immutable [NFC] This is the last major stepping stone before being able to allocate the node via the folding set allocator. That will in turn allow more general SCEV predicate expression trees.	2022-02-09 13:47:28 -08:00
Philip Reames	e6d9bab558	[SCEV] Remove a direct call to SCEVUnionPredicate::add [NFC]	2022-02-09 13:04:12 -08:00
Philip Reames	d39f4ac494	[SCEV] Unwind SCEVUnionPredicate from getPredicatedBackedgeTakenCount [NFC] For those curious, the whole reason for tracking the predicate set separately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.	2022-02-09 12:55:40 -08:00
Philip Reames	aa845d7a24	[SCEV] Remove conversion to SCEVUnionPredicate in ExitNotTakenInfo [NFC] This removes one of the places where we mutate an existing union predicate.	2022-02-09 12:10:23 -08:00
Philip Reames	83f895d952	[SCEV] Add interface for constructing generic SCEVComparePredicate [NFC]	2022-02-09 10:29:04 -08:00
Arthur Eubanks	ff31020ee6	[OpaquePtr][LoopAccessAnalysis] Support opaque pointers Previously we relied on the pointee type to determine what type we need to do runtime pointer access checks. With opaque pointers, we can access a pointer with more than one type, so now we keep track of all the types we're accessing a pointer's memory with. Also some other minor getPointerElementType() removals. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D119047	2022-02-09 09:11:27 -08:00
Philip Reames	c302f1e677	[SCEV] Generalize SCEVEqualsPredicate to any compare [NFC] PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent A <pred> B. This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.	2022-02-08 08:18:09 -08:00
Roman Lebedev	ae9414d562	[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply https://godbolt.org/z/js9fTTG9h ^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says unless we already knew that the operands were equal.	2022-02-08 18:35:29 +03:00
Simon Pilgrim	1468202748	[ValueTracking] Add support for X*X self-multiplication D108992 added KnownBits handling for 'Quadratic Reciprocity' self-multiplication patterns (bit[1] == 0), which can be used for non-undef values (poison is OK). This patch adds noundef selfmultiply handling to value tracking so demanded bits patterns can make use of it. Differential Revision: https://reviews.llvm.org/D117995	2022-02-08 13:33:27 +00:00
Simon Pilgrim	e2537f6b19	[ValueTracking] Replace dyn_cast with dyn_cast_or_null to account for getTerminator returning null Noticed while running checks on D117995 - a hexagon regression test was managing to return a block without a terminator	2022-02-08 13:33:26 +00:00
Johannes Doerfert	29c8ebad10	[MemoryBuiltins][FIX] Adjust index type size properly wrt. AS casts Use existing functionality to strip constant offsets that works well with AS casts and avoids the code duplication. Since we strip AS casts during the computation of the offset we also need to adjust the APInt properly to avoid mismatches in the bit width. This code ensures the caller of `compute` sees APInts that match the index type size of the value passed to `compute`, not the value result of the strip pointer cast. Fixes #53559. Differential Revision: https://reviews.llvm.org/D118727	2022-02-07 20:19:19 -06:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Bill Wendling	c6f0940d99	[NFC] Remove unnecessary #includes An attempt to reduce the number of files that are recompiled due to a change. Differential Revision: https://reviews.llvm.org/D119055	2022-02-04 21:22:41 -08:00
serge-sans-paille	ffe8720aa0	Reduce dependencies on llvm/BinaryFormat/Dwarf.h This header is very large (3M lines once expanded) and was included in locations where dwarf-specific information was not needed. More specifically, this commit suppresses the dependencies on llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used, this has a decent impact on number of preprocessed lines generated during compilation of LLVM, as showcased below. This is achieved by moving some definitions back to the .cpp file, with no performance impact implied[0]. As a consequence of that patch, downstream users may need to manually include some extra files: llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h In some situations, code may be relying on the fact that llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h; this hidden dependency now needs to be explicit. $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l after: 10978519 before: 11245451 Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup [0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions Differential Revision: https://reviews.llvm.org/D118781	2022-02-04 11:44:03 +01:00
Augie Fackler	b2d091aa5d	[NFC] MemoryBuiltins: tease out a getFreeFunctionDataForFunction helper	2022-02-03 08:36:36 -08:00
Augie Fackler	bad0301cc5	MemoryBuiltins: simplify isLibFreeFunction [NFC] This is in anticipation of my next patch, where I need to store more information about free functions than just their argument count. It felt invasive enough on this function that it seemed worthwhile to just extract this as its own commit that makes no functional changes. Differential Revision: https://reviews.llvm.org/D117350	2022-02-03 08:30:02 -08:00
Serge Pavlov	d2f132f0b7	[ConstantFolding] Fold constrained compare intrinsics The change implements constant folding of ‘llvm.experimental.constrained.fcmp’ and ‘llvm.experimental.constrained.fcmps’ intrinsics. Differential Revision: https://reviews.llvm.org/D110322	2022-02-03 16:45:56 +07:00
Andrew Litteken	30420bc344	[IRSim] Make sure that commutative intrinsics are treated as function calls without commutativity Created to fix: https://github.com/llvm/llvm-project/issues/53537 Some intrinsic functions are considered commutative since they perform operations like addition or multiplication. Some of these have extra parameters that provide information which is not part of the operation itself and is not commutative. This makes sure that an instruction that is an intrinsic takes the non-commutative path to handle this case. Reviewer: paquette Closes Issue #53537 Differential Revision: https://reviews.llvm.org/D118807	2022-02-02 13:24:56 -06:00
Malhar Jajoo	778b455dd6	[LAA] Add Memory dependence remarks. Adds new optimization remarks when vectorization fails. More specifically, new remarks are added for following 4 cases: - Backward dependency - Backward dependency that prevents Store-to-load forwarding - Forward dependency that prevents Store-to-load forwarding - Unknown dependency It is important to note that only one of the sources of failures (to vectorize) is reported by the remarks. This source of failure may not be first in program order. A regression test has been added to test the following cases: a) Loop can be vectorized: No optimization remark is emitted b) Loop can not be vectorized: In this case an optimization remark will be emitted for one source of failure. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D108371	2022-02-02 12:07:51 +00:00
William S. Moses	8cb9c73609	[LoopIdiom] Keep TBAA when creating memcpy/memmove When upgrading a loop of load/store to a memcpy, the existing pass does not keep existing aliasing information. This patch allows existing aliasing information to be kept. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D108221	2022-01-31 16:28:13 -05:00
Eli Friedman	b2837bf2f2	[ScalarEvolution] Add bailout to avoid zext of pointer. The RHS of an isImpliedCond call can be a pointer even if the LHS is not. This is similar to `bfa2a81e`. Not going to include a testcase; an IR testcase would be extremely complicated and fragile. Fixes https://github.com/llvm/llvm-project/issues/51936 . Differential Revision: https://reviews.llvm.org/D114555	2022-01-31 11:41:39 -08:00
Kazu Hirata	cda7b6aaf3	[Analysis] Drop an unnecessary const from a return type (NFC) Identified with readability-const-return-type.	2022-01-30 16:04:58 -08:00
Kazu Hirata	49fdee13c1	[Analysis] Use != to compare strings (NFC) Identified with readability-string-compare.	2022-01-30 12:32:57 -08:00
Nuno Lopes	0dc20e321c	[InstSimplify] fold 'xor X, poison' and 'div/rem X, poison' to poison	2022-01-30 10:46:54 +00:00
William S. Moses	99d2582164	[ScalarEvolution] Handle <= and >= in non infinite loops Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs -1. In the case of lhs <= rhs, this is true since the only case these are not equivalent is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop. In the case of lhs >= rhs, this is true since the only case these are not equivalent is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D118090	2022-01-28 17:41:08 -05:00
Andrew Litteken	3785c1d055	[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions Due to some complications with lifetime and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs and the intrinsic names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match. This also adds a command line debug option to disable outlining intrinsics. Recommit of: `8de76bd569` Adds extra checking of intrinsic function call names to avoid taking the address of intrinsic calls when extracting function calls. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109450	2022-01-28 13:52:21 -06:00
William S. Moses	0d04c77856	[ScalarEvolution] Mark a loop as finite if in a willreturn function A limited version of (https://reviews.llvm.org/D118090) that only marks a loop as finite if in a willreturn function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118429	2022-01-28 14:17:05 -05:00
Nikita Popov	8a4293f3ef	[Loads] Require Align in isDereferenceableAndAlignedPointer() (NFC) Now that loads always have an alignment, we should not perform an ABI alignment fallback here.	2022-01-28 16:23:32 +01:00
Evgeniy Brevnov	d7424939a6	[BasicAA] Add support for memmove intrinsic Currently, basic AA has special support for llvm.memcpy.* intrinsics. This change extends this support to any memory transfer operation, in particular the llvm.memmove.* intrinsic. Reviewed By: reames, nikic Differential Revision: https://reviews.llvm.org/D117095	2022-01-28 18:19:36 +07:00
Florian Hahn	1ca02bddb4	[ConstraintSystem] Mark function as const (NFC).	2022-01-27 13:44:47 +00:00
Congzhe Cao	f3e1f44340	[IVDescriptor] Get the exact FP instruction that does not allow reordering This is a bugfix in IVDescriptor.cpp. The helper function `RecurrenceDescriptor::getExactFPMathInst()` is supposed to return the 1st FP instruction that does not allow reordering. However, when constructing the RecurrenceDescriptor, we trace the use-def chain starting from a PHI node and for each instruction in the use-def chain, its descriptor overrides the previous one. Therefore, in the final RecurrenceDescriptor we construct, we lose previous FP instructions that do not allow reordering. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D118073	2022-01-27 00:33:46 -05:00
Nikita Popov	44cfc3a816	[LICM] Generalize unwinding check during scalar promotion This extracts a common isNotVisibleOnUnwind() helper into AliasAnalysis, which handles allocas, byval arguments and noalias calls. After D116998 this could also handle sret arguments. We have similar logic in DSE and MemCpyOpt, which will be switched to use this helper as well. The noalias call case is a bit different from the others, because it also requires that the object is not captured. The caller is responsible for doing the appropriate check. Differential Revision: https://reviews.llvm.org/D117000	2022-01-26 11:15:03 +01:00
Andrew Litteken	ba79295c48	[NFC][IROutliner] fix namespace and unused variable	2022-01-25 18:41:30 -06:00
Andrew Litteken	e8f4e41b6b	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region. We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non-entry blocks within the extracted region to have predecessors, so there are no conflicts to handle with respect to predecessors no longer contained in the function. Recommit of `515eec3553` Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106997	2022-01-25 18:25:50 -06:00
Andrew Litteken	e50b217b4e	Revert "[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region." This reverts commit `515eec3553`. By mistake, commit message was not complete.	2022-01-25 18:24:19 -06:00
Andrew Litteken	515eec3553	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region.	2022-01-25 18:20:10 -06:00
Andrew Litteken	9c2daf648c	Revert "[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions" This reverts commit `8de76bd569`. Reverting due to failure of different-intrinsics.ll on lld-x86_64-win buildbot.	2022-01-25 18:19:33 -06:00
Andrew Litteken	8de76bd569	[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions Due to some complications with lifetime and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs and the intrinsic names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match. This also adds a command line debug option to disable outlining intrinsics. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109450	2022-01-25 17:06:09 -06:00
Andrew Litteken	f5f377d1fc	[IRSim][IROutliner] Adding support for recognizing and outlining indirect function calls, and function calls with different names, but the same type The outliner currently requires that function calls not be indirect calls, and that the function name and function type match, as well as other attributes such as calling conventions. This patch treats called functions as values, i.e. just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or as inputs and outputs of the outlined functions. There are also debugging flags added to enforce the old behaviors where indirect calls are not allowed, and to enforce the old rule that function call names must also match. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109448	2022-01-25 15:19:28 -06:00
Nikita Popov	3e2ae92d3f	[SCEV] Remove an unnecessary GEP type check The code already checked that the addrec step size and type alloc size are the same. The actual pointer element type is irrelevant here.	2022-01-25 12:56:46 +01:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Max Kazantsev	c913dccfde	[SCEV] Use lshr in implications This patch adds support for implication inference logic for the following pattern: ``` lhs < (y >> z) <= y, y <= rhs --> lhs < rhs ``` We should be able to use the fact that value shifted to right is not greater than the original value (provided it is non-negative). Differential Revision: https://reviews.llvm.org/D116150 Reviewed-By: apilipenko	2022-01-25 13:25:19 +07:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Evgeniy Brevnov	0e55d4fab0	[AA] Refine ModRefInfo for llvm.memcpy.* in presence of operand bundles The presence of operand bundles changes semantics with respect to ModRef. In particular, the spec says: "From the compiler's perspective, deoptimization operand bundles make the call sites they're attached to at least readonly. They read through all of their pointer typed operands (even if they're not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame". Fix handling of llvm.memcpy.* according to the spec. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D118033	2022-01-25 10:15:23 +07:00
Mircea Trofin	b1af01fe6a	[NFC][MLGO] Simplify conditional compilation Most of the code that's shared between 'release' and 'development' modes doesn't depend on anything special.	2022-01-24 11:19:04 -08:00
Kazu Hirata	b752eb887f	[Analysis] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-01-23 20:32:56 -08:00
Kazu Hirata	448d0dfab7	[Analysis] Remove a redundant const from a return type (NFC) Identified with readability-const-return-type.	2022-01-23 14:00:03 -08:00
Sanjay Patel	2e26633af0	[IR] document and update ctlz/cttz intrinsics to optionally return poison rather than undef The behavior in Analysis (knownbits) implements poison semantics already, and we expect the transforms (for example, in instcombine) derived from those semantics, so this patch changes the LangRef and remaining code to be consistent. This is one more step in removing "undef" from LLVM. Without this, I think https://github.com/llvm/llvm-project/issues/53330 has a legitimate complaint because that report wants to allow subsequent code to mask off bits, and that is allowed with undef values. The clang builtins are not actually documented anywhere AFAICT, but we might want to add that to remove more uncertainty. Differential Revision: https://reviews.llvm.org/D117912	2022-01-23 11:22:48 -05:00
Nikita Popov	b4900296e4	[ConstantFold] Allow all float types in reinterpret load folding Rather than hardcoding just half, float and double, allow all floating point types.	2022-01-21 09:26:51 +01:00
Nikita Popov	6a19cb837c	[ConstantFold] Support pointers in reinterpret load folding Peculiarly, the necessary code to handle pointers (including the check for non-integral address spaces) is already in place, because we were already allowing vectors of pointers here, just not plain pointers.	2022-01-21 09:13:37 +01:00
Nikita Popov	05cd9a0596	[ConstantFold] Simplify type check in reinterpret load folding (NFC) Keep a list of allowed types, but then always construct the map type the same way. We need an integer with the same width as the original type.	2022-01-21 09:06:35 +01:00
Mircea Trofin	f29256a64a	[MLGO] Improved support for AOT cross-targeting scenarios The tensorflow AOT compiler can cross-target, but it can't run on (for example) arm64. We added earlier support where the AOT-ed header and object would be built on a separate builder and then passed at build time to a build host where the AOT compiler can't run, but clang can be otherwise built. To simplify such scenarios given we now support more than one AOT-able case (regalloc and inliner), we make the AOT scenario centered on whether files are generated, case by case (this includes the "passed from a different builder" scenario). This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of case by case control. A builder can opt out of an AOT case by passing that case's model path as `none`. Note that the overrides still take precedence. This patch controls conditional compilation with case-specific flags, which can be enabled locally, for the component where those are available. We still keep an overall flag for some tests. The 'development/training' mode is unchanged, because there the model is passed from the command line and interpreted. Differential Revision: https://reviews.llvm.org/D117752	2022-01-20 07:05:39 -08:00
Mircea Trofin	e67430cca4	[MLGO] ML Regalloc Eviction Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information (currently after the Virtual Register Rewriter) and then produce the log file. This patch also introduces the score injection pass, 'Register Allocation Pass Scoring', which is trivially just logging the score in development mode. Differential Revision: https://reviews.llvm.org/D117147	2022-01-19 11:00:32 -08:00
Nikita Popov	d8bff13a8a	[NFC] Add missing <map> includes These were relying on a transitive include.	2022-01-19 12:29:03 +01:00
Philip Reames	215bd46905	[MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed Try 2, this time including the test.	2022-01-18 15:24:52 -08:00
Philip Reames	fcab2d1309	Revert "[MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed" This reverts commit `167af7bbfe`. Buildbot breaks since I forgot to remove a unit test.	2022-01-18 15:16:12 -08:00
Philip Reames	167af7bbfe	[MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed	2022-01-18 15:12:07 -08:00
Mircea Trofin	3e8553aab4	[mlgo][inline] Improve global state tracking The global state refers to the number of the nodes currently in the module, and the number of direct calls between nodes, across the module. Node counts are not a problem; edge counts are because we want strictly the kind of edges that affect inlining (direct calls), and that is not easily obtainable without iteration over the whole module. This patch avoids relying on analysis invalidation because it turned out to be too aggressive in some cases. It leverages the fact that Node objects are stable - they do not get deleted while cgscc passes are run over the module; and cgscc pass manager invariants. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115847	2022-01-18 17:45:34 +00:00
Jan Svoboda	5f4ae56457	[llvm] Remove uses of `std::vector<bool>` LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement. This patch does just that for llvm. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D117121	2022-01-18 18:20:45 +01:00
Nikita Popov	3ec7f46e99	[LVI] Handle implication from icmp of trunc (PR51867) Similar to the existing urem code, if we have (trunc X) >= C, then also X >= C. Proof: https://alive2.llvm.org/ce/z/RF4YR2 Fixes https://github.com/llvm/llvm-project/issues/51867.	2022-01-18 11:24:11 +01:00
Nikita Popov	9e68557e64	[LVI] Handle commuted SPF min/max operands We need to check that the operands of the min/max are the operands of the select, but we don't care which order they are in.	2022-01-18 10:43:00 +01:00
Nikita Popov	d15823e300	[LVI] Compute SPF range even if one operand is overdefined If we have a constant range for one operand but not the other, we can generally still compute a useful result for SPF min/max.	2022-01-18 10:40:49 +01:00
Nikita Popov	202d590a01	[LVI] Consistently intersect assumes Integrate intersection with assumes into getBlockValue(), to ensure that it is consistently performed. We were doing it in nearly all places, but for example missed it for select inputs.	2022-01-18 10:15:31 +01:00
Nikita Popov	f104cc38f4	[ConstantFold] Don't fold load from non-byte-sized vector Following up on `1470f94d71 (r63981173)`: The result here (probably) depends on endianness. Don't bother trying to handle this exotic case, just bail out.	2022-01-17 17:01:47 +01:00
Nikita Popov	af12a3f4a9	[ValueTracking] Remove ComputeMultiple() function This function is no longer used since `499f1ca79f`.	2022-01-17 10:28:31 +01:00
Bryce Wilson	dd13744bfb	Revert "[BasicAliasAnalysis] Remove isMallocOrCallocLikeFn" This reverts commit `1f2cfc4fdc`.	2022-01-14 14:42:53 -08:00
Roman Lebedev	650fc40b6d	[NFC][SCEV] Introduce `getCastExpr()` QoL helper	2022-01-15 00:52:22 +03:00
Bryce Wilson	1f2cfc4fdc	[BasicAliasAnalysis] Remove isMallocOrCallocLikeFn Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function) which is checked elsewhere so this check is no longer needed. Differential Revision: https://reviews.llvm.org/D117180	2022-01-14 12:22:01 -08:00
Philip Reames	dac82b53e2	Revert "[MemoryBuiltins] [NFC] Add missing section comments" This reverts commit `83338d5032`. Comments in source are non-idiomatic and naming choice in head is unclear.	2022-01-14 08:34:21 -08:00
Roman Lebedev	b32077234b	[NFCI][SCEV] `computeExitLimitFromCondFromBinOp()`: rely on `getSequentialMinMaxExpr()` constant relaxation `getSequentialMinMaxExpr()` has been taught to perform this relaxation, so rely on that now. Not sure this can be tested.	2022-01-14 17:07:48 +03:00
Roman Lebedev	8dcba20674	[SCEV] `getSequentialMinMaxExpr()`: relax 2-op umin_seq w/ constant to umin Currently, `computeExitLimitFromCondFromBinOp()` does that directly.	2022-01-14 17:07:48 +03:00
Roman Lebedev	c86a982d7d	[SCEV] `getSequentialMinMaxExpr()`: rewrite deduplication to be fully recursive Since we don't merge/expand non-sequential umin exprs into umin_seq exprs, we may have umin_seq(umin(umin_seq())) chain, and the innermost umin_seq can have duplicate operands still.	2022-01-14 15:42:26 +03:00
Florian Hahn	1ef9bfa013	[InstSimplify] Pass pointer and indices separately to SimplifyGEPInst. This doesn't require callers to put the pointer operand and the indices in a container like a vector when calling the function. This is not really an issue with the existing callers. But when using it from IRBuilder the inputs are available as separate pointer value and indices ArrayRef. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D117038	2022-01-14 09:59:52 +00:00
Nikita Popov	20d9c51dc0	[ConstantFold] Check for uniform value before reinterpret load The reinterpret load code will convert undef values into zero. Check the uniform value case before it to produce a better result for all-undef initializers. However, the uniform value handling will return the uniform value even if the access is out of bounds, while the reinterpret load code will return undef. Add an explicit check to retain the previous result in this case.	2022-01-14 10:18:02 +01:00
Bryce Wilson	83338d5032	[MemoryBuiltins] [NFC] Add missing section comments	2022-01-13 17:43:43 -08:00
Philip Reames	ee02cf0797	[MemoryBuiltins] Demote isCallocLikeFn and isAlignedAllocLikeFn to local helpers after removal of last external use [NFC]	2022-01-13 15:51:17 -08:00
Philip Reames	cf66f01ec1	[Attributor] Share code for abstract interpretation of allocation sizes with getObjectSize [NFC-ish] The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during its abstract interpretation, and allows us to have one copy of the code. Note this is not NFC for two reasons: * The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space. * I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that. Differential Revision: https://reviews.llvm.org/D117241	2022-01-13 15:33:24 -08:00
Bryce Wilson	68874d8b5f	[MemoryBuiltins] [NFC] Remove unused overload of isAlignedAllocLikeFn Differential Revision: https://reviews.llvm.org/D117245	2022-01-13 15:19:04 -08:00
Arthur Eubanks	757e044dce	[Inliner] Don't removeDeadConstantUsers() when checking if a function is dead If a function has many uses, this can take a good chunk of compile times. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117236	2022-01-13 14:29:45 -08:00
Philip Reames	cd36b29ec7	[MemoryBuiltins] (Slightly) clean up abuse of MallocLike bitmask [NFC]	2022-01-13 12:39:22 -08:00
Nikita Popov	aba7c3c033	[ConstantFold] Check uniform value in ConstantFoldLoadFromConst() This case is automatically handled if ConstantFoldLoadFromConstPtr() is used. Make sure that ConstantFoldLoadFromConst() also handles it.	2022-01-13 14:40:19 +01:00
Hans Wennborg	2bc57d85eb	Don't override __attribute__((no_stack_protector)) by inlining (PR52886) Since `26c6a3e736`, LLVM's inliner will "upgrade" the caller's stack protector attribute based on the callee. This led to surprising results with Clang's no_stack_protector attribute added in `4fbf84c173` (D46300). Consider the following code compiled with clang -fstack-protector-strong -Os (https://godbolt.org/z/7s3rW7a1q). extern void h(int* p); inline __attribute__((always_inline)) int g() { return 0; } int __attribute__((__no_stack_protector__)) f() { int a[1]; h(a); return g(); } LLVM will inline g() into f(), and f() would get a stack protector, against the user's explicit wishes, potentially breaking the program e.g. if h() changes the value of the stack cookie. That's a miscompile. More recently, `bc044a88ee` (D91816) addressed this problem by preventing inlining when the stack protector is disabled in the caller and enabled in the callee or vice versa. However, the problem remained if the callee is marked always_inline as in the example above. This affected users, see e.g. http://crbug.com/1274129 and http://llvm.org/pr52886. One way to fix this would be to prevent inlining also in the always_inline case. Despite the name, always_inline does not guarantee inlining, so this would be legal but potentially surprising to users. However, I think the better fix is to not enable the stack protector in a caller based on the callee. The motivation for the old behaviour is unclear, it seems counter-intuitive, and causes real problems as we've seen. This commit implements that fix, which means in the example above, g() gets inlined into f() (also without always_inline), and f() is emitted without stack protector. I think that matches most developers' expectations, and that's also what GCC does. Another effect of this change is that a no_stack_protector function can now be inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856): extern void h(int* p); inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() { return 0; } int f() { int a[1]; h(a); return g(); } I think that's fine. Such code would be unusual since no_stack_protector is normally applied to a program entry point which sets up the stack canary. And even if such code exists, inlining doesn't change the semantics: there is still no stack cookie setup/check around entry/exit of the g() code region, but there may be in the surrounding context, as there was before inlining. This also matches GCC. See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722 Differential revision: https://reviews.llvm.org/D116589	2022-01-13 12:04:49 +01:00
Sanjay Patel	6bd127b079	[InstSimplify] use knownbits to fold more udiv/urem We could use knownbits on both operands for even more folds (and there are already tests in place for that), but this is enough to recover the example from: https://github.com/llvm/llvm-project/issues/51934 (the tests are derived from the code in that example) I am assuming no noticeable compile-time impact from this because udiv/urem are rare opcodes. Differential Revision: https://reviews.llvm.org/D116616	2022-01-12 14:59:43 -05:00
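The following is a minimal sketch of the kind of known-bits reasoning behind this fold. It assumes that the check compares the dividend's largest possible value, given its known bits, against a constant divisor. When that maximum is below `C`, `X udiv C` is 0 and `X urem C` is `X`. `KnownBits8` is a hypothetical 8-bit stand-in for LLVM's `KnownBits`, and the helper names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for llvm::KnownBits: bits set in `zero` are
// known to be 0, bits set in `one` are known to be 1.
struct KnownBits8 {
  uint8_t zero;
  uint8_t one;
};

// Largest value consistent with the known bits: every unknown bit set.
uint8_t maxValue(KnownBits8 k) { return static_cast<uint8_t>(~k.zero); }

// Assumed fold condition: if X can never reach the constant divisor C,
// then X udiv C == 0 and X urem C == X.
bool divRemFolds(KnownBits8 x, uint8_t c) { return maxValue(x) < c; }

// Brute-force check of the fold over every X that matches the known bits.
bool foldIsSound(KnownBits8 k, uint8_t c) {
  if (!divRemFolds(k, c)) return true;  // no fold claimed, nothing to check
  for (unsigned v = 0; v < 256; ++v) {
    uint8_t x = static_cast<uint8_t>(v);
    bool matches = (x & k.zero) == 0 && (x & k.one) == k.one;
    if (matches && (x / c != 0 || x % c != x)) return false;
  }
  return true;
}
```

For example, with the upper four bits known to be zero the dividend is at most 15. Division by 16 folds, but division by 15 does not.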
Rosie Sumpter	552eb372cb	[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter This is required to query the legality more precisely in the LoopVectorizer. This adds another TTI function named 'forceScalarizeMaskedGather/Scatter' function to work around the hack introduced for MVE, where isLegalMaskedGather/Scatter would return an answer by second-guessing where the function was called from, based on the Type passed in (vector vs scalar). The new interface makes this explicit. It is also used by X86 to check for vector widths where gather/scatters aren't profitable (or don't exist) for certain subtargets. Differential Revision: https://reviews.llvm.org/D115329	2022-01-12 13:34:12 +00:00
Mircea Trofin	248d55af3e	[NFC][MLGO] Use LazyCallGraph::Node to track functions. This avoids the InlineAdvisor carrying the responsibility of deleting Function objects. We use LazyCallGraph::Node objects instead, which are stable in memory for the duration of the Module-wide performance of CGSCC passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which is the case here) Differential Revision: https://reviews.llvm.org/D116964	2022-01-11 19:23:47 -08:00
Mircea Trofin	1f5dceb1d0	[MLGO] Add support for multiple training traces per module This happens in e.g. regalloc, where we trace decisions per function, but wouldn't want to spew N log files (i.e. one per function). So we output a key-value association, where the key is an ID for the sub-module object, and the value is the tensorflow::SequenceExample. The current relation with protobuf is tenuous, so we're avoiding a custom message type in favor of using the `Struct` message, but that requires the values be wire-able strings, hence base64 encoding. We plan on resolving the protobuf situation shortly, and improve the encoding of such logs, but this is sufficient for now for setting up regalloc training. Differential Revision: https://reviews.llvm.org/D116985	2022-01-11 16:13:31 -08:00
Mircea Trofin	a81b0c978f	[NFC][MLGO] Remove the word "inliner" in a generic error message.	2022-01-11 12:39:16 -08:00
Arthur Eubanks	bf52210e25	[NFC][LazyCallGraph] Remove check in removeDeadFunction() if graph is empty If we're in removeDeadFunction(), we should have already constructed the call graph. Differential Revision: https://reviews.llvm.org/D115676	2022-01-11 10:17:13 -08:00
Florian Hahn	f0ef1ea6dd	[IRBuilder] Introduce folder using inst-simplify, use for Or fold. Alternative to D116817. This introduces a new value-based folding interface for Or (FoldOr), which takes 2 values and returns an existing Value or a constant if the Or can be simplified. Otherwise nullptr is returned. This replaces the more restrictive CreateOr which takes 2 constants. This is then used to implement a folder that uses InstructionSimplify. The logic to simplify `Or` instructions is moved there. Subsequent patches are going to transition other CreateXXX to the more general FoldXXX interface. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D116935	2022-01-11 17:30:48 +00:00
Philip Reames	8f553da492	[instsimplify] Add a comment and test for a highly confusing case	2022-01-11 09:24:10 -08:00
Philip Reames	e838949bee	[GlobalsModRef] Apply indirect-global rule to all globals initialized from noalias calls Extend the existing malloc-family specific optimization to all noalias calls. This allows us to handle allocation wrappers, and removes a dependency on a lib-func check in favor of generic attribute usage. Differential Revision: https://reviews.llvm.org/D116980	2022-01-11 08:44:31 -08:00
Florian Hahn	8a469e2050	[InstSimplify] Fold inbounds GEP to poison if base is undef. D92270 updated constant expression folding to fold inbounds GEP to poison if the base is undef. Apply the same logic to SimplifyGEPInst. The justification is that we can choose an out-of-bounds pointer as base pointer. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D117015	2022-01-11 16:11:22 +00:00
Roman Lebedev	5ceb070bbb	[SCEV] `getSequentialMinMaxExpr()`: look into `umin` when deduplicating operands We could just merge all umin into umin_seq, but that is likely a pessimization, so don't do that, but pretend that we did for the purpose of deduplication.	2022-01-11 18:51:57 +03:00
Roman Lebedev	5e16650792	[SCEV] `getSequentialMinMaxExpr()`: keep only the first instance of an operand Having the same operand more than once doesn't change the outcome here, neither reduction-wise nor poison-wise. We must keep the first instance specifically though.	2022-01-11 16:51:53 +03:00
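The following sketch illustrates why the *first* instance must be kept. In `umin_seq(x, y, x)`, the first `x` already decides whether evaluation stops at `x` (because it is 0 or poison). The trailing `x` is therefore redundant. Keeping the later instance instead would produce `umin_seq(y, x)`, which reorders the poison-stopping checks: for `y == 0` and poison `x`, it yields 0 where the original yields poison. Operands are plain strings here purely for illustration. The quadratic scan is just the simplest order-preserving choice, not necessarily what SCEV does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Removes repeated operands while keeping the FIRST occurrence of each,
// so the relative order of the surviving operands is unchanged.
// Operand names are hypothetical stand-ins for SCEV expressions.
std::vector<std::string> dedupKeepFirst(const std::vector<std::string>& ops) {
  std::vector<std::string> out;
  for (const std::string& op : ops) {
    bool seen = false;
    for (const std::string& kept : out) {
      if (kept == op) {
        seen = true;
        break;
      }
    }
    if (!seen) out.push_back(op);
  }
  return out;
}
```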
Roman Lebedev	76a0abbc13	[SCEV] Reenable umin_seq support and fix the `computeSCEVAtScope()` This reverts commit `f62f47f5e1`.	2022-01-11 16:03:35 +03:00
Nikita Popov	3946095b88	[MemoryBuiltins] Remove unused isOpNewLikeFn() (NFC) This function is no longer used since `2cafbcb560`.	2022-01-11 12:27:23 +01:00
Nikita Popov	b56f6f1913	[MemoryBuiltins] Remove unused isStrdupLikeFn() function (NFC) This function is no longer used after `dcbc91f40c`.	2022-01-11 12:26:20 +01:00
Philip Reames	f62f47f5e1	Partial revert of `82fb4f4` Two crashes have been reported. This change disables the new logic while leaving the new node in tree. Hopefully, that's enough to allow investigation without breakage while avoiding massive churn.	2022-01-10 18:18:34 -08:00
Philip Reames	5265ac72c6	[MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC] Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.	2022-01-10 15:43:39 -08:00
Roman Lebedev	82fb4f4b22	[SCEV] Sequential/in-order `UMin` expression As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692, SCEV is forbidden from reasoning about 'backedge taken count' if the branch condition is a poison-safe logical operation, which is conservatively correct, but is severely limiting. Instead, we should have a way to express those poison blocking properties in SCEV expressions. The proposed semantics is: ``` Sequential/in-order min/max SCEV expressions are non-commutative variants of commutative min/max SCEV expressions. If none of their operands are poison, then they are functionally equivalent, otherwise, if the operand that represents the saturation point* of given expression, comes before the first poison operand, then the whole expression is not poison, but is said saturation point. ``` * saturation point - the maximal/minimal possible integer value for the given type The lowering is straight-forward: ``` compare each operand to the saturation point, perform sequential in-order logical-or (poison-safe!) ordered reduction over those checks, and if reduction returned true then return saturation point else return the naive min/max reduction over the operands ``` https://alive2.llvm.org/ce/z/Q7jxvH (2 ops) https://alive2.llvm.org/ce/z/QCRrhk (3 ops) Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97 That allows us to handle the patterns in question. Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D116766	2022-01-10 20:51:26 +03:00
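The following sketch models the proposed semantics and the lowering described in the message for `umin_seq`, whose saturation point is 0. Poison is modeled as `std::nullopt` over 8-bit values. The function names are hypothetical and are not SCEV APIs. The lowering checks every operand except the last, following the note that the last operand need not be checked. The sequential logical-or follows `select c, true, rest` semantics: a poison condition poisons the result, and a true condition short-circuits everything after it. The final function compares both forms exhaustively for three operands over `{poison, 0, 1, 2}`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model: std::nullopt stands for an LLVM poison value.
using Val = std::optional<uint8_t>;

// Reference semantics: walk operands in order; a 0 (saturation point)
// seen before any poison wins, a poison seen before any 0 wins.
Val uminSeqReference(const std::vector<Val>& ops) {
  for (const Val& v : ops) {
    if (!v) return std::nullopt;  // poison reached before any 0
    if (*v == 0) return 0;        // saturation point blocks later poison
  }
  uint8_t m = UINT8_MAX;
  for (const Val& v : ops) m = std::min(m, *v);
  return m;
}

// Plain (commutative) umin: poison if any operand is poison.
Val uminNaive(const std::vector<Val>& ops) {
  uint8_t m = UINT8_MAX;
  for (const Val& v : ops) {
    if (!v) return std::nullopt;
    m = std::min(m, *v);
  }
  return m;
}

// Lowering: poison-safe sequential OR of "op == 0" over all but the last
// operand; if true return 0, otherwise the naive umin reduction.
Val uminSeqLowered(const std::vector<Val>& ops) {
  std::optional<bool> anyZero = false;
  for (std::size_t i = 0; i + 1 < ops.size(); ++i) {
    if (!ops[i]) { anyZero = std::nullopt; break; }  // icmp on poison
    if (*ops[i] == 0) { anyZero = true; break; }     // short-circuit
  }
  if (!anyZero) return std::nullopt;  // select on a poison condition
  if (*anyZero) return 0;             // saturation point
  return uminNaive(ops);
}

// Exhaustively compares both forms for 3 operands over {poison, 0, 1, 2}.
bool loweringMatchesReference() {
  const Val domain[] = {std::nullopt, Val(0), Val(1), Val(2)};
  for (const Val& a : domain)
    for (const Val& b : domain)
      for (const Val& c : domain) {
        std::vector<Val> ops{a, b, c};
        if (uminSeqReference(ops) != uminSeqLowered(ops)) return false;
      }
  return true;
}
```

Non-commutativity is visible directly: `umin_seq(0, poison)` is 0, while `umin_seq(poison, 0)` and plain `umin(0, poison)` are both poison.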
Philip Reames	1d127315e7	Minor style tweaks following `fb93659`	2022-01-10 09:32:29 -08:00
Bryce Wilson	7febd60a90	[instcombine] Add align return attributes for operator new(..., align_val) (Split from original patch to separate non-NFC part and add coverage. I typoed when adding the new test, so this change includes the typo fix to let libfunc recognize the signature. Didn't figure it was worth another separate commit.) Differential Revision: https://reviews.llvm.org/D116851 (part 2 of 2)	2022-01-10 09:15:20 -08:00
Bryce Wilson	fb936595fa	[MemoryBuiltins] Add field for alignment argument [NFC] There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically. This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for purpose of having the large change be NFC. The latter will be reviewed separately. Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)	2022-01-10 09:15:20 -08:00
Simon Pilgrim	fd1094f318	[ConstantFolding] Clean up Intrinsics::abs undef handling Match cttz/ctlz handling by assuming C1 == 0 if C1 != 1 - I've added an assertion as well. Fixes static analyzer nullptr dereference warnings.	2022-01-10 17:04:03 +00:00
Nikita Popov	92d55e7336	[MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall() We currently have two similar implementations of this concept: isNoAliasCall() only checks for the noalias return attribute. isNoAliasFn() also checks for allocation functions. We should switch to only checking the attribute. SLC is responsible for inferring the noalias return attribute for non-new allocation functions (with a missing case fixed in `348bc76e35`). For new, clang is responsible for setting the attribute, if -fno-assume-sane-operator-new is not passed. Differential Revision: https://reviews.llvm.org/D116800	2022-01-10 09:18:15 +01:00
Simon Pilgrim	be7dbd674c	[DivergenceAnalysis] Simplify inRegion test based on whether the RegionLoop pointer is null or not More closely matches the documentation Requested by @nikic	2022-01-08 14:30:10 +00:00
Simon Pilgrim	b3f193a980	[DivergenceAnalysis] Fix static analyzer warning about dereference of nullptr We're testing that the RegionLoop pointer is null in the first part of the check, so we need to check that it's non-null before dereferencing it in a later part of the check.	2022-01-08 13:57:33 +00:00
Kazu Hirata	b932bdf59f	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-07 17:45:09 -08:00
Philip Reames	f38873537b	[MemoryBuiltin] Cleanup stale todo comments [NFC] strdup/strndup are already partially implemented, move remaining comment to relevant place. Remaining named routines are copy routines and mostly handled via intrinsics already - they do not allocate new memory.	2022-01-07 13:57:20 -08:00
Roman Lebedev	32300375f5	[NFCI] `ScalarEvolution::getRangeRef()`: collapse `SCEVMinMaxExpr` handling	2022-01-08 00:23:08 +03:00
Arthur Eubanks	d51e3474e0	[LazyCallGraph] Ignore empty RefSCCs rather than shift RefSCCs when removing dead functions This is in preparation for D115545 which attempts to delete discardable functions if they are unused. With that change, shifting RefSCCs becomes noticeable in compile time. This change makes the LCG update negligible again. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D116776	2022-01-07 09:42:23 -08:00
Philip Reames	6b0ff0969d	Extract utility function for checking initial value of allocation [NFC, try 2] This is a recurring pattern, we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike. The original commit (which was quickly reverted) didn't account for the fact that the allocation function could be an invoke, test coverage for that case added in this commit.	2022-01-07 08:44:08 -08:00
Roman Lebedev	a5a6960d1c	[NFCI][IR] MinMaxIntrinsic: add some more helper methods, and use them	2022-01-07 13:02:11 +03:00
Philip Reames	c6a0c1585a	Revert "Extract utility function for checking initial value of allocation [NFC]" This reverts commit `9ce30fe86f`. Appears to be causing a problem on a buildbot, revert while investigating. https://green.lab.llvm.org/green//job/clang-stage1-RA/26818/consoleFull#-1502953973d489585b-5106-414a-ac11-3ff90657619c	2022-01-06 19:05:51 -08:00
Philip Reames	9ce30fe86f	Extract utility function for checking initial value of allocation [NFC] This is a recurring pattern, we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike.	2022-01-06 18:02:14 -08:00
Philip Reames	5d1cfd4348	Remove unused LookThroughBitCast param in isXAllocLike functions [NFC] This parameter took the non-default value exactly twice, and neither had semantic effect.	2022-01-06 18:02:13 -08:00
Philip Reames	7052670e96	Move getMallocAllocatedType and getMallocArraySize to GlobalOpt [NFC] These are implementation details of the global-opt transform and not easily reuseable, so remove them from the analysis header.	2022-01-06 18:02:13 -08:00
Philip Reames	67a3331e4f	Inline extractMallocCall to sole use and delete [NFC]	2022-01-06 18:02:13 -08:00
Philip Reames	4b0fc924a9	Delete unused extractCallocCall routine [NFC]	2022-01-06 18:02:13 -08:00
Philip Reames	cffd268316	Demote getMallocType to implementation routine in MemoryBuiltins [NFC]	2022-01-06 18:02:13 -08:00
Daniil Suchkov	524abc68f2	Introduce NewPM .dot printers for DomTree This patch adds a couple of NewPM function passes (dot-dom and dot-dom-only) that dump DomTree into .dot files. Reviewed-By: aeubanks Differential Revision: https://reviews.llvm.org/D116629	2022-01-05 23:25:40 +00:00
Nico Weber	085f078307	Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."" This reverts commit `859ebca744`. The change contained many unrelated changes and e.g. restored unit test failures for the old lld port.	2022-01-05 13:10:25 -05:00
David Salinas	859ebca744	Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit `640beb38e7`. That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ibf8e397df94001f248fba609f072088a46abae08 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D115960 Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105	2022-01-05 17:57:32 +00:00
Philip Reames	c16fd6a376	Rename doesNotReadMemory to onlyWritesMemory globally [NFC] The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consistent with onlyReadsMemory which we use for the corresponding readonly case as well.	2022-01-05 08:52:55 -08:00
Nikita Popov	3dc1907d06	[ConstantFold] Use ConstantFoldLoadFromUniformValue() in more places In particular, this also preserves undef when loading from padding, rather than converting it to zero through a different codepath. This is the remaining part of D115924.	2022-01-05 12:47:50 +01:00
Nikita Popov	99c6b12b92	[ConstantFolding] Unify handling of load from uniform value There are a number of places that specially handle loads from a uniform value where all the bits are the same (zero, one, undef, poison), because we a) don't care about the load offset in that case b) it bypasses casts that might not be legal generally but do work with uniform values. We had multiple implementations of this, with a different set of supported values each time. This replaces two usages with a more complete helper. Other usages will be replaced separately, because they have larger impact. This is part of D115924.	2022-01-05 12:30:46 +01:00
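The following is a minimal sketch of the "uniform value" idea. It models a constant initializer as raw bytes. If every byte is `0x00`, or every byte is `0xFF`, then any load from it yields all-zero or all-one bits, so the load offset and the loaded type's layout do not matter. The function `loadFromUniformBytes` is hypothetical and is not the LLVM helper. It does not model undef/poison, non-integer types, or the out-of-bounds handling mentioned in `20d9c51dc0`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model: returns the loaded integer when the initializer is
// uniformly 0x00 or 0xFF, regardless of offset; std::nullopt otherwise.
std::optional<uint64_t> loadFromUniformBytes(const std::vector<uint8_t>& init,
                                             unsigned loadBytes) {
  if (init.empty() || loadBytes == 0 || loadBytes > 8) return std::nullopt;
  uint8_t first = init.front();
  if (first != 0x00 && first != 0xFF) return std::nullopt;
  for (uint8_t b : init)
    if (b != first) return std::nullopt;  // not uniform: needs the real folder
  if (first == 0x00) return 0;
  // All ones, truncated to the width of the load.
  return loadBytes == 8 ? UINT64_MAX : ((uint64_t{1} << (8 * loadBytes)) - 1);
}
```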
Mircea Trofin	a120fdd337	[NFC][MLGO]Add RTTI support for MLModelRunner and simplify runner setup	2022-01-04 19:46:14 -08:00
Sanjay Patel	1e50d06466	[Analysis] fix swapped operands to computeConstantRange This was noted in post-commit review for D116322 / `0edf99950e` . I am not seeing how to expose the bug in a test though because we don't pass an assumption cache into this analysis from there.	2022-01-04 13:13:50 -05:00
Philip Reames	b061d86c69	[SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them. Motivation comes from compiling a simple example with -ftrapv. Differential Revision: https://reviews.llvm.org/D116499	2022-01-04 09:44:23 -08:00
Florian Hahn	d8276208be	[LAA] Remove overeager assertion for aggregate types. `0a00d64` turned an early exit here into an assertion, but the assertion can be triggered, as PR52920 shows. The later code is agnostic to the accessed type, so just drop the assert. The patch also adds tests for LAA directly and loop-load-elimination to show the behavior is sane.	2022-01-04 15:20:35 +00:00
Nikita Popov	71b2c4a3cf	[ConstantFolding] Remove unused ConstantFoldLoadThroughGEPConstantExpr() This API is no longer used since `bbeaf2aac6`.	2022-01-04 12:37:12 +01:00
Rosie Sumpter	961f51fdf0	[LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores For loops that contain in-loop reductions but no loads or stores, large VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes has no element types to check through and so returns the default widths (-1U for the smallest and 8 for the widest). This results in the widest VF being chosen for the following example, float s = 0; for (int i = 0; i < N; ++i) s += (float) i*i; which, for more computationally intensive loops, leads to large loop sizes when the operations end up being scalarized. In this patch, for the case where ElementTypesInLoop is empty, the widest type is determined by finding the smallest type used by recurrences in the loop instead of falling back to a default value of 8 bits. This results in the cost model choosing a more sensible VF for loops like the one above. Differential Revision: https://reviews.llvm.org/D113973	2022-01-04 10:12:57 +00:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
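The following sketch shows the quantity behind the "significant bits" name for a concrete 32-bit value. It assumes the definition `BitWidth - NumSignBits + 1`, the number of bits needed to hold the value in two's complement. `ComputeMaxSignificantBits` returns an upper bound on this from known-bits analysis rather than the exact count for one value. The function names here are illustrative, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit (always at least 1).
unsigned numSignBits(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  uint32_t sign = u >> 31;
  unsigned n = 0;
  for (int bit = 31; bit >= 0; --bit) {
    if (((u >> bit) & 1u) != sign) break;
    ++n;
  }
  return n;
}

// Bits needed to represent v as a signed (two's complement) integer.
unsigned significantBits(int32_t v) { return 32 - numSignBits(v) + 1; }
```

For example, 127 and -128 both need 8 bits, while 128 needs 9.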
Kazu Hirata	e5947760c2	Revert "[llvm] Remove redundant member initialization (NFC)" This reverts commit `fd4808887e`. This patch causes gcc to issue a lot of warnings like: warning: base class ‘class llvm::MCParsedAsmOperand’ should be explicitly initialized in the copy constructor [-Wextra]	2022-01-03 11:28:47 -08:00
Nikita Popov	5afbfe33e7	[ConstantFold] Make icmp of gep fold offset based We can fold an equality or unsigned icmp between base+offset1 and base+offset2 with inbounds offsets by comparing the offsets directly. This replaces a pair of specialized folds that tried to reason based on the GEP structure instead. One of those folds was plain wrong (because it does not account for negative offsets), while the other is unnecessarily complicated and limited (e.g. it will fail with bitcasts involved). The disadvantage of this change is that it requires data layout, so the fold is no longer performed by datalayout-independent constant folding. I don't think this is a loss in practice, but it does regress the ConstantExprFold.ll test, which checks folding without running any passes. Differential Revision: https://reviews.llvm.org/D116332	2022-01-03 09:41:37 +01:00
Philip Reames	890e685492	[SCEV] Drop unused param from new version of computeExitLimitFromICmp [NFC]	2022-01-02 10:15:17 -08:00
Philip Reames	f19a95bbed	[SCEV] Split computeExitLimitFromICmp into two versions [NFC] This is in advance of a following change which needs to the non-icmp API.	2022-01-02 09:58:32 -08:00
Kazu Hirata	fd4808887e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-01 16:18:18 -08:00
Sanjay Patel	c054402170	[InstSimplify] fold or-nand-xor ~(A & B) | (A ^ B) --> ~(A & B) https://alive2.llvm.org/ce/z/hXQucg	2021-12-31 15:11:13 -05:00
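The following brute-force check covers every 8-bit `A` and `B`. It complements the Alive2 proof linked above and shows why the `xor` term is redundant: `A ^ B` only sets bits where `A` and `B` differ, and at those positions `A & B` is 0, so `~(A & B)` already has them set. The function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks ~(A & B) | (A ^ B) == ~(A & B) for all 8-bit A, B.
bool orNandXorFoldHolds() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t nand = static_cast<uint8_t>(~(a & b));
      uint8_t lhs = static_cast<uint8_t>(nand | (a ^ b));
      if (lhs != nand) return false;
    }
  return true;
}
```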
Nuno Lopes	64af9f61c3	[InstSimplify] add 'x + poison -> poison' (needed for NewGVN)	2021-12-30 11:52:42 +00:00
Fangrui Song	b69fe48ccf	[IROutliner] Move global namespace cl::opt inside llvm::	2021-12-30 01:12:55 -08:00
Sanjay Patel	0edf99950e	[Analysis] allow caller to choose signed/unsigned when computing constant range We should not lose analysis precision if an 'add' has both no-wrap flags (nsw and nuw) compared to just one or the other. This patch is modeled on a similar construct that was added with D59386. I don't think it is possible to expose a problem with an unsigned compare because of the way this was coded (nuw is handled first). InstCombine has an assert that fires with the example from: https://github.com/llvm/llvm-project/issues/52884 ...because it was expecting InstSimplify to handle this kind of pattern with an smax. Fixes #52884 Differential Revision: https://reviews.llvm.org/D116322	2021-12-28 09:45:37 -05:00
Sanjay Patel	773ab3c665	[Analysis] remove unneeded casts; NFC The callee does the casting too; this matches a plain call later in the same function for 'shl'.	2021-12-27 13:41:50 -05:00
Nikita Popov	ae64c5a0fd	[DSE][MemLoc] Handle intrinsics more generically Remove the special casing for intrinsics in MemoryLocation::getForDest() and handle them through the general attribute based code. On the DSE side, this means that isRemovable() now needs to handle more than a hardcoded list of intrinsics. We consider everything apart from volatile memory intrinsics and lifetime markers to be removable. This allows us to perform DSE on intrinsics that DSE has not been specially taught about, using a matrix store as an example here. There is an interesting test change for invariant.start, but I believe that optimization is correct. It only looks a bit odd because the code is immediate UB anyway. Differential Revision: https://reviews.llvm.org/D116210	2021-12-24 09:29:57 +01:00
Mehrnoosh Heidarpour	0ff20f2f44	[InstSimplify] Fold logic AND to zero Adding following fold opportunity: ((A | B) ^ A) & ((A | B) ^ B) --> 0 Reviewed By: spatel, rampitec Differential Revision: https://reviews.llvm.org/D115755	2021-12-23 10:06:26 -05:00
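The following brute-force check over all 8-bit inputs illustrates why the fold is valid. `(A | B) ^ A` is `B & ~A`, and `(A | B) ^ B` is `A & ~B`. These two masks can never share a set bit, so their `and` is always 0. The function name is illustrative.

```cpp
#include <cassert>

// Exhaustively checks ((A | B) ^ A) & ((A | B) ^ B) == 0 for all 8-bit A, B.
bool andOfXorsIsZero() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      unsigned ab = a | b;
      // (ab ^ a) == b & ~a and (ab ^ b) == a & ~b: disjoint bit sets.
      if (((ab ^ a) & (ab ^ b)) != 0) return false;
    }
  return true;
}
```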
Mircea Trofin	edf8e3ea5e	[NFC][mlgo]Make the test model generator inlining-specific When looking at building the generator for regalloc, we realized we'd need quite a bit of custom logic, and that perhaps it'd be easier to just have each usecase (each kind of mlgo policy) have its own stand-alone test generator. This patch just consolidates the old `config.py` and `generate_mock_model.py` into one file, and does away with subdirectories under Analysis/models.	2021-12-22 13:38:45 -08:00
Nikita Popov	8a0e35f3a7	[MemoryLocation] Don't require nocapture in getForDest() As reames mentioned on related reviews, we don't need the nocapture requirement here. First of all, from an API perspective, this is not something that MemoryLocation::getForDest() should be checking in the first place, because it does not affect which memory this particular call can access; it's an orthogonal concern that should be handled by the caller if necessary. However, for both of the motivating users in DSE and InstCombine, we don't need the nocapture requirement, because the capture can either be purely local to the call (a pointer identity check that is irrelevant to us), be part of the return value (which we check is unused), or be written in the dest location, which we have determined to be dead. This allows us to remove the special handling for libcalls as well. Differential Revision: https://reviews.llvm.org/D116148	2021-12-22 12:20:13 +01:00
Nikita Popov	f5ac23b5ae	[ArgPromotion][TTI] Pass types to ABI compatibility hook The areFunctionArgsABICompatible() hook currently accepts a list of pointer arguments, though what we're actually interested in is the ABI compatibility after these pointer arguments have been converted into value arguments. This means that a) the current API is incompatible with opaque pointers (because it requires inspection of pointee types) and b) it can only be used in the specific context of ArgPromotion. I would like to reuse the API when inspecting calls during inlining. This patch converts it into an areTypesABICompatible() hook, which accepts a list of types. This makes the method more generally usable, and compatible with opaque pointers from an API perspective (the actual usage in ArgPromotion/Attributor is still incompatible, I'll follow up on that in separate patches). Differential Revision: https://reviews.llvm.org/D116031	2021-12-22 09:37:51 +01:00
Serge Pavlov	77b923d0db	[ConstantFolding] Do not remove side effect from constrained functions According to the discussion in https://reviews.llvm.org/D110322 the code that removes side effect from replaced function call is deleted. Differential Revision: https://reviews.llvm.org/D115870	2021-12-22 13:45:49 +07:00
Nikita Popov	2926d6d335	[ConstantFold][GlobalOpt] Don't create x86_mmx null value This fixes the assertion failure reported at https://reviews.llvm.org/D114889#3198921 with a straightforward check, until the cleaner fix in D115924 can be reapplied.	2021-12-21 09:11:41 +01:00
Kazu Hirata	500c4b68dc	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-12-20 23:43:24 -08:00
Philip Reames	44d23d5345	[DSE] Remove calls with known writes to dead memory This is a reapply of `a8a51fe5`, which was reverted in 1ba99e due to a failing compiler-rt test. That test was a false positive because it was checking asan failures not accounting for the fact the call could be validly optimized out. I hopefully managed to stabilize that test in 9b955f. (That's a speculative fix due to disk consumption needed to build compiler-rt tests locally being absurd.) Original commit message follows. The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size. Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet). In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest. Differential Revision: https://reviews.llvm.org/D115904	2021-12-20 18:10:23 -08:00
Sanjay Patel	a56803b8f8	[Analysis] fix cast in ValueTracking to allow constant expression The test would crash because a non-instruction negate op made it in here. Fixes #51506	2021-12-20 17:16:47 -05:00
Sander de Smalen	b1ff20fd35	[LV] Enable scalable vectorization by default for SVE cores. The availability of SVE should be sufficient to enable scalable auto-vectorization. This patch adds a new TTI interface to query the target what style of vectorization it wants when scalable vectors are available. For targets other than AArch64, this currently defaults to 'FixedWidthOnly'. Differential Revision: https://reviews.llvm.org/D115651	2021-12-20 16:23:29 +00:00
Nikita Popov	aeb36ae0f4	Revert "[ConstantFolding] Unify handling of load from uniform value" This reverts commit `9fd4f80e33`. This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c in test-suite. Either the test is incorrect, or clang is generating incorrect union initialization code. I've submitted https://reviews.llvm.org/D115994 to fix the test, assuming my interpretation is correct. Reverting this in the meantime as it may take some time to resolve.	2021-12-18 20:46:52 +01:00
Ricky Zhou	9927a06f74	[AA] Handle callbr instructions in alias analysis Before this change, AAResults::getModRefInfo() was missing a case for callbr instructions (asm goto), which may read/write memory. In PR52735, this led to a miscompile where a load was incorrectly eliminated. Add this missing case, as well as an assert verifying that all memory-accessing instructions are handled properly. Fixes #52735. Differential Revision: https://reviews.llvm.org/D115992	2021-12-18 18:49:17 +01:00
Nikita Popov	1ba99eaf70	Revert "[DSE] Remove calls with known writes to dead memory" This reverts commit `a8a51fe556`. This breaks the strncpy-overflow.cpp test case.	2021-12-18 09:23:41 +01:00
Philip Reames	a8a51fe556	[DSE] Remove calls with known writes to dead memory The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size. Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet). In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest. Differential Revision: https://reviews.llvm.org/D115904	2021-12-17 13:42:36 -08:00
Philip Reames	793c0da89e	[capturetracking] Explicitly check for callee operand [NFC] Pull out an explicit check rather than relying on the fact that the callee operand is not a data operand. The only real value is it gives us a clear place to move the comment, and makes the code slightly more understandable.	2021-12-17 09:21:35 -08:00
Nikita Popov	9fd4f80e33	[ConstantFolding] Unify handling of load from uniform value There are a number of places that specially handle loads from a uniform value where all the bits are the same (zero, one, undef, poison), because we a) don't care about the load offset in that case and b) it bypasses casts that might not be legal generally but do work with uniform values. We had multiple implementations of this, with a different set of supported values each time, as well as incomplete type checks in some cases. In particular, this fixes the assertion reported in https://reviews.llvm.org/D114889#3198921, as well as a similar assertion that could be triggered via constant folding. Differential Revision: https://reviews.llvm.org/D115924	2021-12-17 17:05:06 +01:00
Momchil Velikov	6192c312cf	[AA] Correctly maintain the sign of PartialAlias offset Preserve the invariant that the offset reported in the case of a `PartialAlias` between `Loc1` and `Loc2` is such that `Loc1 + Offset = Loc2`, where `Loc1` and `Loc2` are the first and the second argument, respectively, in alias queries. Differential Revision: https://reviews.llvm.org/D115927	2021-12-17 15:45:26 +00:00
Florian Hahn	f5f421e0ee	[SCEV] Apply loop guards in reverse order. This patch updates applyLoopGuards to first collect all conditions and then apply them in reverse order. This ensures the SCEVs with the shortest dependency chains are constructed first, limiting the required stack size. This fixes a crash reported in D113578. Note that the order in which conditions are applied can impact the accuracy of the result, mostly due to missing min/max simplifications when constructing SCEVs. The changed test highlights the impact of the evaluation order. I will follow up with a SCEV patch to improve min/max simplifications to get the same results for both orders.	2021-12-16 10:52:37 +00:00
Nikita Popov	a8c2ba105d	[Inline] Disable deferred inlining After the switch to the new pass manager, we have observed multiple instances of catastrophic inlining, where the inliner produces huge functions with many hundreds of thousands of instructions from small input IR. We were forced to back out the switch to the new pass manager for this reason. This patch fixes at least one of the root cause issues. LLVM uses a bottom-up inliner, and the fact that functions are processed bottom-up is not just a question of optimality -- it is an important requirement to prevent runaway inlining. The premise of the current inlining approach and cost model is that after all calls inside a function have been inlined, it may get large enough that inlining it into its callers is no longer considered profitable. This safeguard does not exist if inlining doesn't happen bottom-up, as inlining the callees, and their callees, and their callees etc. will always seem individually profitable, and the inliner can easily flatten the whole call tree. There are instances where we necessarily have to deviate from bottom-up inlining: When inlining in an SCC there is no natural "bottom", so inlining effectively happens top-down. This requires special care, and the inliner avoids exponential blowup by ensuring that functions in the SCC grow in a balanced way and will eventually hit the threshold. However, there is one instance where the inlining advisor explicitly violates the bottom-up principle: Deferred inlining tries to "defer" inlining a call if it determines that inlining the caller into all its call-sites would be more profitable. Something very important to understand about deferred inlining is that it doesn't make one inlining choice in place of another -- it effectively chooses to do both. 
If we have a call chain A -> B -> C and cost modelling tells us that inlining B -> C is profitable, but we defer this and instead inline A -> B first, then we'll now have a call A -> C, and the cost model will (a few special cases notwithstanding) still tell us that this is profitable. So the end result is that we inlined both B and C, even though under the usual cost model function B would have been too large to further inline after C has been integrated into it. Because deferred inlining violates the bottom-up invariant of the inliner, it can result in exponential inlining. The exponential-deferred-inlining.ll test case illustrates this on a simple example (see https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a much more catastrophic case with about 5000x size blowup). If the call chain A -> B -> C is not a chain but a tree of calls, then we end up deferring inlining across the tree and end up flattening everything into the root node. This patch proposes to address this by disabling deferred inlining entirely (currently still behind an option). Beyond the issue of exponential inlining, I don't think that the whole concept makes sense, at least as long as deferred inlining still ends up inlining both call edges. I believe the motivation for having deferred inlining in the first place is that you might have a small wrapper function with local linkage that could be eliminated if inlined. This would automatically happen if there was a single caller, due to the large "last call to local" bonus. However, this bonus is not extended if there are multiple callers, even if we would eventually end up inlining into all of them (if the bonus were extended). Now, unlike the normal inlining cost model, the deferred inlining cost model does look at all callers, and will extend the "last call to local" bonus if it determines that we could inline all of them as long as we defer the current inlining decision. This makes very little sense. 
The "last call to local" bonus doesn't really cost model anything. It's basically an "infinite" bonus that ensures we always inline the last call to a local. The fact that it's not literally infinite just prevents inlining of huge functions, which can easily result in scalability issues. I very much doubt that it was an intentional cost-modelling choice to say that getting rid of a small local function is worth adding 15000 instructions elsewhere, yet this is exactly how this value is getting used here. The main alternative I see to complete removal is to change deferred inlining to an actual either/or decision. That is, to mark deferred calls as noinline so we're actually trading off one inlining decision against another, and not just adding a side-channel to the cost model to do both. Apart from fixing the catastrophic inlining case, the effect on rustc is a modest compile-time improvement on average (up to 8% for a parsing-type crate, where tree-like calls are expected) and pretty neutral where run-time performance is concerned (mix of small wins and losses, usually in the sub-1% category). Differential Revision: https://reviews.llvm.org/D115497	2021-12-16 09:59:50 +01:00
Mircea Trofin	db5aceb979	[NFC] Expose the ReleaseModeModelRunner The type was pretty much generic, just needed a bit of parameterization. Differential Revision: https://reviews.llvm.org/D115764	2021-12-15 23:21:58 -08:00
Fangrui Song	cf9e61a9bb	[LTO][WPD] Simplify mustBeUnreachableFunction and test after D115492 A well-formed IR function definition must have an entry basic block and a well-formed IR basic block must have one terminator, so the emptiness check can be simplified. Also simplify the test a bit. Reviewed By: luna Differential Revision: https://reviews.llvm.org/D115780	2021-12-15 15:43:35 -08:00
Arthur Eubanks	5a81a60391	[NFC] Remove more calls to getAlignment() These are deprecated and should be replaced with getAlign(). Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.	2021-12-15 14:40:57 -08:00
Mingming Liu	09a704c5ef	[LTO] Ignore unreachable virtual functions in WPD in hybrid LTO. Differential Revision: https://reviews.llvm.org/D115492	2021-12-14 20:18:04 +00:00
Philip Reames	423f19680a	Add FMF to hasPoisonGeneratingFlags/dropPoisonGeneratingFlags These flags are documented as generating poison values for particular input values. As such, we should really be consistent about their handling with how we handle nsw/nuw/exact/inbounds. Differential Revision: https://reviews.llvm.org/D115460	2021-12-14 08:43:00 -08:00
Florian Hahn	ddfac0759c	Revert "[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest." This reverts commit `ac60263ad1`. It looks like the test fails on certain non-Darwin systems, even though the triple is explicitly set to macos. Revert while I investigate.	2021-12-14 14:48:47 +00:00
Nikita Popov	7abf299fed	[InlineAdvisor] Add option to control deferred inlining (NFC) This change is split out from D115497 to add the option independently from the switch of the default value.	2021-12-14 15:46:11 +01:00
Florian Hahn	ac60263ad1	[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest. memset_pattern{4,8,16} writes to the first argument. Use getForDest to return the corresponding MemoryLocation. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114906	2021-12-14 14:41:28 +00:00
Kazu Hirata	d2377f24e1	Ensure newlines at the end of files (NFC)	2021-12-12 11:04:44 -08:00
Nikita Popov	9932d4db0d	[SCEV] Fix unused variable warning (NFC)	2021-12-11 21:03:54 +01:00
Mircea Trofin	04f2712ef4	[NFC][MLGO] Factor ModelUnderTrainingRunner for reuse This is so we may reuse it. It was very non-inliner specific already. Differential Revision: https://reviews.llvm.org/D115465	2021-12-10 11:24:15 -08:00
Nikita Popov	65bec04295	[ConstantFold] Handle same type in ConstantFoldLoadThroughBitcast Usually the case where the types are the same ends up being handled fine because it's legal to do a trivial bitcast to the same type. However, this is not true for aggregate types. Short-circuit the whole code if the types match exactly to account for this.	2021-12-10 16:39:50 +01:00
Sameer Sahasrabuddhe	1d0244aed7	Reapply CycleInfo: Introduce cycles as a generalization of loops Reverts `02940d6d22`. Fixes breakage in the modules build. LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-10 14:36:43 +05:30
Hasyimi Bahrudin	c1cd698a52	[InstSimplify] Simplify bool icmp with not in LHS Refer to https://llvm.org/PR52546. Simplifies the following cases: `not(X) == 0 -> X != 0 -> X`; `not(X) <=u 0 -> X >u 0 -> X`; `not(X) >=s 0 -> X <s 0 -> X`; `not(X) != 1 -> X == 1 -> X`; `not(X) <u 1 -> X >=u 1 -> X`; `not(X) >s 1 -> X <=s -1 -> X`. Differential Revision: https://reviews.llvm.org/D114666	2021-12-09 16:26:46 -05:00
Arthur Eubanks	1172712f46	[NFC] Replace some deprecated getAlignment() calls with getAlign() Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D115370	2021-12-09 08:43:19 -08:00
Nikita Popov	3beafecedf	[InlineAdvisor] Remove outdated comment (NFC) This just returns None nowadays, so this comment doesn't apply anymore.	2021-12-09 15:11:56 +01:00
Florian Hahn	d74a8a78ad	[LV] Mark various functions as const (NFC). Make sure various accessors do not modify any state, in preparation for D115111.	2021-12-09 10:51:29 +00:00
Mircea Trofin	059e03476c	[NFC][mlgo] Generalize model runner interface This prepares it for the regalloc work. Part of it is making model evaluation across 'development' and 'release' scenarios more reusable. This patch: - extends support to tensors of any shape (not just scalars, like we had in the inliner -Oz case). While the tensor shape can be anything, we assume row-major layout and expose the tensor as a buffer. - exposes the NoInferenceModelRunner, which we use in the 'development' mode to keep the evaluation code path consistent and simplify logging, as we'll want to reuse it in the regalloc case. Differential Revision: https://reviews.llvm.org/D115306	2021-12-08 20:10:58 -08:00
Florian Hahn	3c55acc4a6	[MemoryLocation] Support memset_pattern{4,8} in getForArgument. memset_pattern{4,8} behave as memset_pattern16, with the only difference being the size of the pattern location. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114905	2021-12-08 19:39:45 +00:00
Jolanta Jensen	77b2bb5567	[LAA] Use type sizes when determining dependence. In the isDependence function the code does not try hard enough to determine the dependence between types. If the types are different it simply gives up, whereas in fact what we really care about are the type sizes. I've changed the code to compare sizes instead of types. Reviewed By: fhahn, sdesmalen Differential Revision: https://reviews.llvm.org/D108763	2021-12-08 15:00:58 +00:00
James Farrell	219672b8dd	Revert "Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."" This reverts commit `63a6348cad`. Differential Revision: https://reviews.llvm.org/D115254	2021-12-07 23:15:21 +00:00
Jonas Devlieghere	02940d6d22	Revert "CycleInfo: Introduce cycles as a generalization of loops" This reverts commit `0fe61ecc2c` because it breaks the modules build. https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/ https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/	2021-12-07 13:06:34 -08:00
Sanjay Patel	8a69b04478	[InstSimplify] add logic fold for 'or' with 'xor'+'and' This replaces the 'or' from `4b30076f16` with an 'and'. We have to guard against propagating undef elements from vector 'not' values: https://alive2.llvm.org/ce/z/irMwRc	2021-12-07 11:08:26 -05:00
Cullen Rhodes	0395e01583	[IR] Split vscale_range interface Interface is split from: `std::pair<unsigned, unsigned> getVScaleRangeArgs()` into separate functions for min/max: `unsigned getVScaleRangeMin();` `Optional<unsigned> getVScaleRangeMax();` Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D114075	2021-12-07 10:38:26 +00:00
Sameer Sahasrabuddhe	0fe61ecc2c	CycleInfo: Introduce cycles as a generalization of loops LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-07 12:02:34 +05:30
James Farrell	63a6348cad	Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible." This reverts commit `5032467034`.	2021-12-06 17:35:26 +00:00
Bardia Mahjour	dfcfd14070	[VP] getVPMemoryOpCost interface Added TTI queries for the cost of a VP Memory operation, and added Opcode, DataType and Alignment to the hasActiveVectorLength() interface. Reviewed By: Roland Froese Differential Revision: https://reviews.llvm.org/D109416	2021-12-06 11:27:07 -05:00
James Farrell	5032467034	Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible. This reverts commit `40d5eeac6c`. Differential Revision: https://reviews.llvm.org/D114885	2021-12-06 14:57:47 +00:00
Kazu Hirata	1457e78352	[llvm] Use range-based for loops (NFC)	2021-12-05 08:33:02 -08:00
Sanjay Patel	c65e651e60	[InstSimplify] fix logic fold of 'or' for vectors Reduce code duplication for commutative pattern matching and fix a miscompile. We can't safely propagate an undef element in this transform: https://alive2.llvm.org/ce/z/s5xy55	2021-12-05 09:57:07 -05:00
Florian Hahn	203f29b40c	[MemoryLocation] Use getForArgument in getForSource/getForDest. (NFC) getForArgument already knows how to extract a memory location for all memory intrinsics. Use it instead of duplicating the logic.	2021-12-05 11:13:14 +00:00
Florian Hahn	a9125792b3	[MemoryLocation] Support missing atomic intrinsics in getForArg. getForArgument is missing support for atomic memory transfer intrinsics. In terms of accessed locations they behave like regular memory transfer intrinsics and we already support them as such in getForSource/getForDest.	2021-12-04 22:18:39 +00:00
Mehrnoosh Heidarpour	e94134052f	[InstSimplify] Add logic 'or' fold to -1 Adding the following folding opportunity: `(~A | B) | (A ^ B) --> -1` https://alive2.llvm.org/ce/z/PMtdYB Differential revision: https://reviews.llvm.org/D114996	2021-12-04 15:04:18 -05:00
Florian Hahn	ead3979a92	[MemoryLocation] Move DSE intrinsic handling to MemoryLocation. (NFC) Suggested in D114872.	2021-12-03 16:00:39 +00:00
David Green	ab0c5cea0b	[ARM] Use v2i1 for MVE and CDE intrinsics This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal type, to use a <2 x i1> as opposed to emulating the predicate with a <4 x i1>. The v4i1 workarounds have been removed leaving the natural v2i1 types, notably in vctp64 which now generates a v2i1 type. AutoUpgrade code has been added to upgrade old IR, which needs to convert the old v4i1 to a v2i1 by converting it back and forth to an integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be optimized away in the final assembly. Differential Revision: https://reviews.llvm.org/D114455	2021-12-03 15:27:58 +00:00
Florian Hahn	af86aa7980	[MemoryLocation] Use None instead of {}. (NFC)	2021-12-03 13:19:00 +00:00
Florian Hahn	f078536f46	[MemoryLocation] Move DSE's logic to new MemLoc::getForDest helper (NFC). DSE has some extra logic to determine the write location of library calls like strcpy and strcat. This patch moves the logic to a new MemoryLocation::getForDest variant, which takes a call and TLI. This patch should be NFC, because no other places take advantage of the new helper yet. Suggested by @reames post-commit `7eec832def`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114872	2021-12-03 09:12:01 +00:00
Nikita Popov	49d040ac97	[SCEV] Fix ValuesAtScopesUsers consistency Fixes verification failure reported at: https://reviews.llvm.org/rGc9f9be0381d1 The issue is that getSCEVAtScope() might compute a result without inserting it in the ValuesAtScopes map in degenerate cases, specifically if the ValuesAtScopes entry is invalidated during the calculation. Arguably we should still insert the result if no existing placeholder is found, but for now just tweak the logic to only update ValuesAtScopesUsers if ValuesAtScopes is updated.	2021-12-03 10:03:10 +01:00
Florian Hahn	829b29b619	[MemoryLocation] strcat/strncat/strcpy read/write after their args. strcpy/strcat/strncat access memory starting from the passed in pointers. Construct memory locations for their args using getAfter. Discussed in D114872. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D114969	2021-12-03 08:48:23 +00:00
Florian Hahn	639a78a4bf	[MemoryLocation] Support strncpy in getForArgument. The size argument of strncpy can be used as bound for the size of its pointer arguments. strncpy is guaranteed to write N bytes and reads up to N bytes. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D114871	2021-12-02 14:18:05 +00:00
Sanjay Patel	97e921c81f	[PatternMatch] create and use matcher for 'not' that excludes undef elements We needed a stricter version of m_Not for D114462, but I wasn't sure if that was going to be required anywhere else, so I didn't bother to make that reusable. It turns out we have one more existing simplification that needs this (currently miscompiles): https://alive2.llvm.org/ce/z/9-nTKi And there's at least one more fold in that family that we could add. Differential Revision: https://reviews.llvm.org/D114882	2021-12-02 08:51:13 -05:00
Florian Hahn	9f9e8ba114	[MemoryLocation] Support memset_chk in getForArgument. The size argument for memset_chk is an upper bound for the size of the pointer argument. memset_chk may write less than the specified length, if it exceeds the specified max size and aborts. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114870	2021-12-02 13:45:58 +00:00
Florian Hahn	ad88a37cea	[TLI] Add memset_pattern4, memset_pattern8 lib functions. Similar to memset_pattern16, memset_pattern4, memset_pattern8 are available on Darwin platforms. https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/memset_pattern4.3.html Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114881	2021-12-01 21:18:19 +00:00
Nikita Popov	67704801c6	[SCEV] Track backedge taken count users (NFCI) Track which SCEVs are used as ExactNotTaken counts in BackedgeTakenInfo structures, so we can directly determine which loops need to be invalidated, rather than iterating over all BECounts. This gives a small compile-time improvement on average, but the motivation here is more to ensure there are no degenerate cases, if the number of backedge taken counts is large. Differential Revision: https://reviews.llvm.org/D114784	2021-12-01 10:16:47 +01:00
Nikita Popov	c9f9be0381	[SCEV] Verify integrity of ValuesAtScopes and users (NFC) Make sure that ValuesAtScopes and ValuesAtScopesUsers are consistent during SCEV verification.	2021-11-30 21:08:40 +01:00
Sanjay Patel	4b30076f16	[InstSimplify] add logic fold for 'or' https://alive2.llvm.org/ce/z/4PaPDy There's a related fold where the inner 'or' is replaced by 'and', but that needs to be more careful about matching a 'not'.	2021-11-30 14:08:54 -05:00
Sanjay Patel	c49ef1448d	[InstSimplify] reduce code duplication for 'or' logic folds; NFC	2021-11-30 14:08:54 -05:00
Sanjay Patel	7a7c059d86	[InstSimplify] reduce code duplication for 'or' logic fold; NFC	2021-11-30 12:55:37 -05:00
Sanjay Patel	8dec0b23da	[InstSimplify] refactor 'or' logic folds; NFC Reduce duplication for handling the top-level commuted operands. There are several other folds that should be moved in here, but we need to make sure there's good test coverage.	2021-11-30 12:55:36 -05:00
Nikita Popov	40d5eeac6c	Revert "Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible." This reverts commit `1e82864670`. llvm/test/Transforms/LoopStrengthReduce/X86/2009-11-10-LSRCrash.ll fails with assertion failure: llc: /home/nikic/llvm-project/llvm/include/llvm/ADT/Optional.h:196: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = unsigned int]: Assertion `hasVal' failed. ... #8 0x00005633843af5cb llvm::MCStreamer::emitVersionForTarget(llvm::Triple const&, llvm::VersionTuple const&) #9 0x0000563383b47f14 llvm::AsmPrinter::doInitialization(llvm::Module&)	2021-11-30 18:36:32 +01:00
kpyzhov	a356dae74c	[RegionPass] Added check for -filter-print-funcs option to the region IR dumps. Differential Revision: https://reviews.llvm.org/D114310	2021-11-30 12:30:15 -05:00
Nikita Popov	37d72991c1	[SCEV] Track and invalidate ValuesAtScopes users ValuesAtScopes maps a SCEV and a Loop to another SCEV. While we invalidate entries if the left-hand SCEV is invalidated, we currently don't do this for the right-hand SCEV. Fix this by tracking users in a reverse map and using it for invalidation. This is conceptually the same change as D114738, but using the reverse map to avoid performance issues. Differential Revision: https://reviews.llvm.org/D114788	2021-11-30 18:21:14 +01:00
James Farrell	1e82864670	Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible. See also https://github.com/android/ndk/issues/1455. Differential Revision: https://reviews.llvm.org/D114163	2021-11-30 15:44:23 +00:00
Nikita Popov	77dd579827	[SCEV] Remove incorrect assert Fix assertion failure reported on D113349 by removing the assert. While the produced expression should be equivalent, it may not be strictly the same, e.g. due to lazy nowrap flag updates. Similar to what the main createSCEV() code does, simply retain the old value map entry if one already exists.	2021-11-29 17:09:12 +01:00
Florian Hahn	7b75110fac	[SCEV] Turn validity check in getExistingSCEV into assert (NFC). Now that we track users of SCEV expressions, we should be able to always invalidate containing expressions. With that, I think the case where a value gets removed but SCEVs containing references to it remain should no longer be possible. Turn check into an assert. This slightly reduces compile-time: NewPM-O3: -0.27% NewPM-ReleaseThinLTO: -0.21% NewPM-ReleaseLTO-g: -0.26% http://llvm-compile-time-tracker.com/compare.php?from=c3dc6b081da6ba503e67d260033f81f61eb38ea3&to=95a4a028b1f1dd0bc3d221435953b7d2c031b3d5&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114633	2021-11-28 12:16:55 +00:00
Nikita Popov	f492a414ba	[SCEV] Simplify forgetSymbolicName() (NFCI) With the recently introduced tracking as well as D113349, we can greatly simplify forgetSymbolicName(). In fact, we can simply replace it with forgetMemoizedResults(). What forgetSymbolicName() used to do is to walk the IR use-def chain to find all SCEVs that mention the SymbolicName. However, thanks to use tracking, we can now determine the relevant SCEVs in a more direct way. D113349 is needed to also clear out the actual IR to SCEV mapping in ValueExprMap. Differential Revision: https://reviews.llvm.org/D114263	2021-11-27 16:42:38 +01:00
Nikita Popov	c2550e3427	[SCEV] Simplify invalidation after BE count calculation (NFCI) After backedge taken counts have been calculated, we want to invalidate all addrecs and dependent expressions in the loop, because we might compute better results with the newly available backedge taken counts. Previously this was done with a forgetLoop() style use-def walk. With recent improvements to SCEV invalidation, we can instead directly invalidate any SCEVs using addrecs in this loop. This requires a great deal less subtlety to avoid invalidating more than necessary, and in particular gets rid of the hack from D113349. The change is similar to D114263 in spirit.	2021-11-27 16:35:06 +01:00
Nikita Popov	2b160e95c8	Reland [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency Relative to the previous landing attempt, this introduces an additional flag on forgetMemoizedResults() to not remove SCEVUnknown phis from the value map. The invalidation after BECount calculation wants to leave these alone and skips them in its own use-def walk, but we can still end up invalidating them via forgetMemoizedResults() if there is another IR value with the same SCEV. This is intended as a temporary workaround only, and the need for this should go away once the getBackedgeTakenInfo() invalidation is refactored in the spirit of D114263. ----- This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensure they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-27 12:37:15 +01:00
Erik Desjardins	53b00b8215	[InstSimplify] Fold X {lshr,udiv} C <u X --> true for nonzero X, non-identity C This eliminates the bounds check in Rust code like `pub fn mid(data: &[i32]) -> i32 { if data.is_empty() { return 0; } return data[data.len()/2]; }` (from https://blog.sigplan.org/2021/11/18/undefined-behavior-deserves-a-better-reputation/) Alive proofs: lshr https://alive2.llvm.org/ce/z/nyTu8D udiv https://alive2.llvm.org/ce/z/CNUZH7 Differential Revision: https://reviews.llvm.org/D114279	2021-11-26 16:48:33 -05:00
Nikita Popov	719354a571	Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency" This reverts commit `bee8dcda1f`. Some sanitizer buildbots fail with: > Attempt to use a SCEVCouldNotCompute object! For example: https://lab.llvm.org/buildbot/#/builders/85/builds/7020/steps/9/logs/stdio	2021-11-26 22:18:23 +01:00
Nikita Popov	bee8dcda1f	[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency Relative to the previous landing attempt, this makes insertValueToMap() resilient against the value already being present in the map -- previously I only checked this for the createSimpleAffineAddRec() case, but the same issue can also occur for the general createNodeForPHI(). In both cases, the addrec may be constructed and added to the map in a recursive query trying to create said addrec. In this case, this happens due to the invalidation when the BE count is computed, which ends up clearing out the symbolic name as well. ----- This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensure they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-26 20:57:47 +01:00
Florian Hahn	b927aa69bf	[SCEV] Turn check in createSimpleAffineAddRec to assertion. (NFC) Accum is guaranteed to be defined outside L (via Loop::isLoopInvariant checks above). I think that should guarantee that the more powerful ScalarEvolution::isLoopInvariant also determines that the value is loop invariant. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114634	2021-11-26 13:23:48 +00:00
Zarko Todorovski	95875d246a	[LLVM][NFC]Inclusive language: remove occurrences of sanity check/test from llvm Part of work to use more inclusive language in clang/llvm. Rewording some comments and changing function and variable names.	2021-11-24 17:29:55 -05:00
Peter Waller	787b66eb5f	[LoopAccessAnalysis][SVE] Bail out for scalable vectors The supplied test case, reduced from real world code, crashes with a 'Invalid size request on a scalable vector.' error. Since it's similar in spirit to an existing LAA test, rename the file to generalize it to both. Differential Revision: https://reviews.llvm.org/D114155	2021-11-24 15:52:20 +00:00
Sanjay Patel	b326c05814	[InstSimplify] fold xor logic of 2 variables, part 2 (~a & b) ^ (a | b) --> a This is the swapped and/or (Demorgan?) sibling fold for the fold added with D114462 ( `892648b18a` ). This case is easier to specify because we are returning a root value, not a 'not': https://alive2.llvm.org/ce/z/SRzj4f	2021-11-24 08:15:47 -05:00
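Both `xor` folds in this two-part series (this commit and `892648b18a`) can be checked exhaustively over 8-bit scalars. This is a sketch that models `~a` as an exact `xor` with all-ones. It does not model the undef-lane subtlety that motivates the strict 'not' requirement in InstSimplify. The helper names are hypothetical.

```python
MASK = 0xFF  # assumed 8-bit width


def bnot(a: int) -> int:
    """Bitwise 'not' within 8 bits (a fully defined 'not', no undef lanes)."""
    return a ^ MASK


def check_xor_folds() -> bool:
    for a in range(MASK + 1):
        for b in range(MASK + 1):
            # Part 1 (892648b18a): (a & b) ^ (~a | b) --> ~a
            if (a & b) ^ (bnot(a) | b) != bnot(a):
                return False
            # Part 2 (this commit): (~a & b) ^ (a | b) --> a
            if (bnot(a) & b) ^ (a | b) != a:
                return False
    return True


print(check_xor_folds())  # True
```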
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Florian Mayer	6c06d8e310	[stack-safety] Check SCEV constraints at memory instructions. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D113160	2021-11-23 15:29:23 -08:00
Florian Hahn	73a05cc8df	[LAA] Move visitPointers up in file (NFC). This allows easier re-use in earlier functions.	2021-11-23 22:47:26 +00:00
Sanjay Patel	892648b18a	[InstSimplify] fold xor logic of 2 variables (a & b) ^ (~a | b) --> ~a I was looking for a shortcut to reduce some of the complex logic folds that are currently up for review (D113216 and others in that stack), and I found this missing from instcombine/instsimplify. There is a trade-off in putting it into instsimplify: because we can't create new values here, we need a strict 'not' op (no undef elements). Otherwise, the fold is not valid: https://alive2.llvm.org/ce/z/k_AGGj If this was in instcombine instead, we could create the proper 'not'. But having the fold here benefits other passes like GVN that use instsimplify as an analysis. There is a related fold where 'and' and 'or' are swapped, and that is planned as a follow-up commit. Differential Revision: https://reviews.llvm.org/D114462	2021-11-23 16:50:23 -05:00
Florian Hahn	0a00d64e32	[LAA] Turn aggregate type check into assertion (NFCI). getPtrStride should not be called with aggregate access types. There's also an old TODO. Turn the check into an assertion.	2021-11-23 17:37:30 +00:00
Paul Robinson	c075566c8d	[PS4][TLI] Remove redundant line	2021-11-23 08:42:32 -08:00
Nikita Popov	62e9acad0a	Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency" This reverts commit `d633db8f9d`. Causes bootstrap assertion failures: https://lab.llvm.org/buildbot/#/builders/168/builds/3459/steps/9/logs/stdio	2021-11-22 15:47:33 +01:00
Nikita Popov	d633db8f9d	[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensure they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-22 15:27:25 +01:00
Simon Moll	56db1c072c	[DA][NFC] Update publication - add remarks Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis. Fix phrasing, formatting. Add comments on reducible loop limitation. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D114146	2021-11-22 12:58:19 +01:00
Sjoerd Meijer	4d21b64464	[BPI] Look-up tables for non-loop branches. NFC. This adds and uses look-up tables for non-loop branch probabilities, which have probabilities directly encoded into the tables for the different condition codes. Compared to having this logic inlined in different functions, as used to be the case, I think this is more compact and thus also easier to check/cross-reference. This also adds a test for pointer heuristics that was missing. Differential Revision: https://reviews.llvm.org/D114009	2021-11-22 10:30:42 +00:00
Kazu Hirata	f6bce30cf9	[llvm] Use range-based for loops (NFC)	2021-11-20 18:42:10 -08:00
Nikita Popov	0a2bde94a0	[LVI] Drop requirement that modulus is constant If we're looking only at the lower bound, the actual modulus doesn't matter. This is a leftover from when I wanted to consider the upper bound as well, where the modulus does matter.	2021-11-20 21:06:08 +01:00
Nikita Popov	cd84cab6b3	[LVI] Support urem in implied conditions If (X urem M) >= C we know that X >= C. Make use of this fact when computing the implied condition range. In some cases we could also establish an upper bound, but that's both trickier and not interesting in practice. Alive: https://alive2.llvm.org/ce/z/R5ZGSW	2021-11-20 21:01:26 +01:00
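The implication holds because `X urem M` never exceeds `X`, so any lower bound on the remainder is also a lower bound on X. This is a minimal exhaustive check that assumes a 6-bit unsigned width (small enough for a triple loop) and a nonzero M, since urem by 0 is undefined. The helper name is hypothetical.

```python
BITS = 6                  # assumed small width so the triple loop stays fast
UMAX = (1 << BITS) - 1


def implication_holds() -> bool:
    """Check: (X urem M) >= C implies X >= C, for all X, C and nonzero M."""
    for x in range(UMAX + 1):
        for m in range(1, UMAX + 1):    # urem by 0 is UB, so M is nonzero
            r = x % m
            for c in range(UMAX + 1):
                if r >= c and not x >= c:
                    return False
    return True


print(implication_holds())  # True
```

Nothing in the check depends on the particular value of M, which is consistent with the follow-up change `0a2bde94a0` dropping the constant-modulus requirement.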
Philip Reames	28000587e1	[SCEV] Revert two speculative compile time optimizations which made no difference Revert "[SCEV] Defer all work from `ea12c2cb` as late as possible" Revert "[SCEV] Defer loop property checks from `ea12c2cb` as late as possible" This reverts commit `734abbad79` and `1a5666acb2`. Both of these changes were speculative attempts to address a compile time regression. Neither worked, and both complicated the code in undesirable ways.	2021-11-19 08:45:56 -08:00
Philip Reames	734abbad79	[SCEV] Defer all work from `ea12c2cb` as late as possible This is a second speculative compile time optimization to address a reported regression. My actual suspicion is that availability of no-self-wrap is making some other bit of code trigger, but let's rule this out.	2021-11-18 17:19:52 -08:00
Philip Reames	1a5666acb2	[SCEV] Defer loop property checks from `ea12c2cb` as late as possible This is a speculative compile time optimization to address a reported regression. It's the only thing which vaguely makes sense.	2021-11-18 13:47:45 -08:00
Philip Reames	ea12c2cb9c	[SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag. The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be: for (unsigned i = 0; i != N; i += 2) { body; } If body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2. Differential Revision: https://reviews.llvm.org/D103991	2021-11-18 10:07:44 -08:00
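The toy loop can be simulated at a small width to show why `mustprogress` matters. This sketch assumes an 8-bit unsigned IV and treats revisiting an IV value as proof that the loop never exits (the self-wrap case). The helper `exit_count` is hypothetical.

```python
BITS = 8          # assumed IV width
MOD = 1 << BITS


def exit_count(n: int, step: int = 2):
    """Iterations of `for (i = 0; i != n; i += step)` on a BITS-wide unsigned IV,
    or None if the IV self-wraps without ever reaching n."""
    i, taken, seen = 0, 0, set()
    while i != n:
        if i in seen:
            return None       # back to an earlier value: the loop never exits
        seen.add(i)
        i = (i + step) % MOD
        taken += 1
    return taken


print(exit_count(10))  # 5     (N even: the exit count is N/2)
print(exit_count(11))  # None  (N odd: the IV cycles through even values forever)
```

A side-effect-free infinite loop is UB in a `mustprogress` function, so the odd case can be assumed away, leaving an exit count of N/2.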
Kazu Hirata	7ca14f6044	[llvm] Use range-based for loops (NFC)	2021-11-18 09:09:52 -08:00
Kerry McLaughlin	ff64b2933a	[LoopVectorize] Check the number of uses of an FAdd before classifying as ordered checkOrderedReductions looks for Phi nodes which can be classified as in-order, meaning they can be vectorised without unsafe math. In order to vectorise the reduction it should also be classified as in-loop by getReductionOpChain, which checks that the reduction has two uses. In this patch, a similar check is added to checkOrderedReductions so that we now return false if there are more than two uses of the FAdd instruction. This fixes PR52515. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D114002	2021-11-18 16:41:19 +00:00
Florian Hahn	da9f2ba3b1	[SCEV] Reorder operands checks in collectConditions. The initial two cases require a SCEVConstant as RHS. Pull up the condition to check and swap SCEVConstants from below. Also remove a redundant check & swap if RHS is SCEVUnknown.	2021-11-18 09:36:16 +00:00
Philip Reames	ad69402f3e	[SCEVAA] Avoid forming malformed pointer diff expressions This solves the same crash as in D104503, but with a different approach. The test case test_non_dom demonstrates a case where scev-aa crashes today. (If exercised either by -eval-aa or -licm.) The basic problem is that SCEV-AA expects to be able to compute a pointer difference between two SCEVs for any pair of pointers we do an alias query on. For (valid, but out of scope) reasons, we can end up asking whether expressions in different sub-loops can alias each other. This results in a subtraction expression being formed where neither operand dominates the other. The approach this patch takes is to leverage the "defining scope" notion we introduced for flag semantics to detect and disallow the formation of the problematic SCEV. This ends up being relatively straightforward on that new infrastructure. This change does hint that we should probably be verifying a similar property for all SCEVs somewhere, but I'll leave that to a follow-on change. Differential Revision: D114112	2021-11-17 12:38:04 -08:00
Arthur Eubanks	e3e25b5112	[NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor In a CGSCC pass manager, we may visit the same function multiple times due to SCC mutations. In the inliner pipeline, this results in running the function simplification pipeline on a function multiple times even if it hasn't been changed since the last function simplification pipeline run. We use a newly introduced analysis to keep track of whether or not a function has changed since the last time the function simplification pipeline has run on it. If we see this analysis available for a function in a CGSCCToFunctionPassAdaptor, we skip running the function passes on the function. The analysis is queried at the end of the function passes so that it's available after the first time the function simplification pipeline runs on a function. This is a per-adaptor option so it doesn't apply to every adaptor. The goal of this is to improve compile times. However, currently we can't turn this on by default at least for the higher optimization levels since the function simplification pipeline is not robust enough to be idempotent in many cases, resulting in performance regressions if we stop running the function simplification pipeline on a function multiple times. We may be able to turn this on for -O1 in the near future, but turning this on for higher optimization levels would require more investment in the function simplification pipeline. Heavily inspired by D98103. Example compile time improvements with flag turned on: https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113947	2021-11-17 09:06:46 -08:00
Florian Hahn	e8b55cf7b7	[SCEV] Apply loop guards when computing max BTC for arbitrary steps. Similar to other cases in the current function (e.g. when the step is 1 or -1), applying loop guards can lead to tighter upper bounds for the backedge-taken counts. Fixes PR52464. Reviewed By: reames, nikic Differential Revision: https://reviews.llvm.org/D113578	2021-11-17 11:00:49 +00:00
Philip Reames	8d85e945b2	[SCEV] Canonicalize X - urem X, Y patterns There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version. The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert to the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency. The ARM loop hardware interaction in the test diff is opaque to me, but the comments in the review from others' knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions. Differential Revision: https://reviews.llvm.org/D114018	2021-11-16 11:59:21 -08:00
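The competing representations can be compared directly. This is a minimal sketch over 8-bit unsigned values that assumes Y is nonzero. The `and` form is only equivalent when Y is a power of two, which is the case runtime unrolling produces. The helper name is hypothetical.

```python
UMAX = 0xFF  # assumed 8-bit unsigned width


def forms_agree() -> bool:
    for x in range(UMAX + 1):
        for y in range(1, UMAX + 1):               # urem/udiv by 0 is UB
            sub_form = x - (x % y)                 # X - (X urem Y)
            mul_form = y * (x // y)                # Y * (X udiv Y), the canonical form
            if sub_form != mul_form:
                return False
            if y & (y - 1) == 0:                   # Y is a power of two
                if x - (x & (y - 1)) != mul_form:  # X - (and X, Y-1)
                    return False
    return True


print(forms_agree())  # True
```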
Arthur Eubanks	c95a9f46c9	[Loads] Handle addrspacecast constant expressions when determining dereferenceability Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114015	2021-11-16 11:17:57 -08:00
Florian Hahn	b7aec4f08e	[SCEV] Support rewriting ZExt expressions with loop guard info. So far, applying loop guard information has been restricted to SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to SCEV failing to determine tight upper bounds for the backedge taken count. This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support re-writing ZExt expressions. This is a first step towards fixing PR40961 and PR52464. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D113577	2021-11-16 11:16:07 +00:00
Mehrnoosh Heidarpour	62c51a72f9	[InstSimplify] Fold A|B | (A^B) --> A|B This patch adds the following fold opportunity: A|B | (A^B) --> A|B that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479 https://alive2.llvm.org/ce/z/33-My- Test cases with base results are added in D113860 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D113861	2021-11-15 18:55:04 -05:00
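Every bit set in `A^B` is also set in `A|B`, so OR-ing the two adds nothing. This is a minimal exhaustive check over 8-bit values. The width is an assumption, but the identity is bitwise, so it holds at any width.

```python
MASK = 0xFF  # assumed 8-bit width


def or_xor_fold_holds() -> bool:
    """Check (A | B) | (A ^ B) == A | B for all 8-bit A, B."""
    return all(((a | b) | (a ^ b)) == (a | b)
               for a in range(MASK + 1)
               for b in range(MASK + 1))


print(or_xor_fold_holds())  # True
```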
Stanislav Mekhanoshin	833cdb0a07	Revert "[InstSimplify] Fold A|B | (A^B) --> A|B" This reverts commit `193c40e966`.	2021-11-15 14:56:20 -08:00
Arthur Eubanks	19867de9e7	[NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses Previously, any change in any function in an SCC would cause all analyses for all functions in the SCC to be invalidated. With this change, we now manually invalidate analyses for functions we modify, then let the pass manager know that all function analyses should be preserved since we've already handled function analysis invalidation. So far this only touches the inliner, argpromotion, function-attrs, and updateCGAndAnalysisManager(), since they are the most used. This is part of an effort to investigate running the function simplification pipeline less on functions we visit multiple times in the inliner pipeline. However, this causes major memory regressions especially on larger IR. To counteract this, turn on the option to eagerly invalidate function analyses. This invalidates analyses on functions immediately after they're processed in a module or scc to function adaptor for specific parts of the pipeline. Within an SCC, if a pass only modifies one function, other functions in the SCC do not have their analyses invalidated, so in later function passes in the SCC pass manager the analyses may still be cached. It is only after the function passes that the eager invalidation takes effect. For the default pipelines this makes sense because the inliner pipeline runs the function simplification pipeline after all other SCC passes (except CoroSplit which doesn't request any analyses). Overall this has mostly positive effects on compile time and positive effects on memory usage. https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss D113196 shows that we slightly regressed compile times in exchange for some memory improvements when turning on eager invalidation. D100917 shows that we slightly improved compile times in exchange for major memory regressions in some cases when invalidating less in SCC passes. Turning these on at the same time keeps the memory improvements while keeping compile times neutral/slightly positive. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113304	2021-11-15 14:44:53 -08:00
Stanislav Mekhanoshin	193c40e966	[InstSimplify] Fold A|B | (A^B) --> A|B This patch adds the following fold opportunity: A|B | (A^B) --> A|B that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479 https://alive2.llvm.org/ce/z/33-My- Test cases with base results are added in D113860 (authored by MehrHeidar, committed by rampitec). Differential Revision: https://reviews.llvm.org/D113861	2021-11-15 13:49:20 -08:00
Florian Hahn	112c1c346a	[IVDescriptor] Make sure the sign is included for negative extension. At the moment, computeRecurrenceType does not include any sign bits in the maximum bit width. If the value can be negative, this means the sign bit will be missing and the sext won't properly extend the value. If the value can be negative, increment the bitwidth by one to make sure there is at least one sign bit in the result value. Note that the increment is also needed if the value is known to be negative, as a sign bit needs to be preserved for the sext to work. Note that this at the moment prevents vectorization, because the analysis computes i1 as type for the recurrence when looking through the AND in lookThroughAnd. Fixes PR51794, PR52485. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113056	2021-11-15 13:12:57 +00:00
Kazu Hirata	d243cbf8ea	[llvm] Use isa instead of dyn_cast (NFC)	2021-11-14 19:40:46 -08:00
Mircea Trofin	a32c2c3808	[NFC] Use Optional<ProfileCount> to model invalid counts ProfileCount could model invalid values, but a user had no indication that the getCount method could return bogus data. Optional<ProfileCount> addresses that, because the user must dereference the optional. In addition, the patch removes concept duplication. Differential Revision: https://reviews.llvm.org/D113839	2021-11-14 19:03:30 -08:00
Kazu Hirata	7379736774	[llvm] Use range-based for loops with User::operands (NFC)	2021-11-14 09:32:38 -08:00
Roman Lebedev	e876698a5d	[NFC][TTI] `getReplicationShuffleCost()`: s/Replicated/Dst/ 'Replicated' is a mouthful and somewhat ambiguous, while 'destination' is pretty self-explanatory.	2021-11-14 20:01:38 +03:00
Florian Hahn	8ed8d37088	[SCEV] Update SCEVLoopGuardRewriter to hold reference to map. (NFC) SCEVLoopGuardRewriter doesn't need to copy the rewrite map. It can just hold a const reference instead, to avoid an unnecessary copy.	2021-11-13 09:39:14 +00:00
Florian Hahn	03cfea68c6	[SCEV] Update SCEVLoopGuardRewriter to take SCEV -> SCEV map (NFC). Split off refactoring from D113577 to reduce the diff. NFC as the new interface will only be used in D113577.	2021-11-12 18:16:03 +00:00
Florian Hahn	819bca9b90	[SCEV] Use APIntOps::umin to select best max BC count (NFC). Suggested in D102267, but I missed this in the committed version.	2021-11-12 12:20:01 +00:00
Mircea Trofin	f64eee1625	[NFC][InlineAdvisor] Inform advisor when the module is invalidated This avoids unnecessary re-calculation of module-wide features in the MLInlineAdvisor. In cases where function passes don't invalidate functions (and, thus, don't invalidate the module), but we re-process a CGSCC, we currently refreshed module features unnecessarily. The overhead of fetching cached results (albeit they weren't themselves invalidated) was noticeable in certain modules' compilations. We don't want to just invalidate the advisor object, though, via the analysis manager, because we'd then need to re-create expensive state (like the model evaluator in the ML 'development' mode). Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D113644	2021-11-11 10:23:49 -08:00
duanbo.db	53dc525828	[LoopInfo] Fix function getInductionVariable The way the function gets the induction variable is by checking whether StepInst or IndVar in the phi statement is one of the operands of CMP. But if the LatchCmpOp0/LatchCmpOp1 is a constant, the subsequent comparison may result in null == null, which is meaningless. This patch fixes the typo. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D112980	2021-11-11 16:22:42 +08:00
Bin Cheng	bf76e64854	[BPI] Push exit block rather than exiting ones in getSccExitBlocks The function BranchProbabilityInfo::SccInfo::getSccExitBlocks is supposed to collect all exit blocks for SCC rather than all exiting blocks. This patch fixes the typo. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D113344	2021-11-11 14:22:19 +08:00
Chris Jackson	116dc70cf3	[DebugInfo][LSR] Add more stringent checks on IV selection and salvage attempts Prevent the selection of IVs that have a SCEV containing an undef. Also prevent salvaging attempts for values for which a SCEV could not be created by ScalarEvolution and have only SCEVUnknown. Reviewed by: Orlando Differential Revision: https://reviews.llvm.org/D111810	2021-11-09 13:09:37 +00:00
Roman Lebedev	d484cc152b	[TTI] Adjust `getReplicationShuffleCost()` interface It is trivial to produce DemandedSrcElts given DemandedReplicatedElts, so don't pass the former. Also, it isn't really useful so far to have the overload taking the Mask, so just inline it.	2021-11-09 14:07:59 +03:00
Michael Liao	bf225939bc	[InferAddressSpaces] Support assumed addrspaces from addrspace predicates. - CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC. - This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g., ``` foo(float *p) { __builtin_assume(__isGlobal(p)); // From there, we could assume p is a global pointer instead of a // generic one. } ``` This makes the code portable without introducing the implementation-specific features. Note that NVCC starts to support __builtin_assume from version 11. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112041	2021-11-08 16:51:57 -05:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
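The two cost-per-lane formulas from this commit can be written side by side. This is a sketch only: it uses plain floating-point division and hypothetical numbers (a body cost of 8 for a VF with 4 known-minimum lanes), not LLVM's own cost type or arithmetic. The helper name `cost_per_lane` is hypothetical.

```python
def cost_per_lane(cost: float, known_min_lanes: int, vscale_for_tuning: int = 1) -> float:
    """Cost / (VScaleForTuning * VF.KnownMinLanes).

    vscale_for_tuning=1 models a target without a tuning value, which
    reduces to Cost / VF.KnownMinLanes (the unchanged behavior).
    """
    return cost / (vscale_for_tuning * known_min_lanes)


print(cost_per_lane(8, 4))     # 2.0  (no tuning value)
print(cost_per_lane(8, 4, 2))  # 1.0  (CPU tuned for vscale = 2)
```

A larger tuning value spreads the same cost over more assumed lanes, which is what makes the per-lane estimate less pessimistic.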
Nikita Popov	a8c318b50e	[BasicAA] Use index size instead of pointer size When accumulating the GEP offset in BasicAA, we should use the pointer index size rather than the pointer size. Differential Revision: https://reviews.llvm.org/D112370	2021-11-07 18:56:11 +01:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Kazu Hirata	843d1eda18	[llvm] Use llvm::reverse (NFC)	2021-11-06 19:31:18 -07:00
Nikita Popov	e3cec17b2d	[InstSimplify] Remove incorrect icmp of gep fold (PR52429) As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this fold is incorrect, because inbounds only guarantees that the pointers don't wrap in the unsigned space: It is possible that the sign boundary is crossed by an object. I'm dropping the fold entirely rather than adjusting it, because computePointerICmp() fully subsumes it (just with correct predicate handling). Differential Revision: https://reviews.llvm.org/D113343	2021-11-06 21:03:21 +01:00
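A toy address space makes the signed-versus-unsigned problem concrete. This sketch assumes 8-bit pointers and a hypothetical object spanning 0x7E to 0x82. An inbounds offset of 4 does not wrap in the unsigned space, yet it crosses the sign boundary, so an unsigned compare and a signed compare of the same two pointers disagree.

```python
BITS = 8            # assumed toy pointer width
MOD = 1 << BITS


def as_signed(u: int) -> int:
    """Reinterpret an unsigned BITS-bit address as two's-complement."""
    return u - MOD if u >= MOD // 2 else u


p = 0x7E            # hypothetical object base just below the sign boundary
q = p + 4           # gep inbounds p, 4 -> 0x82; no unsigned wrap since 0x82 < 0x100
assert q < MOD

print(p < q)                        # True:  icmp ult p, q
print(as_signed(p) < as_signed(q))  # False: icmp slt p, q (126 vs -126)
```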
Roman Lebedev	a30ec4778a	[TTI][CostModel] `getUserCost()`: recognize replication shuffles and query their cost This finally creates proper test coverage for replication shuffles, that are used by LV for conditional loads, and will allow to add proper costmodel at least for AVX512. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113324	2021-11-06 16:45:15 +03:00
Roman Lebedev	f8efc5c0ac	[NFC][TTI] Add/extract `getReplicationShuffleCost()` method, deduplicate its implementations Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons, including testability and reuse, let's do better. In a followup `getUserCost()` will be taught to use it to estimate the mask costs, which will allow for better cost model tests for it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113313	2021-11-06 16:45:15 +03:00
Kazu Hirata	87e53a0ad8	[llvm] Use make_early_inc_range (NFC)	2021-11-05 19:39:07 -07:00
Philip Reames	d24a0e8857	[SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic The basic idea here is that given a zero extended narrow IV, we can prove the inner IV to be NUW if we can prove there's a value the inner IV must take before overflow which must exit the loop. Differential Revision: https://reviews.llvm.org/D109457	2021-11-05 15:36:47 -07:00
David Green	61225c0818	[ValueTracking][InstCombine] Introduce and use ComputeMinSignedBits This introduces a new ComputeMinSignedBits method for ValueTracking that returns the BitWidth - SignBits + 1 from ComputeSignBits, and represents the minimum bit size for the value as a signed integer. Similar to the existing APInt::getMinSignedBits method, this can make some of the reasoning around ComputeSignBits more natural. See https://reviews.llvm.org/D112298	2021-11-05 14:41:37 +00:00
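The relationship between sign bits and minimum signed width can be checked on 8-bit two's-complement constants. In this sketch, `num_sign_bits` is a hypothetical stand-in for what sign-bit analysis reports when the value is known exactly, and `min_signed_bits` applies the BitWidth - SignBits + 1 formula from this commit.

```python
BITS = 8  # assumed width


def num_sign_bits(v: int, bits: int = BITS) -> int:
    """Number of leading bits equal to the sign bit (always at least 1)."""
    u = v & ((1 << bits) - 1)
    sign = (u >> (bits - 1)) & 1
    count = 0
    for pos in range(bits - 1, -1, -1):
        if (u >> pos) & 1 != sign:
            break
        count += 1
    return count


def min_signed_bits(v: int, bits: int = BITS) -> int:
    """BitWidth - SignBits + 1."""
    return bits - num_sign_bits(v, bits) + 1


def fits_signed(v: int, n: int) -> bool:
    """True if v is representable as an n-bit signed integer."""
    return -(1 << (n - 1)) <= v <= (1 << (n - 1)) - 1


print(min_signed_bits(1))     # 2
print(min_signed_bits(-1))    # 1
print(min_signed_bits(-128))  # 8
```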
Arthur Eubanks	7175886a0f	[NewPM] Make eager analysis invalidation per-adaptor Follow-up change to D111575. We don't need eager invalidation on every adaptor. Most notably, adaptors running passes that use very few analyses, or passes that purely invalidate specific analyses. Also allow testing of this via a pipeline string "function<eager-inv>()". The compile time/memory impact of this is very comparable to D111575. https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D113196	2021-11-04 17:16:11 -07:00
Liren Peng	57e093162e	[ScalarEvolution] Infer loop max trip count from array accesses Data references in a loop should not access elements over the statically allocated size. So we can infer a loop max trip count from this undefined behavior. Reviewed By: reames, mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D109821	2021-11-03 10:40:18 +08:00
Nikita Popov	c00e9c6345	[BasicAA] Check known access sizes earlier (NFC) All heuristics for variable accesses require both access sizes to be known, so check this once at the start, rather than for each particular heuristic.	2021-11-02 21:26:26 +01:00
Nikita Popov	0b6ed92c8a	[BasicAA] Use early returns (NFC) Reduce nesting in aliasGEP() a bit by returning early.	2021-11-02 21:17:36 +01:00
Nikita Popov	51e9f33603	[BasicAA] Use saturating multiply on range if nsw If we know that the var * scale multiplication is nsw, we can use a saturating multiplication on the range (as a good approximation of an nsw multiply). This recovers some cases where the fix from D112611 is unnecessarily strict. (This can be further strengthened by using a saturating add, but we currently don't track all the necessary information for that.) This exposes an issue in our NSW tracking for multiplies. The code was assuming that (X +nsw Y) *nsw Z results in (X *nsw Z) +nsw (Y *nsw Z) -- however, it is possible that the distributed multiplications overflow, even if the non-distributed one does not. We should discard the nsw flag if the offset is non-zero. If we just have (X *nsw Y) *nsw Z then concluding X *nsw (Y *nsw Z) is fine. Differential Revision: https://reviews.llvm.org/D112848	2021-11-02 20:27:39 +01:00
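A concrete i8 counterexample shows why nsw does not survive distribution. The values X = 100, Y = -100, Z = 2 are chosen for illustration: the non-distributed expression stays in range, while both distributed products overflow.

```python
BITS = 8                                               # assumed i8 arithmetic
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def no_signed_wrap(v: int) -> bool:
    """True if the exact result v fits in a signed BITS-bit integer."""
    return SMIN <= v <= SMAX


x, y, z = 100, -100, 2

# Non-distributed form: (X +nsw Y) *nsw Z, i.e. 0 * 2, with no overflow.
print(no_signed_wrap(x + y) and no_signed_wrap((x + y) * z))  # True

# Distributed form: (X * Z) + (Y * Z); 200 and -200 both overflow i8.
print(no_signed_wrap(x * z), no_signed_wrap(y * z))  # False False
```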
Arthur Eubanks	e2024d72fa	Revert "[NFC] Remove LinkAll*.h" This reverts commit `fe364e5dc7`. Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266	2021-11-02 09:08:09 -07:00
Arthur Eubanks	fe364e5dc7	[NFC] Remove LinkAll*.h These were added to prevent functions from being removed by WPO. But that doesn't make sense, correct WPO will not remove functions we actually use. I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112971	2021-11-02 08:43:17 -07:00
Arthur Eubanks	029f1a5344	[LazyCallGraph] Skip blockaddresses blockaddresses do not participate in the call graph since the only instructions that use them must all return to someplace within the current function. And passes cannot retrieve a function address from a blockaddress. This was suggested by efriedma in D58260. Fixes PR50881. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112178	2021-11-01 13:10:24 -07:00
Nikita Popov	4972d12185	[SCEV] Only add direct loop users (NFC) It is now sufficient to track only direct addrec users of a loop, and let the SCEVUsers mechanism track and invalidate transitive users. Differential Revision: https://reviews.llvm.org/D112875	2021-11-01 18:49:43 +01:00
Max Kazantsev	e512c5b166	[SCEV][NFC] Factor out common API for getting unique operands of a SCEV This function is used in at least 2 places, so it makes sense to make it separate. Differential Revision: https://reviews.llvm.org/D112516 Reviewed By: reames	2021-11-01 11:36:47 +07:00
Kazu Hirata	c8b1ed5fb2	[clang, llvm] Use Optional::getValueOr (NFC)	2021-10-30 19:00:21 -07:00
David Green	2c4a9e830c	[ValueTracking] Teach computeConstantRange that the maximum value of a half is 65504 The maximal value of a half is 0x7bff, which is 65504 when converted to an integer. This patch teaches computeConstantRange to compute a constant range with the correct maximum value. https://alive2.llvm.org/ce/z/BV_Spb https://alive2.llvm.org/ce/z/Nwuqvb The maximum value for a float converted in the same way is 3.4e38, which requires 129 bits of data. I have not added that here, as integer types that large are rare compared to integer types larger than the 17 bits required for half floats. The MVE tests change because instsimplify happens to be run as a part of the backend, which it does not tend to be for other backends. Differential Revision: https://reviews.llvm.org/D112694	2021-10-30 14:27:38 +01:00
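The 65504, 17-bit and 129-bit figures can be reproduced by decoding the IEEE bit patterns directly. This sketch relies on Python's `struct` format codes `e` (binary16) and `f` (binary32). The helper `signed_bits_for` is hypothetical and counts the magnitude bits plus one sign bit.

```python
import struct

# IEEE 754 binary16 bit pattern 0x7bff is the largest finite half.
half_max = struct.unpack("<e", (0x7BFF).to_bytes(2, "little"))[0]
# IEEE 754 binary32 bit pattern 0x7f7fffff is the largest finite float.
float_max = struct.unpack("<f", (0x7F7FFFFF).to_bytes(4, "little"))[0]


def signed_bits_for(max_magnitude: float) -> int:
    """Bits needed for a signed integer covering [-max_magnitude, max_magnitude]."""
    return int(max_magnitude).bit_length() + 1   # magnitude bits plus a sign bit


print(half_max)                    # 65504.0
print(signed_bits_for(half_max))   # 17
print(signed_bits_for(float_max))  # 129
```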
Kazu Hirata	972d4133e9	Use {DenseSet,SmallPtrSet}::contains (NFC)	2021-10-29 20:26:07 -07:00
Nikita Popov	cdf45f98ca	[BasicAA] Extract linear expression multiplication (NFC) Extract a common method for multiplying a linear expression by a factor.	2021-10-29 22:41:40 +02:00
Nikita Popov	7cf7378a9d	[BasicAA] Don't treat non-inbounds GEP as nsw The scale multiplication is only guaranteed to be nsw if the GEP is inbounds (or the multiplication is trivial). Previously we were only considering explicit muls in GEP indices.	2021-10-29 22:30:44 +02:00
modimo	5caad9b5d3	[InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner Adds the following switches: 1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are: 1. Original: defers to original advisor 2. AlwaysInline: inline all sites not in replay 3. NeverInline: inline no sites not in replay 2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are: 1. Line 2. LineColumn 3. LineDiscriminator 4. LineColumnDiscriminator Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks. All of these, together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope`, allow tweaking how replay is applied. In my testing, I'm using: 1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function 2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system 3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC, which can conflict quite heavily in column number compared to Clang An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks, which more closely matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another. Updated various tests to cover the new switches and negative remarks Testing: ninja check-all Reviewed By: wenlei, mtrofin Differential Revision: https://reviews.llvm.org/D112040	2021-10-29 12:32:03 -07:00
Peter Waller	98f08752f7	[InstCombine][ConstantFolding] Make ConstantFoldLoadThroughBitcast TypeSize-aware The newly added test previously caused the compiler to fail an assertion. It looks like a straightforward TypeSize upgrade. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112142	2021-10-28 12:15:15 +00:00
Max Kazantsev	513914e1f3	[SCEV] Invalidate user SCEVs along with operand SCEVs to avoid cache corruption Following discussion in D110390, it seems that we are suffering from an inability to traverse users of a SCEV being invalidated. As a result, ScalarEvolution's inner caches may store obsolete data about SCEVs even if their operands are forgotten. This creates problems when we try to verify the contents of those caches. Mishandling the cache also frequently causes very sneaky, hard-to-analyze bugs involving memory corruption when dealing with cached data. They lurk there because ScalarEvolution's verification is not powerful enough and misses many problematic cases. I plan to make SCEV's verification much stricter in follow-ups, and this requires dangling-pointer-free caches. This patch makes sure that, whenever we forget cached information for a SCEV, we also forget it for all SCEVs that (transitively) use it. This may have negative compile time impact. It's a sacrifice we are more than willing to make to enforce correctness. We can also save some time by reworking invokers of forgetMemoizedResults (maybe we can forget multiple SCEVs with a single query). Differential Revision: https://reviews.llvm.org/D111533 Reviewed By: reames	2021-10-28 09:39:24 +07:00
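The transitive invalidation described above amounts to a worklist walk over a users map. A toy C++ sketch using strings as expressions; `ToyCache`, `addOperand` and `forget` are hypothetical and much simpler than ScalarEvolution's real bookkeeping.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: Users[Op] lists the expressions that have Op as a direct
// operand, and Cache holds one memoized value per expression.
struct ToyCache {
  std::map<std::string, std::vector<std::string>> Users;
  std::map<std::string, int> Cache;

  void addOperand(const std::string &User, const std::string &Op) {
    Users[Op].push_back(User);
  }

  // Drop the cached value for E and for every expression that
  // transitively uses E.
  void forget(const std::string &E) {
    std::vector<std::string> Worklist{E};
    std::set<std::string> Visited{E};
    while (!Worklist.empty()) {
      std::string Cur = Worklist.back();
      Worklist.pop_back();
      Cache.erase(Cur);
      auto It = Users.find(Cur);
      if (It == Users.end())
        continue;
      for (const std::string &U : It->second)
        if (Visited.insert(U).second) // enqueue each user at most once
          Worklist.push_back(U);
    }
  }
};
```

If "b" uses "a" and "c" uses "b", forgetting "a" also drops the cached entries for "b" and "c", while an unrelated "d" survives.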
Nikita Popov	665060ea45	[BasicAA] Remove misleading overflow check GEP decomposition currently checks whether the multiplication of the linear expression offset and GEP scale overflows. However, if everything else works correctly, this overflow check is both unnecessary and dangerously misleading. While it will avoid an overflow in Scale * Offset in particular, other parts of the calculation (including those on dynamic values) may still overflow. The code working on the decomposed GEPs is responsible for ensuring that it remains correct in the presence of overflow. D112611 fixes the last issue of that kind that I'm aware of (in fact, the overflow check was originally introduced to work around precisely that issue). Differential Revision: https://reviews.llvm.org/D112618	2021-10-27 20:56:03 +02:00
Philip Reames	425cbbc602	[Operator] Add hasPoisonGeneratingFlags [mostly NFC] This method parallels dropPoisonGeneratingFlags on Instruction, but is hoisted to operator to handle constant expressions as well. This is mostly code movement, but I did go ahead and add the inrange constexpr gep case. This had been discussed previously, but apparently never followed up on.	2021-10-27 11:25:40 -07:00
Nikita Popov	fbc0c308d5	[BasicAA] Handle known bits as ranges BasicAA currently tries to determine that the offset is positive by checking whether all variable indices are positive based on known bits, multiplied by a positive scale. However, this is incorrect if the scale multiplication might overflow. In the modified test case the original value is positive, but may be negative after a left shift. Fix this by converting known bits into a constant range and reusing the range-based logic, which handles overflow correctly. Differential Revision: https://reviews.llvm.org/D112611	2021-10-27 14:41:31 +02:00
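An 8-bit sketch of converting known bits to a signed range and then checking whether a positive scale can overflow: a value whose sign bit is known zero lies in [0, 127], yet multiplying it by 4 (a left shift by 2) can wrap into negative values. `KnownBits8`, `toSignedRange` and `scaledMayWrap` are hypothetical; LLVM's `KnownBits` and `ConstantRange` classes are the real counterparts.

```cpp
#include <cassert>
#include <cstdint>

// Known bits of an 8-bit value: a bit set in Zero is known 0, in One is known 1.
struct KnownBits8 { uint8_t Zero, One; };

struct SRange { int Min, Max; }; // inclusive signed range

// Smallest and largest signed 8-bit values consistent with the known bits.
SRange toSignedRange(KnownBits8 K) {
  const uint8_t SignBit = 0x80;
  uint8_t MinBits = K.One;                          // unknown bits -> 0
  if (!(K.Zero & SignBit)) MinBits |= SignBit;      // sign may be 1: use it
  uint8_t MaxBits = static_cast<uint8_t>(~K.Zero);  // unknown bits -> 1
  if (!(K.One & SignBit))                           // sign may be 0: clear it
    MaxBits &= static_cast<uint8_t>(~SignBit);
  return {static_cast<int8_t>(MinBits), static_cast<int8_t>(MaxBits)};
}

// Multiply a signed range by a positive scale and report whether the
// 8-bit result could wrap (Scale > 0, so the bounds keep their order).
bool scaledMayWrap(SRange R, int Scale) {
  int Lo = R.Min * Scale, Hi = R.Max * Scale;
  return Lo < INT8_MIN || Hi > INT8_MAX;
}
```

With only the sign bit known zero the range is [0, 127], and 127 * 4 overflows i8, which is exactly the case a "non-negative times positive scale" argument misses. If the top three bits are known zero, the range shrinks to [0, 31] and scaling by 4 stays in range.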
Nikita Popov	9bc7e543b4	[BasicAA] Make range check more precise Make the range check more precise by calculating the range of potentially accessed bytes for both accesses and checking whether their intersection is empty. In that case there can be no overlap between the accesses and the result is NoAlias. This is more powerful than the previous approach, because it can deal with sign-wrapped ranges. In the test case the original range is [-1, INT_MAX] but becomes [0, INT_MIN] after applying the offset. This is a wrapping range, so getSignedMin/getSignedMax will treat it as a full range. However, the range excludes the elements [INT_MIN+1, -1], which is enough to prove NoAlias with an access at offset -1. Differential Revision: https://reviews.llvm.org/D112486	2021-10-27 12:40:58 +02:00
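To see why the intersection-based check can succeed where signed min/max cannot, here is an 8-bit sketch of a half-open, possibly wrapping range [Lower, Upper) modulo 256, mirroring the test case described above with INT_MAX/INT_MIN scaled down to 127/-128. It assumes 1-byte accesses and finds the intersection by brute force over all 256 values; `WrapRange8` and `mayOverlap` are hypothetical, and the real ConstantRange computes intersections symbolically.

```cpp
#include <cassert>
#include <cstdint>

// Half-open 8-bit range [Lower, Upper) that may wrap modulo 256.
// Simplification: Lower == Upper is treated as the full set (LLVM's
// ConstantRange distinguishes empty and full sets explicitly).
struct WrapRange8 {
  uint8_t Lower, Upper;

  bool contains(uint8_t V) const {
    if (Lower == Upper)
      return true;
    // Distance from Lower, taken modulo 256, must be below the range size.
    return static_cast<uint8_t>(V - Lower) < static_cast<uint8_t>(Upper - Lower);
  }

  // Shift the whole range by a constant offset (modulo 256).
  WrapRange8 add(uint8_t C) const {
    return {static_cast<uint8_t>(Lower + C), static_cast<uint8_t>(Upper + C)};
  }
};

// With 1-byte accesses, two accesses may overlap only if their offset
// ranges intersect. Brute force over all 256 values for clarity.
bool mayOverlap(WrapRange8 A, WrapRange8 B) {
  for (int V = 0; V < 256; ++V) {
    uint8_t X = static_cast<uint8_t>(V);
    if (A.contains(X) && B.contains(X))
      return true;
  }
  return false;
}
```

The range [0xFF, 0x80) is [-1, 127]; adding 1 gives [0x00, 0x81), i.e. [0, -128] when read as signed. Its signed minimum and maximum are -128 and 127, so a signed min/max view sees a full range, yet the set still excludes 0x81..0xFF, including the access at offset -1.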
Max Kazantsev	5961f0308f	[SCEV][NFC] Verify integrity of SCEVUsers Make sure that, for every living SCEV, we have all its direct operands tracking it as their user. Differential Revision: https://reviews.llvm.org/D112402 Reviewed By: reames	2021-10-27 09:54:49 +07:00
Nikita Popov	3a995c918e	[SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander Always insert values into ExprValueMap, and instead skip using them in SCEVExpander if poison-generating flags have been lost. This ensures that all values that are in ValueExprMap are also in ExprValueMap, so we can use the latter to invalidate the former. This change is probably not entirely NFC for the case where originally the SCEV had no nowrap flags but they were inferred later, in which case that would now allow reusing the existing value for expansion. Differential Revision: https://reviews.llvm.org/D112389	2021-10-25 22:37:20 +02:00
Nikita Popov	0d20ebf686	[BasicAA] Use ranges for more than one index D109746 made BasicAA use range information to determine the minimum/maximum GEP offset. However, it was limited to the case of a single variable index. This patch extends support to multiple indices by adding all the ranges together. Differential Revision: https://reviews.llvm.org/D112378	2021-10-25 15:30:50 +02:00
Nikita Popov	75384ecdf8	[InstSimplify] Refactor invariant.group load folding Currently strip.invariant/launder.invariant are handled by constructing constant expressions with the intrinsics skipped. This takes an alternative approach of accumulating the offset using stripAndAccumulateConstantOffsets(), with a flag to look through invariant.group intrinsics. Differential Revision: https://reviews.llvm.org/D112382	2021-10-25 10:56:25 +02:00
Kazu Hirata	3729a5abf4	[SCEV] Fix a warning on an unused lambda capture This patch fixes: llvm/lib/Analysis/ScalarEvolution.cpp:12770:37: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]	2021-10-25 00:45:18 -07:00
Max Kazantsev	f8623b0783	[SCEV][NFC] Win some compile time from mass forgetMemoizedResults Mass forgetMemoizedResults can be done more efficiently than a bunch of individual invocations of the helper because we can traverse the maps being updated just once, rather than doing this for each individual SCEV. Should be NFC and supposedly improves compile time. Differential Revision: https://reviews.llvm.org/D112294 Reviewed By: reames	2021-10-25 14:09:41 +07:00
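The single-traversal idea can be sketched with a standard map: collect the SCEVs to forget into a set, then make one pass over each memo table and erase matching entries, instead of one full pass per SCEV. `ExprId`, `MemoMap` and `forgetAll` are hypothetical stand-ins and are not meant to mirror ScalarEvolution's actual caches.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using ExprId = int; // stand-in for a SCEV pointer

// Memo table keyed by (expression, context), e.g. a fact about an
// expression inside a particular loop.
using MemoMap = std::map<std::pair<ExprId, int>, bool>;

// Forget a batch of expressions with a single pass over the map.
void forgetAll(MemoMap &Memo, const std::vector<ExprId> &ToForget) {
  std::set<ExprId> Doomed(ToForget.begin(), ToForget.end());
  for (auto It = Memo.begin(); It != Memo.end();) {
    if (Doomed.count(It->first.first))
      It = Memo.erase(It); // erase returns the next iterator
    else
      ++It;
  }
}
```

Forgetting N expressions one at a time costs N passes over the table; the batched form costs one pass plus set lookups.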
Max Kazantsev	dbab339ea4	[SCEV][NFC] Apply mass forgetMemoizedResults queries where possible When forgetting multiple SCEVs, rather than doing this one by one, we can instead use mass updates. We plan to make them more efficient than they are now, potentially improving compile time. Differential Revision: https://reviews.llvm.org/D111602 Reviewed By: reames	2021-10-25 13:50:49 +07:00
Max Kazantsev	a6096b7f9e	[SCEV][NFC] Introduce API for mass forgetMemoizedResults query This patch changes the signature of forgetMemoizedResults so that it can work with multiple SCEVs. Usage will come in follow-ups. We also plan to optimize it in the future to work faster than individual invalidation updates. Should not change behavior in any sense. Split-off from D111602. Differential Revision: https://reviews.llvm.org/D112293 Reviewed By: reames	2021-10-25 13:49:31 +07:00
Max Kazantsev	1c18ebb2cc	[NFC][SCEV] Do not track users of SCEVConstants Follow-up from D112295, suggested by Nikita: we can avoid tracking users of SCEVConstants because dropping their cached info is unlikely to give any new prospects for fact inference, and it should not introduce any correctness problems.	2021-10-25 12:30:46 +07:00
Max Kazantsev	fea4a48c0b	[SCEV][NFC] API for tracking of SCEV users This patch introduces an API that keeps track of the SCEVs that use other SCEVs, which is required to handle invalidation of users along with operands in follow-up patches. Differential Revision: https://reviews.llvm.org/D112295 Reviewed By: reames	2021-10-25 12:14:18 +07:00
Kazu Hirata	4bd46501c3	Use llvm::any_of and llvm::none_of (NFC)	2021-10-24 17:35:33 -07:00
Philip Reames	a461fa64bb	Treat branch on poison as immediate UB (under an off-by-default flag) The LangRef clearly states that branching on an undef or poison value is immediate undefined behavior, but historically, we have not been consistent about implementing that interpretation in the optimizer. Historically, we used (in some cases) a more relaxed model which essentially looked for provable UB along both paths that were control dependent on the condition. However, we've never been 100% consistent here. For instance SCEV uses the strong model for increments which form AddRecs (and only addrecs). At the moment, the last big blocker for finally making this switch is enabling the fix landed in D106041. Loop unswitching (in its classic form) is incorrect as it creates many "branch on poisons" when unswitching conditions originally unreachable within the loop. This change adds a flag to value tracking which makes it easy to test the optimization potential of treating branch on poison as immediate UB. It's intended to help ease work on getting us finally through this transition and avoid multiple independent rediscoveries of the same issues. Differential Revision: https://reviews.llvm.org/D112026	2021-10-24 14:42:03 -07:00
Nikita Popov	0c7f85d786	[InstSimplify] Simplify fetching of index size (NFC) Directly fetch the size instead of going through the index type first.	2021-10-23 22:08:15 +02:00
Nikita Popov	710596a1e1	[ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI) As this API is now internally offset-based, we can accept a starting offset and remove the need to create a temporary bitcast+gep sequence to perform an offset load. The API now mirrors the ConstantFoldLoadFromConst() API.	2021-10-23 17:59:39 +02:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Kazu Hirata	d14d7068b6	[llvm] Use StringRef::contains (NFC)	2021-10-23 08:45:27 -07:00
Nikita Popov	c5b5b7f621	[ConstantFolding] Remove ConstantFoldLoadThroughGEPIndices() API (NFC) The last user of this API went away in `4f5e9a2bb2`.	2021-10-23 16:59:29 +02:00
Nikita Popov	4f5e9a2bb2	[SCEV] Remove computeLoadConstantCompareExitLimit() (NFCI) The functionality of this method is already covered by computeExitCountExhaustively() in a more general fashion. It was added at a time when exhaustive exit count calculation did not support constant folding loads yet. I double checked that dropping this code causes no binary changes in test-suite. Differential Revision: https://reviews.llvm.org/D112343	2021-10-23 15:34:25 +02:00
Nikita Popov	61cfdf636d	[BasicAA] Model implicit trunc of GEP indices GEP indices larger than the GEP index size are implicitly truncated to the index size. BasicAA currently doesn't model this, resulting in incorrect alias analysis results. Fix this by explicitly modelling truncation in CastedValue in the same way we do zext and sext. Additionally we need to disable a number of optimizations for truncated values, in particular "non-zero" and "non-equal" may no longer hold after truncation. I believe the constant offset heuristic is also not necessarily correct for truncated values, but wasn't able to come up with a test for that one. A possible followup here would be to use the new mechanism to model explicit trunc as well (which should be much more common, as it is the canonical form). This is straightforward, but omitted here to separate the correctness fix from the analysis improvement. (Side note: While I say "index size" above, BasicAA currently uses the pointer size instead. Something for another day...) Differential Revision: https://reviews.llvm.org/D110977	2021-10-22 23:47:02 +02:00
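A concrete instance of why "non-zero" and "non-equal" facts can break under implicit truncation: with a 64-bit index and a 32-bit index size, only the low 32 bits of the index matter. Sketch with a hypothetical `truncateIndex` helper.

```cpp
#include <cassert>
#include <cstdint>

// Model a GEP index that is wider than the index size: only the low
// 32 bits are used. The uint32_t -> int32_t step is modular conversion
// (guaranteed since C++20, and what mainstream compilers do before that).
int32_t truncateIndex(int64_t Index) {
  return static_cast<int32_t>(static_cast<uint32_t>(Index));
}
```

An index of 2^32 is non-zero as an i64 but truncates to 0, and 2^32 + 5 and 5 are distinct i64 values that truncate to the same offset.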
Nikita Popov	3a10fe2d89	[Loads] Use more powerful constant folding API This follows up on D111023 by exporting the generic "load value from constant at given offset as given type" and using it in the store to load forwarding code. We now need to make sure that the load size is smaller than the store size, previously this was implicitly ensured by ConstantFoldLoadThroughBitcast(). Differential Revision: https://reviews.llvm.org/D112260	2021-10-22 18:33:03 +02:00
Nikita Popov	1848525842	[CodeMetrics] Don't require speculatability for ephemeral values As discussed in D112016, our current requirement of speculatability for ephemeral values is overly strict: What we really care about is that the instruction will be DCEd once the assume is dropped. For that it is sufficient that the instruction is side-effect free and not a terminator. In particular, this allows non-dereferenceable loads to be ephemeral values. Differential Revision: https://reviews.llvm.org/D112179	2021-10-21 20:30:01 +02:00
Arthur Eubanks	3781a46c3c	Revert "[IPT] Restructure cache to allow lazy update following invalidation [NFC]" This reverts commit `baea663a6e`. Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/77/builds/10715.	2021-10-21 10:48:41 -07:00
Philip Reames	baea663a6e	[IPT] Restructure cache to allow lazy update following invalidation [NFC] This change restructures the cache used in IPT to point not to the first special instruction, but to the first instruction which could be special. That is, the cached reference is always equal to the first special, or comes before it in the block. This avoids expensive block scans when we are removing special instructions from the beginning of the block. At the moment, this case is not heavily used, though it does trigger in GVN when doing CSE of calls. The main motivation was a change I'm no longer planning to move forward with, but the cache optimization seemed worthwhile as a minor perf win at low cost. Differential Revision: https://reviews.llvm.org/D111768	2021-10-21 09:16:21 -07:00
Arthur Eubanks	00500d5bad	[NFC] De-template LazyCallGraph::visitReferences() and move into .cpp file This makes changing it and recompiling it much faster.	2021-10-20 10:50:00 -07:00
Bjorn Pettersson	9c44a0996c	[SCEV] Fix formatting error introduced by D112080 Accidentally pushed D112080 without this clang-format cleanup.	2021-10-19 21:44:07 +02:00
Bjorn Pettersson	08619006a0	[SCEV] Avoid compile time explosion in ScalarEvolution::isImpliedCond As seen in PR51869 the ScalarEvolution::isImpliedCond function might end up spending lots of time when doing the isKnownPredicate checks. Calling isKnownPredicate, for example, can result in isKnownViaInduction being called, which might result in isLoopBackedgeGuardedByCond being called, and then we might get one or more new calls to isImpliedCond. Even if the scenario described here isn't an infinite loop, using some randomly generated C programs as input indicates that those isKnownPredicate checks quite often return true. On the other hand, the third condition that needs to be fulfilled in order to "prove implications via truncation", i.e. the isImpliedCondBalancedTypes check, is rarely fulfilled. I also ran some similar experiments to look at how often we would get the same result when using isKnownViaNonRecursiveReasoning instead of isKnownPredicate. So far I haven't seen a single case where codegen is negatively impacted by using isKnownViaNonRecursiveReasoning. On the other hand, it seems like we get rid of the compile time explosion seen in PR51869 that way. Hence this patch. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112080	2021-10-19 21:37:57 +02:00
Arthur Eubanks	ecd25edfc5	[InlineCost] Add empty line between call sites when printing inline costs	2021-10-18 13:56:48 -07:00
Arthur Eubanks	b8ce97372d	[NewPM] Add PipelineTuningOption to eagerly invalidate analyses This trades off more compile time for less peak memory usage. Right now it invalidates all function analyses after a module->function or cgscc->function adaptor. https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=max-rss For now this is just experimental. See comments on why this may affect optimizations. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D111575	2021-10-18 13:20:35 -07:00
modimo	313c657fce	[InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from DWARF information and generating the equivalent remarks. This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself. Testing: ninja check-all Reviewed By: wenlei, mtrofin Differential Revision: https://reviews.llvm.org/D110658	2021-10-18 13:08:39 -07:00
Kirill Stoimenov	62627c7217	[Sanitizers] Replaced getMaxPointerSizeInBits with getPointerSizeInBits, which was causing failures for 32bit x86. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D111829	2021-10-18 09:31:14 -07:00
Sanjay Patel	2a3cc4d461	[Analysis] add utility function for unary shuffle mask creation This is NFC-intended for the callers. Posting in case there are other potential users that I missed. I would also use this from VectorCombine in a patch for: https://llvm.org/PR52178 ( D111901 ) Differential Revision: https://reviews.llvm.org/D111891	2021-10-18 09:00:39 -04:00
Nikita Popov	274b2439f8	[ConstantRange] Add fast signed multiply The multiply() implementation is very slow -- it performs six multiplications in double the bitwidth, which means that it will typically work on allocated APInts and bypass fast-path implementations. Add an additional implementation that doesn't try to produce anything better than a full range if overflow is possible. At least for the BasicAA use-case, we really don't care about more precise modeling of overflow behavior. The current use of multiply() is fine while the implementation is limited to a single index, but extending it to the multiple-index case makes the compile-time impact untenable.	2021-10-17 16:41:49 +02:00
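A sketch of the "give up on possible overflow" idea for signed multiplication of inclusive ranges, written for a fixed 32-bit width so that a 64-bit intermediate can detect overflow cheaply. This is an illustration under that assumption, not ConstantRange's implementation (which works on arbitrary-width APInts); `SRange32` and `fastSignedMul` are hypothetical names. The product of two intervals attains its extremes at the four corner products, so checking those is enough.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Inclusive signed 32-bit range.
struct SRange32 {
  int32_t Min, Max;
  bool isFull() const {
    return Min == std::numeric_limits<int32_t>::min() &&
           Max == std::numeric_limits<int32_t>::max();
  }
};

// Compute the four corner products and return the full range as soon as
// any of them falls outside int32_t, instead of modelling wrap-around.
SRange32 fastSignedMul(SRange32 A, SRange32 B) {
  const SRange32 Full{std::numeric_limits<int32_t>::min(),
                      std::numeric_limits<int32_t>::max()};
  int64_t Corners[4] = {int64_t(A.Min) * B.Min, int64_t(A.Min) * B.Max,
                        int64_t(A.Max) * B.Min, int64_t(A.Max) * B.Max};
  int64_t Lo = Corners[0], Hi = Corners[0];
  for (int64_t C : Corners) {
    if (C < Full.Min || C > Full.Max)
      return Full; // possible overflow: give up
    if (C < Lo) Lo = C;
    if (C > Hi) Hi = C;
  }
  return {static_cast<int32_t>(Lo), static_cast<int32_t>(Hi)};
}
```

For [-3, 4] * [2, 5] the corners are -6, -15, 8 and 20, giving [-15, 20]; for [0, 2^20] * [0, 2^12] one corner is 2^32, so the result is simply the full range.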
Simon Pilgrim	d464a9d476	[Analysis] Replace assert(isa)/dyn_cast with cast. NFC. cast<> will perform the assertion for us. Removes a static analysis null dereference warning.	2021-10-16 11:40:19 +01:00
Simon Pilgrim	a1b43d2bc9	[LazyValueInfo] getPredicateAt - remove unnecessary null pointer check. NFC. We already dereference the CxtI pointer several times before reaching the "if(CxtI)", we have no need to check it again. Fixes a coverity warning.	2021-10-16 11:20:19 +01:00
Simon Pilgrim	c288241795	[ConstantFolding] ConstantFoldScalarCall2 - early-out if getLibFunc fails. NFC.	2021-10-16 11:20:19 +01:00
Simon Pilgrim	c18cf10a04	[ConstantFolding] Use getValueAPF const ref value where possible. NFC. Don't copy the value if we can avoid it.	2021-10-16 11:20:19 +01:00
Simon Pilgrim	76ca0d67ab	[ConstantFolding] ConstantFoldScalarCall1 - early-out if getLibFunc fails. NFC.	2021-10-16 11:20:18 +01:00
Nikita Popov	0c52c271a5	[BasicAA] Rename ExtendedValue to CastedValue (NFC) As suggested on D110977, rename ExtendedValue to CastedValue, because it will contain more than just extensions in the future.	2021-10-15 21:56:54 +02:00
Max Kazantsev	90ae538cab	[SCEV] Prove implication of predicates to their sign-flipped counterparts This patch teaches SCEV two implication rules: x <u y && y >=s 0 --> x <s y, x <s y && y <s 0 --> x <u y. And all equivalents with signs/parts swapped. Differential Revision: https://reviews.llvm.org/D110517 Reviewed By: nikic	2021-10-15 11:49:18 +07:00
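Both rules can be confirmed exhaustively at i8 width. The sketch below loops over all 65536 pairs and checks each implication; `checkSignFlipRules` is a hypothetical helper. An 8-bit brute force is evidence rather than a proof for wider types, but the rules also follow from a general fact: signed and unsigned order agree whenever both operands have the same sign, and each side condition forces that.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively check, over all pairs of 8-bit values:
//   x <u y && y >=s 0  -->  x <s y
//   x <s y && y <s 0   -->  x <u y
// Returns true if no counterexample exists.
bool checkSignFlipRules() {
  for (int XI = 0; XI < 256; ++XI)
    for (int YI = 0; YI < 256; ++YI) {
      uint8_t XU = static_cast<uint8_t>(XI), YU = static_cast<uint8_t>(YI);
      int8_t XS = static_cast<int8_t>(XU), YS = static_cast<int8_t>(YU);
      if (XU < YU && YS >= 0 && !(XS < YS))
        return false;
      if (XS < YS && YS < 0 && !(XU < YU))
        return false;
    }
  return true;
}
```

Dropping the side condition breaks the first rule: with x = 1 and y = 255, x <u y holds but, read as signed, 1 >s -1.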
Max Kazantsev	1202d280c6	[SCEV][NFC] Reduce memory footprint & compile time via DFS refactoring Current implementations of DFS in SCEV check unique-visited of traversed values on pop, and not on push. As result, the same value may be pushed multiple times just to be thrown away when popped. These operations are meaningless and only waste time and increase memory footprint of the worklist. This patch reworks the DFS strategy to check uniqueness before push. Should be NFC. Differential Revision: https://reviews.llvm.org/D111774 Reviewed By: nikic, reames	2021-10-15 10:19:15 +07:00
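A minimal sketch contrasting the two DFS strategies on a plain adjacency list (hypothetical `Graph`, `dfsCheckOnPush` and `dfsCheckOnPop`; not SCEV's actual traversal code). Each function records the visit order and returns how many times it pushed onto the worklist.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Adjacency list: G[N] holds the operands of node N (a stand-in for SCEVs).
using Graph = std::vector<std::vector<int>>;

// New strategy: mark a node visited when it is pushed, so each node enters
// the worklist at most once. Returns the number of pushes performed.
size_t dfsCheckOnPush(const Graph &G, int Root, std::vector<int> &Order) {
  std::set<int> Visited{Root};
  std::vector<int> Worklist{Root};
  size_t Pushes = 1;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    Order.push_back(N);
    for (int Op : G[N])
      if (Visited.insert(Op).second) {
        Worklist.push_back(Op);
        ++Pushes;
      }
  }
  return Pushes;
}

// Old strategy: check uniqueness on pop, so a shared operand can be pushed
// several times only to be thrown away later. Returns the number of pushes.
size_t dfsCheckOnPop(const Graph &G, int Root, std::vector<int> &Order) {
  std::set<int> Visited;
  std::vector<int> Worklist{Root};
  size_t Pushes = 1;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // duplicate entry, discarded
    Order.push_back(N);
    for (int Op : G[N]) {
      Worklist.push_back(Op);
      ++Pushes;
    }
  }
  return Pushes;
}
```

On the graph `{{1, 2, 3}, {4}, {4}, {4}, {}}`, where node 4 is a shared operand, both versions visit the same nodes (here even in the same order), but the check-on-pop version pushes node 4 three times: 7 pushes instead of 5.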
Artur Pilipenko	3f96f7b30c	Fix getInlineCost with ComputeFullInlineCost enabled Fix a bug where getInlineCost incorrectly returns a cost/threshold pair instead of an explicit never inline. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D111687	2021-10-14 17:41:41 -07:00
Nikita Popov	69853f9920	[IVUsers] Move preheader check into SCEVExpander Rather than checking for loop nest preheaders upfront in IVUsers, move this requirement into isSafeToExpand() from SCEVExpander. Historically, LSR did not check whether SCEVs are safe to expand and fully relied on IVUsers to validate this. Later, support for non-expandable SCEVs was added via rigid formulas. Checking this in isSafeToExpand() makes it more obvious what exactly this check is guarding against, and avoids the awkward loop nest scan. This is a followup to https://reviews.llvm.org/D111493#3055286. Differential Revision: https://reviews.llvm.org/D111681	2021-10-14 21:52:31 +02:00
Nikita Popov	5f05ff081f	[BasicAA] Improve scalable vector handling Currently, DecomposeGEP() bails out on the whole decomposition if it encounters a scalable GEP type anywhere. However, it is fine to still analyze other GEPs that we look through before hitting the scalable GEP. This does mean that the decomposed GEP base is no longer required to be the same as the underlying object. However, I don't believe this property is necessary for correctness anymore. This allows us to compute slightly more precise aliasing results for GEP chains containing scalable vectors, though my primary interest here is simplifying the code. Differential Revision: https://reviews.llvm.org/D110511	2021-10-14 20:23:50 +02:00
Kevin P. Neal	727a891ec8	[FPEnv][InstSimplify] Fold fadd X, 0 ==> X, when we know X is not -0 Currently the fadd optimizations in InstSimplify don't know how to do this NoSignedZeros "X + 0.0 ==> X" fold when using the constrained intrinsics. This adds the support. This review is derived from D106362 with some improvements from D107285 and is a follow-on to D111085. Differential Revision: https://reviews.llvm.org/D111450	2021-10-14 12:32:45 -04:00
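Why the fold needs X to be known not -0: under the default rounding mode, -0.0 + 0.0 evaluates to +0.0, so replacing `X + 0.0` with `X` would flip the sign of a negative zero. A small C++ check of that IEEE 754 behaviour, assuming no fast-math flags; `faddZeroFoldIsExact` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Returns true if replacing "X + 0.0" with "X" preserves the exact result,
// including the sign of zero, for this X (default rounding mode).
bool faddZeroFoldIsExact(double X) {
  double Sum = X + 0.0;
  if (std::isnan(X))
    return std::isnan(Sum);
  return Sum == X && std::signbit(Sum) == std::signbit(X);
}
```

The fold is exact for 1.5 and for +0.0, but not for -0.0, because `==` treats the two zeros as equal while `std::signbit` does not.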
Nikita Popov	a8e7d11aca	[ValueTracking] Simplify getKnowledgeValidInContext() call (NFC) This accepts an ArrayRef, there's no need to create a SmallVector.	2021-10-14 18:17:54 +02:00
Max Kazantsev	6e1308bc10	[SCEV][NFC] Simplify check with CI->isZero() exit condition Replace the check `if ((ExitIfTrue && CI->isZero()) || (!ExitIfTrue && CI->isOne()))` with the equivalent and simpler `if (ExitIfTrue == CI->isZero())`.	2021-10-14 14:06:52 +07:00
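The two conditions agree because `CI` is an i1 constant, so exactly one of `isZero()` and `isOne()` holds. A tiny C++ truth-table check, modelling `CI->isOne()` as `!IsZero` (an assumption that only holds for i1); `oldForm` and `newForm` are hypothetical names.

```cpp
#include <cassert>

// Original condition, with isOne() modelled as !isZero() for an i1 constant.
bool oldForm(bool ExitIfTrue, bool IsZero) {
  bool IsOne = !IsZero;
  return (ExitIfTrue && IsZero) || (!ExitIfTrue && IsOne);
}

// Simplified condition.
bool newForm(bool ExitIfTrue, bool IsZero) { return ExitIfTrue == IsZero; }
```

All four combinations of `ExitIfTrue` and `IsZero` give the same answer in both forms.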
Max Kazantsev	46a1dd47e6	[SCEV][NFC] Reorder checks to delay call of all_of Check lightweight getter condition before calling all_of.	2021-10-14 13:30:51 +07:00
Mircea Trofin	6c76d01011	[mlgo][aot] require the model is autogenerated for test determinism The tests that exercise the 'release' mode, where the model is AOT-ed, check that the output has certain properties, to validate that, indeed, a different policy from the default one was exercised. For determinism, we can't reliably check that output for an arbitrary learned policy, since that policy might happen to mimic the default one in that particular case. This patch adds a requirement that those tests run only when the model is autogenerated (e.g. on build bots). Differential Revision: https://reviews.llvm.org/D111747	2021-10-13 14:02:41 -07:00
Arthur Eubanks	3628bb7436	Make various assume bundle data structures use uint64_t Following D110451, we need to make sure to support 64 bit values.	2021-10-13 10:38:41 -07:00
Philip Reames	24c9016574	[instcombine] propagate single use freeze(gep inbounds X) This is a follow on for D111675 which implements the gep case. I'd originally left it out because I was hoping to actually implement the inrange todo, but after a bit of staring at the code, decided to leave it as is since it doesn't affect this use case (i.e. instcombine requires the op to freeze to be an instruction). Differential Revision: https://reviews.llvm.org/D111691	2021-10-13 09:25:00 -07:00
Philip Reames	4c5702cb12	Fix bug introduced with `6f34839` (poison flags on floating point ops) The newly introduced API for checking whether poison comes solely from flags which can be dropped was out of sync. This was noticed by a reviewer post commit. For the moment, disable the floating point flags. In a follow up change, I plan to add support in dropPoisonGeneratingFlags, but that deserves to be a change of its own.	2021-10-12 20:25:00 -07:00
Philip Reames	6f34839407	[instcombine] propagate freeze through single use poison producing flag instruction If we have an instruction which produces poison only when flags are specified on the instruction, then we know that freezing the operands and dropping flags is equivalent to freezing the result. If we know those flags don't result in any undefined behavior being executed, then there's no point in preserving the flags as we gain no knowledge by having them. This patch extends the existing propagation logic which sinks freeze to single potential non-poison operands to allow dropping of flags when we know the freeze is the sole use of the instruction with poison flags. The main value is that we tend to sink freezes towards the phi in IV cycles where the incoming value to the phi is the freeze of an IV increment. This will in turn (in a future patch), let us fold the freeze through the phi into the loop preheader. Motivated by eliminating need for CanonicalizeFreezeInLoops for the clearly profitable cases from onephi.ll test case in the test directory. Differential Revision: https://reviews.llvm.org/D111675	2021-10-12 13:52:41 -07:00
Hongtao Yu	098a0d8fbc	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3. This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation. Unlike DbgIntrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are: - Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes - Unblocked CSE by preventing pseudo probes from clobbering memory SSA - Unblocked induction variable simplification - Allow empty loop deletion by treating probe intrinsic isDroppable - Some refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110847	2021-10-12 09:44:12 -07:00
Nikita Popov	2a2a37d972	[IVUsers] Check for preheader instead of loop simplify form IVUsers currently makes sure that all loops dominating a user are in loop simplify form, because SCEVExpander needs a preheader to insert into. However, loop simplify form requires much more than that. In particular, it requires dedicated exits, which means that exits need to be found and walked. For large functions with many nested loops, this can result in pathological compile-time explosion. Fix this by only checking the property we're actually interested in, which is incidentally cheap to check. Differential Revision: https://reviews.llvm.org/D111493	2021-10-11 23:13:13 +02:00
Roman Lebedev	684cbae89a	[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places	2021-10-11 23:36:06 +03:00
Arthur Eubanks	259390de9a	[LCG] Don't skip invalidation of LazyCallGraph if CFG analyses are preserved The CFG being changed and the overall call graph are not related, we can introduce/remove calls without changing the CFG. Resolves one of the issues in PR51946. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D111275	2021-10-11 13:30:47 -07:00
Philip Reames	7f55209cee	[SCEV] Extend trip count to avoid overflow by default As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1 and thus a zero trip count is ill-defined. There is a corner case which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8. In theory, code which needs to reason about trip counts is responsible for checking for this corner case, and either bailing out, or handling it correctly. Historically, we haven't had a great track record of actually doing so. When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case. This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow. I want to emphasize that this change is not NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The latter appears incorrect, though I don't have a test case. Differential Revision: https://reviews.llvm.org/D110587	2021-10-11 09:55:55 -07:00
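The i8 corner case can be reproduced with ordinary unsigned arithmetic. This C++ sketch uses `uint8_t` for the backedge-taken count and a 16-bit type for the extended result (wider than the i9 that SCEV would use, since C++ has no 9-bit integer); `tripCountWrapped` and `tripCountExtended` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Legacy behaviour: TC = BTC + 1 computed in the same 8-bit width, so a
// backedge-taken count of 255 wraps to a trip count of 0.
uint8_t tripCountWrapped(uint8_t BTC) {
  return static_cast<uint8_t>(BTC + 1);
}

// Extending behaviour: widen before adding so 255 + 1 = 256 is representable.
uint16_t tripCountExtended(uint8_t BTC) {
  return static_cast<uint16_t>(static_cast<uint16_t>(BTC) + 1);
}
```

For BTC = 255 the wrapped form yields 0, an ill-defined trip count, while the extended form yields the correct 256; for any smaller BTC the two agree.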
David Sherwood	26b7d9d622	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-11 09:41:38 +01:00
Clement Courbet	83ded5d323	re-land "[AA] Teach BasicAA to recognize basic GEP range information." Now that PR52104 is fixed.	2021-10-11 10:04:22 +02:00
Nick Desaulniers	9697f93587	[InlineCost] model calls to llvm.is.constant* more carefully llvm.is.constant* intrinsics are evaluated to 0 or 1 integral values. A common use case for llvm.is.constant comes from the higher level __builtin_constant_p. A common usage pattern of __builtin_constant_p in the Linux kernel is: void foo (int bar) { if (__builtin_constant_p(bar)) { // lots of code that will fold away to a constant. } else { // a little bit of code, usually a libcall. } } A minor issue in InlineCost calculations is when `bar` is _not_ Constant and still will not be after inlining, we don't discount the true branch and the inline cost of `foo` ends up being the cost of both branches together, rather than just the false branch. This leads to code like the above where inlining will not help prove bar Constant, but it still would be beneficial to inline foo, because the "true" branch is irrelevant from a cost perspective. For example, IPSCCP can sink a passed constant argument to foo: const int x = 42; void bar (void) { foo(x); } This improves our inlining decisions, and fixes a few head-scratching cases where the disassembly shows a relatively small `foo` not inlined into a lone caller. We could further improve this modeling by tracking whether the argument to llvm.is.constant* is a parameter of the function, and if inlining would allow that parameter to become Constant. This idea is noted in a FIXME comment. Link: https://github.com/ClangBuiltLinux/linux/issues/1302 Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D111272	2021-10-08 15:27:30 -07:00
Philip Reames	edf31b4db1	[IPT] Add a statistic to track instructions scanned to answer queries I'm planning some changes to the invalidation mechanism here, and having a concrete mechanism to track progress is key.	2021-10-08 10:59:35 -07:00
Philip Reames	b4498e6b8d	[IPT] Narrow scope of removeInstruction invalidation [NFC] We only need to invalidate if the instruction being removed is the cached "first special instruction". If the instruction is before that one, it can't (by assumption) be special. If it is after that one, it wasn't the first.	2021-10-08 10:35:03 -07:00
Philip Reames	d694dd0f0d	Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc] This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places. The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.	2021-10-08 09:50:10 -07:00
Nikita Popov	c77a5c21bb	[BasicAA] Use base of decomposed GEP in recursive queries (NFC) DecompGEP.Base and UnderlyingV are currently always the same. However, logically DecompGEP.Base is the right value to use here, because the decomposed offset is relative to that base.	2021-10-07 22:08:41 +02:00
Paul Robinson	aec66f895b	[PS4][TargetLibraryInfo] Set TLI info correctly for PS4	2021-10-07 10:03:31 -07:00
Sanjay Patel	fdbf2bb4ee	[InstSimplify] (x || y) && (x || !y) --> x https://alive2.llvm.org/ce/z/4BE33w This is the logical (select-form) equivalent of the bitwise logic fold: `e36d351d19` This is another part of solving the regression from: https://llvm.org/PR52077	2021-10-07 12:25:25 -04:00
Erik Desjardins	11c8efd4db	[Inline] Introduce Constant::hasOneLiveUse, use it instead of hasOneUse in inline cost model (PR51667) Otherwise, inlining costs may be pessimized by dead constants. Fixes https://bugs.llvm.org/show_bug.cgi?id=51667. Reviewed By: mtrofin, aeubanks Differential Revision: https://reviews.llvm.org/D109294	2021-10-07 08:33:25 -07:00
Itay Bookstein	40ec1c0f16	[IR][NFC] Rename getBaseObject to getAliaseeObject To better reflect the meaning of the now-disambiguated {GlobalValue, GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction (D109792), the function is renamed to getAliaseeObject.	2021-10-06 19:33:10 -07:00
Kuba Mracek	7329abf2f8	[GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots Differential Revision: https://reviews.llvm.org/D109114	2021-10-06 15:55:55 -07:00
Philip Reames	1183d65b4d	[SCEV] Search operand tree for scope bound when inferring flags from IR When checking to see if we can apply IR flags to a SCEV, we need to identify a bound on the defining scope of the SCEV to be produced. We'd previously added support for a couple SCEVExpr types which trivially imply bounds, but hadn't handled types such as umax where the bounds come from the bounds of the operands. This does the obvious thing, and recurses through operands searching for a tighter bound on the defining scope. I'm honestly surprised by how little this seems to matter on existing tests, but it's worth doing for completeness' sake alone. Differential Revision: https://reviews.llvm.org/D111191	2021-10-06 15:10:02 -07:00
Nikita Popov	17c20a6dfb	[SCEV] Avoid unnecessary domination checks (NFC) When determining the defining scope, avoid repeatedly querying dominance against the function entry instruction. This ends up being a very common case that we can handle more efficiently.	2021-10-06 22:14:04 +02:00
Philip Reames	a7ae227baf	[scev] minor style improvement [nfc]	2021-10-06 12:15:16 -07:00
Philip Reames	67896f494e	Returning poison from a function w/ noundef return attribute is UB This does for readability of returns within said function as what we do for the caller side when reasoning about what might be poison. Differential Revision: https://reviews.llvm.org/D111180	2021-10-06 11:52:18 -07:00
Philip Reames	0658bab870	[SCEV] Infer flags from add/gep in any block This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands anyways. Differential Revision: https://reviews.llvm.org/D111186	2021-10-06 11:11:54 -07:00
Kevin P. Neal	f86c930cc9	[FPEnv][InstSimplify] Fold constrained X + -0.0 ==> X Currently the fadd optimizations in InstSimplify don't know how to do this "X + -0.0 ==> X" fold when using the constrained intrinsics. This adds the support. This commit is derived from D106362 with some improvements from D107285. Differential Revision: https://reviews.llvm.org/D111085	2021-10-06 13:52:31 -04:00
Nikita Popov	1301a8b473	[BasicAA] Don't unnecessarily extend pointer size BasicAA GEP decomposition currently performs all calculation on the maximum pointer size, but at least 64-bit, with an option to double the size. The code comment claims that this improves analysis power when working with uint64_t indices on 32-bit systems. However, I don't see how this can be, at least while maintaining correctness: When working on canonical code, the GEP indices will have GEP index size. If the original code worked on uint64_t with a 32-bit size_t, then there will be truncs inserted before use as a GEP index. Linear expression decomposition does not look through truncs, so this will be an opaque value as far as GEP decomposition is concerned. Working on a wider pointer size does not help here (or have any effect at all). When working on non-canonical code (before first InstCombine), the GEP indices are implicitly truncated to GEP index size. The BasicAA code currently just ignores this fact completely, and pretends that this truncation doesn't happen. This is incorrect and will be addressed by D110977. I believe that for correctness reasons, it is important to work on the actual GEP index size to properly model potential overflow. BasicAA tries to patch over the fact that it uses the wrong size (see adjustToPointerSize), but it only does that in limited cases (only for constant values, and not all of them either). I'd like to move this code towards always working on the correct size, and dropping these artificial pointer size adjustments is the first step towards that. Differential Revision: https://reviews.llvm.org/D110657	2021-10-06 18:40:21 +02:00
Sanjay Patel	e36d351d19	[InstSimplify] (x | y) & (x | !y) --> x https://alive2.llvm.org/ce/z/QagQMn This fold is handled by instcombine via SimplifyUsingDistributiveLaws(), but we are missing the sibling fold for 'logical and' (implemented with 'select'). Retrofitting the code in instcombine looks much harder than just adding a small adjustment here, and this is potentially more efficient and beneficial to other passes.	2021-10-06 12:31:25 -04:00
Clement Courbet	3255015407	Fix incomplete conflict resolution in `ff41fc07b1`	2021-10-06 16:55:14 +02:00
Clement Courbet	ff41fc07b1	Revert "[AA] Teach BasicAA to recognize basic GEP range information." We have found a miscompile with this change, reverting while working on a reproducer. This reverts commit `455b60ccfb`.	2021-10-06 16:49:10 +02:00
Mircea Trofin	7d541eb4d4	[inliner] Mandatory inlining decisions produce remarks This also removes the need to disable the mandatory inlining phase in tests. In a departure from the previous remark, we don't output a 'cost' in this case, because there's no such thing. We just report that inlining happened because of the attribute. Differential Revision: https://reviews.llvm.org/D110891	2021-10-05 14:01:25 -07:00
Nikita Popov	0be9940ef2	[SCEV] Don't check if propagation safe if there are no flags (NFC) If there are no nowrap flags, then we don't need to determine whether propagating flags is safe -- it will make no difference.	2021-10-05 22:25:41 +02:00
Philip Reames	c608b49d67	[SCEV] Tweak the algorithm for figuring out if flags must apply to a SCEV [mostly-NFC] Behavior-wise, this patch should be mostly NFC. The only behavior difference known is that on the isSCEVExprNeverPoison path we'll consider a bound imposed by the SCEVable operands (if any). Algorithmically, it's an inversion of the existing code. Previously, we checked for each operand if we could find a bound, then checked for must-execute given that bound. With the patch, we use dominance to refine the innermost bound, then check must-execute once. The interesting case is when we have multiple unknowns within a single basic block. While both dominance and must-execute are worst-case linear walks within the block, only dominance is cached. As such, refining based on dominance should be more efficient.	2021-10-05 11:20:48 -07:00
Nikita Popov	c117d77e93	[ConstantFold] Refactor load folding This refactors load folding to happen in two cleanly separated steps: ConstantFoldLoadFromConstPtr() takes a pointer to load from and decomposes it into a constant initializer base and an offset. Then ConstantFoldLoadFromConst() loads from that initializer at the given offset. This makes the core logic independent of having actual GEP expressions (and those GEP expressions having certain structure) and will allow exposing ConstantFoldLoadFromConst() as an independent API in the future. This is mostly only a refactoring, but it does make the folding logic slightly more powerful. Differential Revision: https://reviews.llvm.org/D111023	2021-10-05 18:07:57 +02:00
Nikita Popov	30001af84e	[BasicAA] Ignore CanBeFreed in minimal extent reasoning When determining NoAlias based on object size and dereferenceability information, we can ignore frees for the same reason we can ignore possible null pointers (if null is not a valid pointer): Actually accessing the null pointer / freed pointer would be immediate UB, and AA results are only valid under the assumption of an access. This addresses a minor regression from D110745. Differential Revision: https://reviews.llvm.org/D111028	2021-10-04 22:08:57 +02:00
Bjorn Pettersson	7f84fa4ad4	[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer need the IsSizeTTy lambda function and the SizeTTy object. Instead we just follow the regular structure of checking for integer types given an expected number of bits.	2021-10-04 15:46:39 +02:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Philip Reames	5f7a535330	[SCEV] Cap the number of instructions scanned when inferring flags This addresses a comment from review on D109845. The concern was raised that an unbounded scan would be expensive. The long-term plan is to cache this search - likely reusing the existing mechanism for loop side effects - but let's be simple and conservative for now.	2021-10-03 16:14:06 -07:00
Philip Reames	35ab211c37	[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags This addresses a comment from review on D109845. Even for SCEVs which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound. In some cases, this trivial bound is enough to prove safety of flag inference.	2021-10-03 16:01:30 -07:00
Philip Reames	d02db32644	[SCEV] Use full logic when inferring flags on add and gep This is a follow-on to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path. We can still do much better here (on both paths), but this is our first step. Differential Revision: https://reviews.llvm.org/D111003	2021-10-03 15:32:15 -07:00
Philip Reames	f39978b84f	[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This is an alternate fix to D106852. The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined. In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation. One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece. The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output well enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.) Differential Revision: https://reviews.llvm.org/D109845	2021-10-03 15:19:33 -07:00
Kazu Hirata	d34cd75d89	[Analysis, CodeGen] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-03 08:22:20 -07:00
Dávid Bolvanský	5f2f611880	Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical	2021-10-03 13:58:10 +02:00
Philip Reames	26223af256	[SCEV] Split isSCEVExprNeverPoison reasoning explicitly into scope and mustexecute parts [NFC] Inspired by the needs of D111001 and D109845. The separation of concerns also makes it easier to reason about correctness and completeness.	2021-10-02 13:10:38 -07:00
Philip Reames	2ca8a3f213	[SCEV] Stop blindly propagating flags from inbound geps to SCEV nodes This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This was also noted in the (very old) PR23527. The issue being fixed is that we assumed the inbounds flag on any GEP implies that all users of any gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true. In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when it's legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics. The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow-up ideas mentioned in PR51817. It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before nor after exploits that. Differential Revision: https://reviews.llvm.org/D109789	2021-10-01 16:30:44 -07:00
Philip Reames	24cde2f602	[SCEV] Remove invariant requirement from isSCEVExprNeverPoison This code is attempting to prove that I must execute if we enter the defining scope of the SCEV which will be created from I. In the case where it found a defining addrec scope, it had a rather odd restriction that all of the other operands must be loop invariant in that addrec's loop. As near as I can tell here, we really only need an upper bound on the defining scope. If we can prove the stronger property, then we must also have proven the property on the exact defining scope as well. In practice, the actual effect of this change is narrow. The compile time restriction at the top of the routine basically limits us to I being an arithmetic instruction in some loop L with both an addrec operand in L and an unknown operand in L. Possible to demonstrate, but the main value of the change is removing unneeded code. Differential Revision: https://reviews.llvm.org/D110892	2021-10-01 15:57:37 -07:00
Krasimir Georgiev	685f1bfd0a	Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns" It appears to cause stage2 clang build failures, e.g., https://lab.llvm.org/buildbot/#/builders/74/builds/7145. This reverts commit `1fb37334bd`.	2021-10-01 11:39:43 +02:00
David Sherwood	1fb37334bd	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-01 08:41:03 +01:00
Philip Reames	c5e491e6ee	[SCEV] Modernize code style of isSCEVExprNeverPoison [NFC] Use for-range and all_of to make code easier to read in advance of other changes.	2021-09-30 15:13:43 -07:00
Florian Hahn	1fbdbb5595	Revert "Recommit "[SCEV] Look through single value PHIs." (take 2)" This reverts commit `764d9aa979`. This patch exposed a few additional cases where SCEV expressions are not properly invalidated. See PR52024, PR52023.	2021-09-30 20:53:51 +01:00
Nikita Popov	b989211d7d	[BasicAA] Move more extension logic into ExtendedValue (NFC) Add methods to appropriately extend KnownBits/ConstantRange there, same as with APInt. Also clean up the known bits handling by actually doing that extension rather than checking ZExtBits. This doesn't matter now, but becomes relevant once truncation is involved.	2021-09-30 20:45:12 +02:00
Nikita Popov	ea02f9caff	[BasicAA] Use ExtendedValue in VariableGEPIndex (NFC) Use the ExtendedValue structure which is used for LinearExpression in VariableGEPIndex as well.	2021-09-30 18:48:51 +02:00
Kazu Hirata	f631173d80	[llvm] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-09-30 08:51:21 -07:00
Clement Courbet	455b60ccfb	[AA] Teach BasicAA to recognize basic GEP range information. The information can be implicit (from `ValueTracking`) or explicit. This implements the backend part of the following RFC https://groups.google.com/g/llvm-dev/c/T9o51zB1JY. We still need to settle on how to best represent the information in the IR, but this is a separate discussion. Differential Revision: https://reviews.llvm.org/D109746	2021-09-30 08:29:32 +02:00
Nikita Popov	2898101552	[BasicAA] Move DecomposedGEP out of header (NFC) It's sufficient to have a forward declaration in the header, we can move the definition of the struct (and VariableGEPIndex) into the source file.	2021-09-29 23:45:15 +02:00
Nikita Popov	45288edb65	[BasicAA] Pass whole DecomposedGEP to subtraction API (NFC) Rather than separately handling subtraction of offset and variable indices, make this one operation. Also rewrite the implementation to use range-based for loops.	2021-09-29 23:32:15 +02:00
Nikita Popov	49813f7fbf	[BasicAA] Pass DecomposedGEP to constantOffsetHeuristic() (NFC) Rather than separately passing VarIndices and BaseOffset, pass the whole DecomposedGEP.	2021-09-29 22:23:27 +02:00
Sanjay Patel	4414e2ad97	[InstSimplify] (-1 << x) s>> x --> -1 This was noticed in: https://llvm.org/PR51351 https://alive2.llvm.org/ce/z/aLxunD	2021-09-29 13:03:12 -04:00
Paul Robinson	56e681afcc	[TargetLibraryInfo] Pick new/delete calls by target There are two sets of new/delete functions, one with Windows/MSVC mangling and one with Itanium mangling. Mark one set or the other as unavailable depending on the target. Split the test malloc-free-delete.ll into three parts: malloc-free.dll for the C API tests, new-delete-itanium.ll and new-delete-msvc.ll for the target-specific new/delete tests. Differential Revision: https://reviews.llvm.org/D110419	2021-09-28 10:10:25 -07:00
Alex Richardson	9049a1c61e	[ConstantFolding] Fold ptrtoint(gep i8 null, x) -> x I was looking at some missed optimizations in CHERI-enabled targets and noticed that we weren't removing vtable indirection for calls via known pointers-to-members. The underlying reason for this is that we represent pointers-to-function-members as {i8 addrspace(200)*, i64} and generate the constant offsets using (gep i8 null, <index>). We use a constant GEP here since inttoptr should be avoided for CHERI capabilities. The pointer-to-member call uses ptrtoint to extract the index, and due to this missing fold we can't infer the actual value loaded from the vtable. This is the initial constant folding change for this pattern, I will add an InstCombine fold as a follow-up. We could fold all inbounds GEP to null (and therefore the ptrtoint to zero) since zero is the only valid offset for an inbounds GEP. If the offset is not zero, that GEP is poison and therefore returning 0 is valid (https://alive2.llvm.org/ce/z/Gzb5iH). However, Clang currently generates inbounds GEPs on NULL for hand-written offsetof() expressions, so this could lead to miscompilations. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110245	2021-09-28 17:57:36 +01:00
Alex Richardson	3c51b9e270	Fix incorrect GEP bitwidth in areNonOverlapSameBaseLoadAndStore() When using a datalayout that has pointer index width != pointer size this code triggers an assertion in Value::stripAndAccumulateConstantOffsets(). I encountered this while compiling FreeBSD for CHERI-RISC-V. Also update LoadsTest.cpp to use a DataLayout with index width != pointer width to ensure this case is tested. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D110406	2021-09-28 17:57:36 +01:00
Bjorn Pettersson	460efc1fb8	[Analysis] Be defensive when matching size_t in lib call signatures When TargetLibraryInfoImpl::isValidProtoForLibFunc is checking function signatures to detect lib calls it may check that a parameter or return value matches with the "size_t" type. For this to work it has to derive the IR type matching with "size_t". Depending on whether a DataLayout is provided or not, this has been done in two different ways: either a stricter check based on IntPtrType (which is given by the DataLayout), or a more relaxed check assuming that any integer type matches with "size_t". Given that the stricter approach exists, it seems like we do not want to trigger rewrites etc. if we aren't sure that a function call actually matches the library function. Therefore it was questioned why we actually have the more relaxed approach when not being able to derive an IR type for "size_t". This patch will take a more defensive approach, requiring that a DataLayout is passed to isValidProtoForLibFunc. Differential Revision: https://reviews.llvm.org/D110584	2021-09-28 15:29:37 +02:00
Bjorn Pettersson	1f5ea14bca	[Analysis] Add FIXME:s related to size_t type checks Differential Revision: https://reviews.llvm.org/D110583	2021-09-28 15:29:37 +02:00
Florian Hahn	764d9aa979	Recommit "[SCEV] Look through single value PHIs." (take 2) This reverts commit `8fdac7cb7a`. The issue causing the revert has been fixed a while ago in `60b852092c`. Original message: Now that SCEVExpander can preserve LCSSA form, we do not have to worry about LCSSA form when trying to look through PHIs. SCEVExpander will take care of inserting LCSSA PHI nodes as required. This increases precision of the analysis in some cases. Reviewed By: mkazantsev, bmahjour Differential Revision: https://reviews.llvm.org/D71539	2021-09-28 10:32:17 +01:00
modimo	20faf78919	[ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities. This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build: 1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities. 2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time. Implementation-wise this adds the following summary function attributes: 1. noUnwind: function is noUnwind 2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions) 3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well) Testing: Clang self-build passes and 2nd stage build passes check-all ninja check-all with newly added tests passing Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D36850	2021-09-27 12:28:07 -07:00
Nikita Popov	7a855596c3	[BasicAA] Don't check whether GEP is sized (NFC) GEPs are required to have sized source element type, so we can just assert that here.	2021-09-26 21:21:54 +02:00
Nikita Popov	ba664d9066	[AA] Move earliest escape tracking from DSE to AA This is a followup to D109844 (and alternative to D109907), which integrates the new "earliest escape" tracking into AliasAnalysis. This is done by replacing the pre-existing context-free capture cache in AAQueryInfo with a replaceable (virtual) object with two implementations: The SimpleCaptureInfo implements the previous behavior (check whether object is captured at all), while EarliestEscapeInfo implements the new behavior from DSE. This combines the "earliest escape" analysis with the full power of BasicAA: It subsumes the call handling from D109907, considers a wider range of escape sources, and works with AA recursion. The compile-time cost is slightly higher than with D109907. Differential Revision: https://reviews.llvm.org/D110368	2021-09-25 22:40:41 +02:00
Nikita Popov	1c3859f31d	[BasicAA] Don't consider Argument as escape source (NFCI) The case of an Argument and an identified function local is already handled earlier, because we don't care about captures in that case. As such, we don't need to additionally consider the combination of an Argument with a non-escaping identified function local. This ensures that isEscapeSource() only returns true for instructions, which is necessary for D110368.	2021-09-25 22:08:15 +02:00
Paul Robinson	6185ad03f1	[TargetLibraryInfo] Correctly handle sqrt*_finite Other <math>_finite calls are marked as unavailable except on GNU/Linux; it looks like the sqrt set was just overlooked. Differential Revision: https://reviews.llvm.org/D110418	2021-09-24 11:57:38 -07:00
Florian Hahn	6f28fb7081	Recommit "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts the revert commit `df56fc6ebb`. This version of the patch adjusts the location where the EarliestEscapes cache is cleared when an instruction gets removed. The earliest escaping instruction does not have to be a memory instruction. It could be a ptrtoint instruction like in the added test @earliest_escape_ptrtoint, which subsequently gets removed. We need to invalidate the EarliestEscape entry referring to the ptrtoint when deleting it. This fixes the crash mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6	2021-09-24 17:13:27 +01:00
Paul Robinson	1376ae9094	[TargetLibraryInfo][AMDGPU] Minor cleanup, NFC	2021-09-24 07:52:44 -07:00
Nico Weber	df56fc6ebb	Revert "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts commit `5ce89279c0`. Makes clang crash, see comments on https://reviews.llvm.org/D109844	2021-09-24 09:57:59 -04:00
David Sherwood	8e4f7b749c	[Analysis] Fix another issue when querying vscale attributes on functions There are several places in the code that are currently broken where we assume an Instruction is always a member of a BasicBlock that lives in a Function. This is a problem specifically when attempting to get the vscale_range attribute. This patch adds checks that an Instruction's parent also has a parent! I've added a test for a function-less @llvm.vscale intrinsic call here: unittests/Analysis/ValueTrackingTest.cpp	2021-09-24 13:37:23 +01:00
David Sherwood	c2634fc6ab	[Analysis] Fix issues when querying vscale attributes on functions There are several places in the code that are currently broken as they assume an Instruction always has a parent Function when attempting to get the vscale_range attribute. This patch adds checks that an Instruction has a parent. I've added a test for a parentless @llvm.vscale intrinsic call here: unittests/Analysis/ValueTrackingTest.cpp Differential Revision: https://reviews.llvm.org/D110158	2021-09-24 09:58:10 +01:00
Fangrui Song	0bb767e7db	[InlineAdvisor] Use one single quote	2021-09-23 12:16:15 -07:00
Florian Hahn	5ce89279c0	[DSE] Track earliest escape, use for loads in isReadClobber. At the moment, DSE only considers whether a pointer may be captured at all in a function. This leads to cases where we fail to remove stores to local objects because we do not check if they escape before potential read-clobbers or after. Context-sensitive escape queries in isReadClobber were removed a while ago in `d1a1cce5b1` to save compile-time. See PR50220 for more context. This patch introduces a new capture tracker, which keeps track of the 'earliest' capture. An instruction A is considered earlier than instruction B if A dominates B. If 2 escapes do not dominate each other, the terminator of the common dominator is chosen. If not all uses can be analyzed, the earliest escape is set to the first instruction in the function entry block. If the query instruction dominates the earliest escape and is not in a cycle, then the pointer does not escape before the query instruction. This patch uses this information when checking if a load of a loaded underlying object may alias a write to a stack object. If the stack object does not escape before the load, they do not alias. I will share a follow-up patch to also use the information for call instructions to fix PR50220. In terms of compile-time, the impact is low in general, NewPM-O3: +0.05% NewPM-ReleaseThinLTO: +0.05% NewPM-ReleaseLTO-g: +0.03% with the largest change being tramp3d-v4 (+0.30%) http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Compared to always computing the capture information on demand, we get the following benefits from the caching: NewPM-O3: -0.03% NewPM-ReleaseThinLTO: -0.08% NewPM-ReleaseLTO-g: -0.04% The biggest speedup is tramp3d-v4 (-0.21%). http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Overall there is a small, but noticeable benefit from caching. I am not entirely sure if the speedups warrant the extra complexity of caching. The way the caching works also means that we might miss a few cases, as it is less precise. Also, there may be a better way to cache things. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109844	2021-09-23 12:45:05 +01:00
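The "earliest escape" rule in the DSE entry above (A is earlier than B if A dominates B; otherwise fall back to the common dominator) can be sketched at basic-block granularity. This is a minimal standalone C++ sketch, not the DSE/CaptureTracking code: the `DomTree` struct, its parent-array representation and `earliestEscapeBlock` are hypothetical, and the real patch works on instructions and picks the terminator of the common dominator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block-level dominator tree: idom[b] is the immediate dominator
// of block b and depth[b] is its depth in the tree. Block 0 is the entry.
struct DomTree {
  std::vector<int> idom;
  std::vector<int> depth;

  // True if block a dominates block b (every block dominates itself).
  bool dominates(int a, int b) const {
    while (depth[b] > depth[a]) b = idom[b];
    return a == b;
  }

  // Walk both blocks up the tree until they meet.
  int nearestCommonDominator(int a, int b) const {
    while (depth[a] > depth[b]) a = idom[a];
    while (depth[b] > depth[a]) b = idom[b];
    while (a != b) {
      a = idom[a];
      b = idom[b];
    }
    return a;
  }
};

// Fold the blocks containing escapes into one "earliest" block. If one escape
// dominates the other, the result is the dominating escape's block; otherwise
// it is a common dominator above both of them.
int earliestEscapeBlock(const DomTree &DT, const std::vector<int> &escapes) {
  assert(!escapes.empty() && "need at least one escape");
  int earliest = escapes[0];
  for (std::size_t i = 1; i < escapes.size(); ++i)
    earliest = DT.nearestCommonDominator(earliest, escapes[i]);
  return earliest;
}
```

For a diamond CFG (entry block 0 branching to 1 and 2, which merge in 3), escapes in blocks 1 and 2 fold to the entry block 0, while escapes in 3 and a successor dominated by 3 fold to 3.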
Alex Richardson	05663dc146	[InstSimplify] Don't lose inbounds when simplifying a GEP I noticed this while working on a (ptrtoint (gep null, x)) -> x fold. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D110168	2021-09-23 09:25:06 +01:00
Sanjay Patel	a85d7a56c7	[ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users This is another problem exposed by: https://bugs.llvm.org/PR50836	2021-09-22 15:01:53 -04:00
Sanjay Patel	b05804ab4c	[Analysis] reduce code for isOnlyUsedInZeroEqualityComparison; NFC There's a bug here noted by the FIXME and visible in variations of PR50836.	2021-09-22 14:57:53 -04:00
Sanjay Patel	c240169ff2	[Analysis] improve function matching for strlen libcall The return type of strlen is size_t, not just any integer. This is a partial fix for an example based on: https://llvm.org/PR50836 There's another bug here because we can still crash processing a real strlen or something that looks like it.	2021-09-22 13:50:12 -04:00
Florian Mayer	36daf074d9	[hwasan] also omit safe mem[cpy|mov|set]. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D109816	2021-09-22 11:08:27 +01:00
George Burgess IV	cd5f582c3d	MemoryBuiltins: update comment; NFC This comment references behavior that was removed in `ccae43a247`, which is a commit from 5 years ago. It seems safe to assume that that behavior won't be coming back soon. If it does, we can readd this part of the comment :)	2021-09-21 13:47:26 -07:00
Michael Liao	2d1ffad010	[IR] Re-group AAMDNodes relevant interfaces. NFC.	2021-09-21 14:29:33 -04:00
Florian Hahn	5131037ea9	[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange. isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree. The use in VectorCombine is updated to pass through the DT to enable additional scalarization. Note that similar APIs like computeKnownBits already accept optional dominator tree arguments. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110175	2021-09-21 16:54:47 +01:00
Michael Liao	5fb3ae525f	[SelectionDAG] Re-calculate scoped AA metadata when merging stores. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D102821	2021-09-21 11:41:17 -04:00
Max Kazantsev	cd166fb2ef	[SCEV] Use isAvailableAtLoopEntry in the asserts This is what is supposed to be there.	2021-09-21 17:11:15 +07:00
Max Kazantsev	4d5d725428	[SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond The logic in howManyLessThans is fishy. It first checks invariance of RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which is, strictly speaking, a different thing. We are seeing a very rare intermittent failure of availability checks, and it looks like this precondition is sometimes broken. Until we can figure out what's going on, this adds asserts that all involved values that may possibly go to isLoopEntryGuardedByCond are available at loop entry. If either of these asserts fails (OrigRHS is the most likely suspect), it means that the logic here is flawed.	2021-09-21 17:08:52 +07:00
Max Kazantsev	2c7d5fbc9e	[SCEV] Generalize implication when signedness of FoundPred doesn't matter The implication logic for two values that are both negative or both non-negative says that it doesn't matter whether their predicate is signed or unsigned, but only flips unsigned into signed for further inference. This patch adds support for flipping a signed predicate into unsigned as well. Differential Revision: https://reviews.llvm.org/D109959 Reviewed By: nikic	2021-09-21 11:17:56 +07:00
Max Kazantsev	a06db78fd9	[NFC] Rename Context->CtxI in SCEV for uniformity reasons	2021-09-21 10:12:20 +07:00
Nikita Popov	dd0226561e	[IR] Add helper to convert offset to GEP indices We implement logic to convert a byte offset into a sequence of GEP indices for that offset in a number of places. This patch adds a DataLayout::getGEPIndicesForOffset() method, which implements the core logic. I've updated SROA, ConstantFolding and InstCombine to use it, and there's a few more places where it looks relevant. Differential Revision: https://reviews.llvm.org/D110043	2021-09-20 20:18:16 +02:00
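The offset-to-indices conversion that `DataLayout::getGEPIndicesForOffset()` centralizes can be illustrated with a toy type model. This is a hedged sketch, not the LLVM implementation: the `Type` struct and `gepIndicesForOffset` are hypothetical, struct padding and out-of-range offsets are not handled, and any bytes that cannot be expressed as an index are left in `Offset` for the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified type model: a scalar of a given size, an array of
// elements, or a struct with explicit field byte offsets.
struct Type {
  enum Kind { Scalar, Array, Struct } kind;
  uint64_t size;                     // allocation size in bytes
  const Type *elem = nullptr;        // element type (Array only)
  std::vector<const Type *> fields;  // field types (Struct only)
  std::vector<uint64_t> offsets;     // field byte offsets (Struct only)
};

// Turns a byte offset from a pointer to Ty into one index per level, in the
// spirit of getGEPIndicesForOffset(). Leftover bytes remain in Offset.
std::vector<uint64_t> gepIndicesForOffset(const Type *Ty, uint64_t &Offset) {
  assert(Ty->size != 0 && "zero-sized types are not handled here");
  std::vector<uint64_t> Indices;
  // The leading index steps over whole objects of type Ty.
  Indices.push_back(Offset / Ty->size);
  Offset %= Ty->size;
  while (Ty->kind != Type::Scalar) {
    if (Ty->kind == Type::Array) {
      Indices.push_back(Offset / Ty->elem->size);
      Offset %= Ty->elem->size;
      Ty = Ty->elem;
    } else {
      // Choose the last field that starts at or before Offset.
      std::size_t i = 0;
      while (i + 1 < Ty->offsets.size() && Ty->offsets[i + 1] <= Offset) ++i;
      Indices.push_back(i);
      Offset -= Ty->offsets[i];
      Ty = Ty->fields[i];
    }
  }
  return Indices;
}
```

For a hypothetical `{ i32, [4 x i16] }` (field offsets 0 and 4, size 12), byte offset 10 maps to the indices `0, 1, 3` with nothing left over.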
David Sherwood	f988f68064	[Analysis] Add support for vscale in computeKnownBitsFromOperator In ValueTracking.cpp we use a function called computeKnownBitsFromOperator to determine the known bits of a value. For the vscale intrinsic if the function contains the vscale_range attribute we can use the maximum and minimum values of vscale to determine some known zero and one bits. This should help to improve code quality by allowing certain optimisations to take place. Tests added here: Transforms/InstCombine/icmp-vscale.ll Differential Revision: https://reviews.llvm.org/D109883	2021-09-20 15:01:59 +01:00
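The known-bits reasoning described above works for any unsigned range, not just vscale: every bit above the highest bit in which the two bounds differ is shared by all values between them. A hedged standalone C++ sketch follows; `KnownBits64` and `knownBitsFromRange` are hypothetical stand-ins for LLVM's `KnownBits` and the ValueTracking code, and the example bounds below are illustrative, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mini version of known bits for a 64-bit value.
struct KnownBits64 {
  uint64_t zero;  // bits known to be 0
  uint64_t one;   // bits known to be 1
};

// Known bits of an unsigned value v with lo <= v <= hi.
KnownBits64 knownBitsFromRange(uint64_t lo, uint64_t hi) {
  assert(lo <= hi && "expects a non-wrapping range");
  uint64_t diff = lo ^ hi;
  // Mask covering the highest differing bit and everything below it.
  uint64_t unknown = 0;
  while (diff) {
    unknown = (unknown << 1) | 1;
    diff >>= 1;
  }
  uint64_t known = ~unknown;
  // Known bits are copied from lo (they are identical in hi).
  return {known & ~lo, known & lo};
}
```

For example, if vscale were known to lie in [1, 16], bits 5 and above would be known zero; if the bounds were equal, every bit would be known.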
Florian Hahn	7f6a4826ac	[CaptureTracking] Allow passing LI to PointerMayBeCapturedBefore (NFC). isPotentiallyReachable can use LoopInfo to return earlier. This patch allows passing an optional LI to PointerMayBeCapturedBefore. Used in D109844. Reviewed By: nikic, asbirlea Differential Revision: https://reviews.llvm.org/D109978	2021-09-20 09:07:34 +01:00
Max Kazantsev	def15c5fb6	[SCEV] Support negative values in signed/unsigned predicate reasoning There is a piece of logic that uses the fact that signed and unsigned versions of the same predicate are equivalent when both values are non-negative. It's also true when both of them are negative. Differential Revision: https://reviews.llvm.org/D109957 Reviewed By: nikic	2021-09-20 11:26:33 +07:00
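The fact these two SCEV entries rely on (signed and unsigned versions of a comparison agree when both operands are non-negative, and also when both are negative) can be checked exhaustively for 8-bit integers. A small C++ sketch; the helper names are hypothetical, and `<` on `int8_t` / `uint8_t` stands in for the `slt` / `ult` predicates:

```cpp
#include <cassert>
#include <cstdint>

// True if signed and unsigned "less than" give the same answer for a, b.
bool sltEqualsUlt(int8_t a, int8_t b) {
  bool slt = a < b;
  bool ult = static_cast<uint8_t>(a) < static_cast<uint8_t>(b);
  return slt == ult;
}

// Checks every 8-bit pair whose sign bits match (both non-negative or both
// negative); pairs with mixed signs are skipped, since the rule does not
// cover them.
bool sameSignImpliesEquivalence() {
  for (int x = -128; x <= 127; ++x)
    for (int y = -128; y <= 127; ++y) {
      if ((x < 0) != (y < 0)) continue;
      if (!sltEqualsUlt(static_cast<int8_t>(x), static_cast<int8_t>(y)))
        return false;
    }
  return true;
}
```

With mixed signs the predicates do disagree: `-1 slt 1` holds, but as unsigned values `255 ult 1` does not.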
Kazu Hirata	84b07c9b3a	[llvm] Use pop_back_val (NFC)	2021-09-19 13:44:23 -07:00
Arthur Eubanks	0db9481208	[NFC] Remove FIXMEs about calling LLVMContext::yield() Nobody has complained about this, and the documentation for LLVMContext::yield() states that LLVM is allowed to never call it. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D110008	2021-09-17 14:59:34 -07:00
Hongtao Yu	c5fafc1e73	[CSSPGO] Tweaks to lower pseudo probe runtime overhead A couple of tweaks to 1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary 2. Allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from being annotated with those attributes. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D109976	2021-09-17 12:28:09 -07:00
Nikita Popov	0fc624f029	[IR] Return AAMDNodes from Instruction::getMetadata() (NFC) getMetadata() currently uses a weird API where it populates a structure passed to it, and optionally merges into it. Instead, we can return the AAMDNodes and provide a separate merge() API. This makes usages more compact. Differential Revision: https://reviews.llvm.org/D109852	2021-09-16 21:06:57 +02:00
Michael Liao	ffa5c3a555	Fix warning on `llvm-else-after-return`. NFC.	2021-09-16 11:25:43 -04:00
Kazu Hirata	385f380e80	[MemorySSA] Fix "set but not used" warnings	2021-09-15 11:41:41 -07:00
Philip Reames	9bdb19cca2	[SCEV] (udiv X, Y) * Y is always NUW Motivated by the removal done in D109782. This implements the correct flag part generically. Differential Revision: https://reviews.llvm.org/D109786	2021-09-15 11:34:50 -07:00
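Why the flag in the entry above is always valid: for unsigned X and nonzero Y, `(X /u Y) * Y` is X rounded down to a multiple of Y, so it never exceeds X and cannot wrap. A brute-force C++ check over all 8-bit operands (the helper is hypothetical; a zero divisor is excluded because `udiv` by zero is undefined behavior in LLVM IR):

```cpp
#include <cassert>

// Returns true if (x / y) * y never wraps for any 8-bit unsigned x and
// nonzero 8-bit y. The product is computed in a wider type so that a wrap
// in 8 bits would show up as a value above 255.
bool udivMulNeverWraps() {
  for (unsigned x = 0; x <= 255; ++x)
    for (unsigned y = 1; y <= 255; ++y) {
      unsigned wide = (x / y) * y;
      if (wide > 255) return false;  // would wrap in 8 bits
      if (wide > x) return false;    // rounding down never exceeds x
    }
  return true;
}
```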
Alina Sbirlea	b759381b75	[MemorySSA] Add verification levels to MemorySSA. [NFC] Add two levels of verification for MemorySSA: Fast and Full. The defaults are kept the same. Full verification always occurs under EXPENSIVE_CHECKS, but now it can also be requested in a specific pass for debugging purposes.	2021-09-15 11:09:54 -07:00
David Green	61cc873a8e	[LV] Recognize intrinsic min/max reductions This extends the reduction logic in the vectorizer to handle intrinsic versions of min and max, both the floating point variants already created by instcombine under fastmath and the integer variants from D98152. As a bonus this allows us to match a chain of min or max operations into a single reduction, similar to how add/mul/etc work. Differential Revision: https://reviews.llvm.org/D109645	2021-09-15 10:45:50 +01:00
Markus Lavin	1ac209ed76	[NPM] Added -print-pipeline-passes print params for a few passes. Added '-print-pipeline-passes' printing of parameters for those passes declared with _WITH_PARAMS macro in PassRegistry.def. Note that it only prints the parameters declared inside _WITH_PARAMS as in a few cases there appear to be additional parameters not parsable. The following passes are now covered (i.e. all of those with *_WITH_PARAMS in PassRegistry.def). LoopExtractorPass - loop-extract HWAddressSanitizerPass - hwsan EarlyCSEPass - early-cse EntryExitInstrumenterPass - ee-instrument LowerMatrixIntrinsicsPass - lower-matrix-intrinsics LoopUnrollPass - loop-unroll AddressSanitizerPass - asan MemorySanitizerPass - msan SimplifyCFGPass - simplifycfg LoopVectorizePass - loop-vectorize MergedLoadStoreMotionPass - mldst-motion GVN - gvn StackLifetimePrinterPass - print<stack-lifetime> SimpleLoopUnswitchPass - simple-loop-unswitch Differential Revision: https://reviews.llvm.org/D109310	2021-09-15 08:34:04 +02:00
Philip Reames	0dd755f027	[SCEV] Stop applying contextual flags in applyLoopGuards This fixes a violation of the wrap flag rules introduced in `c4048d8f`. As noted in the original review, the NUW is legal to infer from the structure of the replacee, but a) there's no test coverage, and b) this should be done generically for all multiplies. Differential Revision: https://reviews.llvm.org/D109782	2021-09-14 14:14:52 -07:00
Florian Hahn	e248d69036	Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer." SCEV does not look through non-header PHIs inside the loop. Such phis can be analyzed by adding separate accesses for each incoming pointer value. This results in 2 more loops vectorized in SPEC2000/186.crafty and avoids regressions when sinking instructions before vectorizing. Fixes PR50296, PR50288. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D102266	2021-09-14 11:19:12 +01:00
Kuba Mracek	e80ee4cbd9	[GlobalDCE] In VFE support for relative pointers, allow GEP references to the base symbol This is for Swift VFE support. In some vtable forms that Swift emits, the "base" of a relative pointer is not the global symbol itself directly, but a GEP into it -- so the pointer is relative to a particular field in the global. So getPointerAtOffset() needs to be able to see through the GEP and allow it in a SUB expression, to correctly recognize the offset as a vtable slot. Differential Revision: https://reviews.llvm.org/D109169	2021-09-13 15:22:11 -07:00
Florian Mayer	0a22510f3e	[value-tracking] see through returned attribute. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109675	2021-09-13 20:52:26 +01:00
Florian Mayer	5b5d774f5d	[hwasan] Respect returns attribute when tracking values. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109233	2021-09-13 20:52:24 +01:00
Nikita Popov	45c467346a	[LAA] Pass access type to getPtrStride() Pass the access type to getPtrStride(), so it is not determined from the pointer element type. Many cases still fetch the element type at a higher level though, so this only partially addresses the issue.	2021-09-11 19:16:49 +02:00
Johannes Doerfert	c09fbbdcfb	Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"" This reapplies commit `7dbba3376f`, or, put differently, this reverts commit `d9a8d20827`. The test now requires the amdgpu and nvptx backends explicitly as it won't work properly without them.	2021-09-10 15:22:56 -05:00
Florian Mayer	57335b6e2e	[stack-safety] Allow to determine safe accesses. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109503	2021-09-10 19:23:54 +01:00
Johannes Doerfert	d9a8d20827	Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reverts commit `7dbba3376f`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:23:08 -05:00
Johannes Doerfert	7dbba3376f	[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals Not all address spaces support initializers for globals and we can therefore not set them without checking if they are allowed. This patch adds a hook into TTI to check if an AS allows non-undef initializers. We disable it for all but address space 0 by default, NVPTX and AMDGPU targets allow all but address space 3. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109337	2021-09-10 12:08:50 -05:00
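The per-address-space policy described in this entry amounts to a small predicate. The C++ below is a hypothetical stand-in, not the TTI interface: the `Target` enum and `allowsNonUndefInitializer` are invented for illustration and encode only what the entry states (address space 0 only by default; all but address space 3 on NVPTX and AMDGPU).

```cpp
#include <cassert>

// Hypothetical target selector for this sketch.
enum class Target { Generic, NVPTX, AMDGPU };

// May a global in address space AS carry a non-undef initializer?
bool allowsNonUndefInitializer(Target t, unsigned AS) {
  switch (t) {
  case Target::NVPTX:
  case Target::AMDGPU:
    return AS != 3;  // both targets reject only address space 3
  case Target::Generic:
  default:
    return AS == 0;  // conservative default: address space 0 only
  }
}
```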
Philip Reames	bfa2a81e92	[ScalarEvolution] Add an additional bailout to avoid NOT of pointer. It's possible in some cases for the LHS to be a pointer where the RHS is not. This isn't directly possible for an icmp, but the analysis mixes up operands of different icmp expressions in some cases. This does not include a test case as the smallest reduced case we've managed is extremely fragile and unlikely to test anything meaningful in the long term. Also add an assertion to getNotSCEV() to make tracking down this sort of issue a bit easier in the future. Fixes https://bugs.llvm.org/show_bug.cgi?id=51787 . Differential Revision: https://reviews.llvm.org/D109546	2021-09-09 15:19:36 -07:00
Philip Reames	eede4846a9	[SCEV] Allow negative steps for LT exit count computation for unsigned comparisons This bit of code is incredibly suspicious. It allows fully unknown (but potentially negative) steps, but not steps known to be negative. The comment about scev flag inference is worrying, but also not correct to my knowledge. At best, this might be covering up some related miscompile. However, there's no test in tree for it, the review history doesn't include obvious motivation, and the C++ example doesn't appear to give wrong results when hand translated to IR. I think it's time to remove this and see what falls out. During review, there were concerns raised about the correctness of the corresponding signed case. This change was deliberately narrowed to the unsigned case which has been audited and appears correct for negative values. We need to get back to the known-negative signed case, but that'll be a future patch if nothing falls out from this one. Differential Revision: https://reviews.llvm.org/D104140	2021-09-09 14:09:29 -07:00
Eli Friedman	8f792707c4	[ScalarEvolution] Fix pointer/int confusion in howManyLessThans. In general, howManyLessThans doesn't really want to work with pointers at all; the result is an integer, and the operands of the icmp are effectively integers. However, isLoopEntryGuardedByCond doesn't like extra ptrtoint casts, so the arguments to isLoopEntryGuardedByCond need to be computed without those casts. Somehow, the values got mixed up with the recent howManyLessThans improvements; fix the confused values, and add a better comment to explain what's happening. Differential Revision: https://reviews.llvm.org/D109465	2021-09-09 12:38:33 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on key constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Florian Mayer	6e12c73316	[NFC] [stack-safety] add placeholder addRange. This is in preparation for D108457.	2021-09-09 13:13:18 +01:00
Florian Mayer	d261d4cf55	[stack-safety] [NFC] do not terminate print with blank line.	2021-09-09 12:31:09 +01:00
Florian Mayer	08b4dd8b24	[NFC] [stack-safety] remove unused return value.	2021-09-09 12:19:47 +01:00
Philip Reames	e741fabc22	[SCEV] Move getIndexExpressionsFromGEP to delinearize [NFC]	2021-09-08 16:56:49 -07:00
Philip Reames	4b5e260b1d	[SCEV] Simplify findExistingSCEVInCache interface [NFC] We were returning a tuple when all but one caller only cared about one piece of the return value. That one caller can inline the complexity, and we can simplify all other uses.	2021-09-08 15:26:07 -07:00
Arthur Eubanks	fe15347a1e	Port the cost model printer to New PM Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109284	2021-09-08 14:47:05 -07:00
Michael Kruse	088577a38e	[Delinearization] Require byte offset to be zero. Users of delinearization assume that the offset into the array element is zero. In most cases it will indeed be zero, but if it is not, the delinearization has to fail since it violates that assumption without the API even allowing it to signal to the caller that the byte offset is non-zero. This bug caused Polly to miscompile blender (526.blender_r from SPEC CPU 2017) in -polly-process-unprofitable mode. The SCEV expression that was incorrectly delinearized has been reduced in the test case byte_offset.ll. The dropped offset into the array element of size 4 (a float) is ((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>). This significant component was just dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts. This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image. Patch D108885 would also fix the API. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D109133	2021-09-08 16:02:37 -05:00
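The failure mode in this entry is easy to see on a plain 2-D array: a byte offset only has valid subscripts if it lands on an element boundary. The C++ below is a hypothetical helper (not the Delinearization API) that fails instead of silently dropping the remainder, assuming a row-major array with a known column count and element size.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper: split a byte offset into (row, col) subscripts for a
// row-major array with `cols` columns of `elemSize`-byte elements.
std::optional<std::pair<uint64_t, uint64_t>>
delinearize2D(uint64_t byteOffset, uint64_t cols, uint64_t elemSize) {
  assert(cols != 0 && elemSize != 0);
  if (byteOffset % elemSize != 0)
    return std::nullopt;  // offset points into the middle of an element
  uint64_t linear = byteOffset / elemSize;  // element index
  return std::make_pair(linear / cols, linear % cols);
}
```

With 4-byte floats and 10 columns, byte offset 92 maps to row 2, column 3, whereas byte offset 94 is rejected rather than truncated.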
Florian Hahn	f4726e7238	[LAA] Remove unused OrigPtr from replaceSymbolicStrideSCEV (NFC). The OrigPtr argument is not used in tree.	2021-09-08 22:35:36 +02:00
Arthur Eubanks	b493124ae2	[MemorySSA] Support invariant.group metadata The implementation is mostly copied from MemDepAnalysis. We want to look at all loads and stores to the same pointer operand. Bitcasts and zero GEPs of a pointer are considered the same pointer value. We choose the most dominating instruction. Since updating MemorySSA with invariant.group is non-trivial, for now handling of invariant.group is not cached in any way, so it's part of the walker. The number of loads/stores with invariant.group is small for now anyway. We can revisit if this actually noticeably affects compile times. To avoid invariant.group affecting optimized uses, we need to have optimizeUsesInBlock() not use invariant.group in any way. Co-authored-by: Piotr Padlewski <prazek@google.com> Reviewed By: asbirlea, nikic, Prazek Differential Revision: https://reviews.llvm.org/D109134	2021-09-08 13:06:12 -07:00
Philip Reames	585c594d74	Move delinearization logic out of SCEV [NFC] None of this logic has anything to do with SCEV's internals, it just uses the existing public APIs. As a result, we can move the code from ScalarEvolution.cpp/hpp to Delinearization.cpp/hpp with only minor changes. This was discussed in advance on today's loop opt call. It turned out to be easy as hoped.	2021-09-08 12:28:35 -07:00
Philip Reames	6cdca906c7	[SCEV] Use no-self-wrap flags inferred from exit structure to compute trip count The basic problem being solved is that we largely give up when encountering a trip count involving an IV which is not an addrec. We will fall back to the brute force constant eval, but that doesn't have the information about the fact that we can't cycle back through the same set of values. There's a high level design question of whether this is the right place to handle this, and if not, where that place is. The major alternative here would be to return a conservative upper bound, and then rely on two invocations of indvars to add the facts to the narrow IV, and then reconstruct SCEV. (I have not implemented the alternative and am not 100% sure this would work out.) That's arguably more in line with existing code, but I find this substantially easier to reason about. During review, no one expressed a strong opinion, so we went with this one. Differential Revision: D108651	2021-09-07 17:00:02 -07:00
Philip Reames	9659069978	[SCEV] Further clarify comments regarding UB and zero stride Follow on to D109029. I realized we had no mention of mustprogress in the comment (as it predated mustprogress in the codebase). In the process of adding it, I tweaked the preconditions into something I think is clearer. Note that mustprogress is checked in the code. Differential Revision: https://reviews.llvm.org/D109091	2021-09-07 13:53:56 -07:00
Nikita Popov	58db5f6e95	[ConstFold] Support opaque pointers in constexpr GEPs Support opaque pointers in SymbolicallyEvaluateGEP() by using the value type of a GlobalValue base or falling back to i8 if there isn't one. We don't unconditionally generate i8 GEPs here because that would lose inrange attributes, and because some optimizations on globals currently rely on GEP types (e.g. the globals SROA mentioned in the comment). Differential Revision: https://reviews.llvm.org/D109297	2021-09-07 20:50:29 +02:00
Kazu Hirata	5648f7170e	[Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC)	2021-09-07 09:19:33 -07:00
Nikita Popov	8d54c8a0c3	[SCEV] Fix applyLoopGuards() with range check idiom (PR51760) Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3)) rather than umax(C1, umin(C2, %x)). This didn't make a difference for the existing tests, because the result is only used for range calculation, and %x will usually have an unknown starting range, and the additional offset keeps it unknown. However, if %x already has a known range, we may compute a result range that is too small.	2021-09-06 22:22:41 +02:00
Andrew Litteken	bd4b1b5f6d	[IRSim] Adding support for recognizing branch similarity The current IRSimilarityIdentifier does not try to find similarity across blocks; this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them. This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar. We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter. Reviewers: paquette, yroux Differential Revision: https://reviews.llvm.org/D106989	2021-09-06 11:55:38 -07:00
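A rough model of the "relative branching locations" described in this entry: number the blocks in order, record each in-region branch as a target-minus-source distance, and treat every target outside the region as one "exits the region" value. The C++ below is a hypothetical sketch of that comparison, not the IRSimilarityIdentifier code; `Branch`, the sentinel value and the inclusive region bounds are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical branch record: blocks are numbered in layout order.
struct Branch {
  int from, to;
};

// Relative target of a branch within [regionStart, regionEnd]; all targets
// outside the region collapse to one sentinel, so only in-region targets
// have to match exactly.
static int relativeTarget(const Branch &b, int regionStart, int regionEnd) {
  const int kExitsRegion = 1 << 30;  // hypothetical sentinel
  if (b.to < regionStart || b.to > regionEnd) return kExitsRegion;
  return b.to - b.from;
}

// Two regions are branch-similar if corresponding branches agree.
bool branchesSimilar(const std::vector<Branch> &a, int aStart, int aEnd,
                     const std::vector<Branch> &b, int bStart, int bEnd) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (relativeTarget(a[i], aStart, aEnd) !=
        relativeTarget(b[i], bStart, bEnd))
      return false;
  return true;
}
```

Under this model, two regions that each branch two blocks forward and then leave the region are similar, even though they exit to different blocks.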
Sander de Smalen	96f6785bc9	[VectorUtils] Teach findScalarElement to return splat value. If the vector is a splat of some scalar value, findScalarElement() can simply return the scalar value if it knows the requested lane is in the vector. This is only needed for scalable vectors, because the InsertElement/ShuffleVector case is already handled explicitly for the fixed-width case. This helps to recognize an InstCombine fold like: extractelt(bitcast(splat(%v))) -> bitcast(%v) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D107254	2021-09-06 10:56:06 +01:00
Michael Kruse	650bbc5620	[OpenMP][OpenMPIRBuilder] Implement loop unrolling. Recommit of `707ce34b06`. Don't introduce a dependency to the LLVMPasses component, instead register the required passes individually. Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are: * `unrollLoopFull` * `unrollLoopPartial` * `unrollLoopHeuristic` `unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility. With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism. Reviewed By: jdoerfert, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107764	2021-09-04 19:18:58 -05:00
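The "tile, then fully unroll the inner loop" scheme for partial unrolling can be pictured at source level. This is a hand-written C++ illustration with a hypothetical unroll factor of 4, not what OpenMPIRBuilder emits (it delegates the actual unrolling to LoopUnrollPass); the explicit remainder loop is one common way to handle a trip count that is not a multiple of the factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop: sum of all elements.
long sumPlain(const std::vector<int> &v) {
  long s = 0;
  for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
  return s;
}

// Same computation after tiling by 4 and fully unrolling each full tile.
long sumUnrolledBy4(const std::vector<int> &v) {
  long s = 0;
  std::size_t n = v.size(), i = 0;
  for (; i + 4 <= n; i += 4) {  // outer loop over full tiles
    s += v[i];                   // inner loop with trip count 4,
    s += v[i + 1];               // fully unrolled
    s += v[i + 2];
    s += v[i + 3];
  }
  for (; i < n; ++i) s += v[i];  // remainder of the last partial tile
  return s;
}
```

Both functions compute the same sum; only the loop structure differs.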
Chen Zheng	34badc409c	Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount." This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and is not a right patch according to comments in D91724 This reverts commit `42eaf4fe0a`.	2021-09-03 02:55:43 +00:00
Arthur Eubanks	813a7f1ad7	[MemorySSA] Properly handle liveOnEntry in the walker printer Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109177	2021-09-02 12:51:27 -07:00
Wenlei He	f7fff46acc	[CSSPGO] Allow inlining recursive call for preinliner When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining. However, it turns out that we use InlineCost::Never for both illeagle inlining and some of the "not-so-beneficial" inlining. The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner. Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change. This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used. With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks) Differential Revision: https://reviews.llvm.org/D109104	2021-09-02 11:24:27 -07:00
Daniil Suchkov	5c97507e2b	[InlineCost] Introduce attributes to override InlineCost for inliner testing This patch introduces four new string attributes: function-inline-cost, function-inline-threshold, call-inline-cost and call-threshold-bonus. These attributes allow you to selectively override some aspects of InlineCost analysis. That would allow us to test inliner separately from the InlineCost analysis. That could be useful when you're trying to write tests for inliner and you need to test some very specific situation, like "the inline cost has to be this high", or "the threshold has to be this low". Right now every time someone does that, they have get creative to come up with a way to make the InlineCost give them the number they need (like adding ~30 load/add pairs for a trivial test). This process can be somewhat tedious which can discourage some people from writing enough tests for their changes. Also, that results in tests that are fragile and can be easily broken without anyone noticing it because the test writer can't explicitly control what input the inliner will get from the inline cost analysis. These new attributes will alleviate those problems to an extent. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D109033	2021-09-02 17:35:06 +00:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in a broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore I take it upon myself to revert the tree back to the last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Roman Lebedev	50634deaa5	Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling." Breaks build with -DBUILD_SHARED_LIBS=ON ``` CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle): "LLVMFrontendOpenMP" of type SHARED_LIBRARY depends on "LLVMPasses" (weak) "LLVMipo" of type SHARED_LIBRARY depends on "LLVMFrontendOpenMP" (weak) "LLVMCoroutines" of type SHARED_LIBRARY depends on "LLVMipo" (weak) "LLVMPasses" of type SHARED_LIBRARY depends on "LLVMCoroutines" (weak) depends on "LLVMipo" (weak) At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries. CMake Generate step failed. Build files cannot be regenerated correctly. ``` This reverts commit `707ce34b06`.	2021-09-02 12:42:23 +03:00
Michael Kruse	707ce34b06	[OpenMP][OpenMPIRBuilder] Implement loop unrolling. Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are: * `unrollLoopFull` * `unrollLoopPartial` * `unrollLoopHeuristic` `unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic`, allows greater flexibility. With full unrolling and partial unrolling with a known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop is fully unrolled using the same mechanism. Reviewed By: jdoerfert, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107764	2021-09-02 02:37:25 -05:00
Arthur Eubanks	7b08d9da55	Reland [MemorySSA] Add pass to print results of MemorySSA walker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109028	2021-09-01 18:58:57 -07:00
Arthur Eubanks	0f63496ea4	Revert "[MemorySSA] Add pass to print results of MemorySSA walker" This reverts commit `8f98477c2d`. Breaks bots	2021-09-01 18:45:19 -07:00
Arthur Eubanks	8f98477c2d	[MemorySSA] Add pass to print results of MemorySSA walker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109028	2021-09-01 18:29:15 -07:00
Philip Reames	bb0fa3ea02	Revert "snapshot - do not push" This reverts commit `91f4655d92`. This wasn't intended to be pushed, sorry.	2021-09-01 16:59:23 -07:00
Philip Reames	91f4655d92	snapshot - do not push	2021-09-01 16:59:01 -07:00
Alina Sbirlea	a10409fe23	[MemorySSAUpdater] Simplify updates when only deleting edges. When performing only edge deletion, we don't need to do the DT updates back and forth. Check for the existence of insert updates to simplify this.	2021-09-01 15:48:20 -07:00
Philip Reames	73b951a7f7	[SCEV] Clarify requirements for zero-stride to be UB There's a silent bug in our reasoning about zero strides. We assume that having a single static exit implies that if that exit is not taken, then the loop must be infinite. This ignores the potential for abnormal exits via exceptions. Consider the following example: for (uint8_t i = 0; i < 1; i += 0) { throw_on_thousandth_call(); } Our reasoning is such that we'd conclude this loop can't take the backedge as that would lead to a (presumed) infinite loop. In practice, this is a silent bug because the loopIsFiniteByAssumption returns false strictly more often than the loopHasNoAbnormalExits property. We could reasonably want to change that in the future, so fixing the codeflow now is worthwhile. Differential Revision: https://reviews.llvm.org/D109029	2021-09-01 14:01:13 -07:00
Nikita Popov	02f74eadbe	[IVDescriptors] Make pointer inductions compatible with opaque pointers Store the used element type in the InductionDescriptor. For typed pointers, it remains the pointer element type. For opaque pointers, we always use an i8 element type, such that the step is a simple offset. A previous version of this patch instead tried to guess the element type from an induction GEP, but this is not reliable, as the GEP may be hidden (see @both in iv_outside_user.ll). Differential Revision: https://reviews.llvm.org/D104795	2021-09-01 21:02:05 +02:00
Philip Reames	29fa37ec9f	[SCEV] If max BTC is zero, then so is the exact BTC [2 of 2] This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remaining cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope for this patch. The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because they happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll. Differential Revision: https://reviews.llvm.org/D109015	2021-09-01 11:51:48 -07:00
Philip Reames	6600e1759b	[SCEV] If max BTC is zero, then so is the exact BTC [1 of N] This patch is specifically the howManyLessThan case. There will be a couple of followon patches for other codepaths. The subtle bit is explaining why the two codepaths have a difference while both are correct. The test case with modifications is a good example, so let's discuss in terms of it. * The previous exact bounds for this example of (-126 + (126 smax %n))<nsw> can evaluate to either 0 or 1. Both are "correct" results, but only one of them results in a well defined loop. If %n were 127 (the only possible value producing a trip count of 1), then the loop must execute undefined behavior. As a result, we can ignore the TC computed when %n is 127. All other values produce 0. * The max taken count computation uses the limit (i.e. the maximum value END can be without resulting in UB) to restrict the bound computation. As a result, it returns 0 which is also correct. WARNING: The logic above only holds for a single exit loop. The current logic for max trip count would be incorrect for multiple exit loops, except that we never call computeMaxBECountForLT except when we can prove either a) no overflow occurs in this IV before exit, or b) this is the sole exit. An alternate approach here would be to add the limit logic to the symbolic path. I haven't played with this extensively, but I'm hesitant because a) the term is optional and b) I'm not sure it'll reliably simplify away. As such, the resulting code quality from expansion might actually get worse. This was noticed while trying to figure out why D108848 wasn't NFC, but is otherwise standalone. Differential Revision: https://reviews.llvm.org/D108921	2021-08-31 08:50:11 -07:00
Kuba Mracek	4c066bd08b	[GlobalDCE] Handle relative pointers in VFE (for Swift vtables) To support Virtual Function Elimination for Swift, this PR adds support for Swift vtables which contain "relative pointers" instead of direct pointer references. These are in the form of: @symbol = ... { i32 trunc (i64 sub (i64 ptrtoint (<type> @target to i64), i64 ptrtoint (... @symbol to i64)) to i32) } The PR extends GlobalDCE's way of looking up a vtable offset into a dependency to be able to see through this expression and find the target symbol. Differential Revision: https://reviews.llvm.org/D107645	2021-08-31 07:07:22 -07:00
Andrew Litteken	cf56b08d15	[IRSim] Adding missing comments canonical relation commit Adding missing comments to IRSimilarityIdentifier.cpp since they were not properly added in commit `063af63b96`.	2021-08-30 08:41:05 -07:00
Nikita Popov	9f7873784d	[SCEVExpander] Reuse removePointerBase() for canonical addrecs ExposePointerBase() in SCEVExpander implements basically the same functionality as removePointerBase() in SCEV, so reuse it. The SCEVExpander code assumes that the pointer operand on adds is the last one -- I'm not sure that always holds. As such this might not be strictly NFC.	2021-08-29 21:12:35 +02:00
Nikita Popov	e6a5dd60ff	[SCEV] Assert unique pointer base (NFC) Add expressions can contain at most one pointer operand nowadays, assert that in getPointerBase() and removePointerBase().	2021-08-29 20:06:24 +02:00
Kazu Hirata	0003d57434	[Analysis] Fix a "set but not used" warning	2021-08-28 06:37:01 -07:00
Andrew Litteken	063af63b96	[IRSim][IROutliner] Canonicalizing commutative value numbering between similarity sections. When the initial relationship between two pairs of values between similar sections is ambiguous to commutativity, arguments to the outlined functions can be passed in such that the order is incorrect, causing miscompilations. This adds a canonical mapping to each similarity section, so that we can maintain the relationship of global value numbering from one section to another. Added Tests: Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering Reviewers: jroelofs, jpaquette, yroux Differential Revision: https://reviews.llvm.org/D104143	2021-08-27 15:02:56 -07:00
Philip Reames	ec8d87e9f5	[SCEV] Infer nuw from nw for addrecs This was previously committed in `914836b`, and reverted due to confusion on the status of the review. Differential Revision: https://reviews.llvm.org/D108601	2021-08-24 14:24:05 -07:00
Sanjay Patel	204038d52e	[InstSimplify] fold or+shifted -1 to -1 These are similar to the rotate pattern added with: `dcf659e821` ...but we don't have guard ops on the shift amount, so we don't canonicalize to the intrinsic. declare void @llvm.assume(i1) define i32 @src(i32 %shamt, i32 %bitwidth) { ; subtract must be in range of bitwidth %lt = icmp ule i32 %bitwidth, 32 call void @llvm.assume(i1 %lt) %r = lshr i32 -1, %shamt %s = sub i32 %bitwidth, %shamt %l = shl i32 -1, %s %o = or i32 %r, %l ret i32 %o } define i32 @tgt(i32 %shamt, i32 %bitwidth) { ret i32 -1 } https://alive2.llvm.org/ce/z/aF7WHx	2021-08-24 15:38:38 -04:00
Philip Reames	58582bae63	Revert "[SCEV] Infer nsw/nuw from nw for addrecs" This reverts commit `914836b1c8`. Further comments on review came up after initial approval. Reverting while addressing.	2021-08-24 09:28:37 -07:00
Philip Reames	914836b1c8	[SCEV] Infer nsw/nuw from nw for addrecs If we know an addrec doesn't self-wrap, the increment is strictly positive, and the start value is the smallest representable value, then we know that the corresponding wrap type cannot occur. Differential Revision: https://reviews.llvm.org/D108601	2021-08-24 08:53:21 -07:00
Philip Reames	96ef794fd0	[SCEV] Add a hasFlags utility to improve readability [NFC]	2021-08-23 17:36:52 -07:00
Mircea Trofin	1055c5e1d3	[MLGO] Make sure inliner logs when deleting callees When using final reward (which is now the default), we were skipping logging decisions that were leading to callee deletion. This fixes that. Differential Revision: https://reviews.llvm.org/D108587	2021-08-23 14:54:46 -07:00
Florian Hahn	d024a01511	Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64" This reverts the revert `ab9296f13b`. The issue causing the revert should be fixed in `9baed023b4`.	2021-08-23 11:25:27 +01:00
Sanjay Patel	dcf659e821	[InstSimplify] fold rotate of -1 to -1 This is part of solving more general rotate patterns seen in bugs related to: https://llvm.org/PR51575 https://alive2.llvm.org/ce/z/GpkFCt	2021-08-22 09:15:48 -04:00
Sanjay Patel	d41e308f10	[InstSimplify] fold rotate of zero to zero This is part of solving more general rotate patterns seen in bugs related to: https://llvm.org/PR51575 https://alive2.llvm.org/ce/z/fjKwqv	2021-08-22 09:15:48 -04:00
Mircea Trofin	8dc3fe0cd1	[NFC][MLGO] Use std::move when moving protobufs Because of an odd linking problem, we need to temporarily support building with TF C API 1.15 + tensorflow 2.50 pip package in 'development' mode scenarios. Protobuf Message 'Swap' is partially implemented in the header (2.50) and relies on a symbol not found in TF C API 1.15. std::move avoids that, at no semantic cost.	2021-08-20 13:40:35 -07:00
Florian Hahn	ab9296f13b	Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64" This reverts commit `f4122398e7` to investigate a crash exposed by it. The patch breaks building the code below with `clang -O2 --target=aarch64-linux` int a; double b, c; void d() { for (; a; a++) { b += c; c = a; } }	2021-08-20 21:24:28 +01:00
Augie Fackler	e59c88294b	MemoryBuiltins: trailing , on collection literal This was probably bugging me more than is reasonable, but it makes merging changes in this file slightly less annoying to have the trailing comma here. I only noticed this because Rust is currently carrying a patch to this file and it kept making life a little difficult.	2021-08-19 17:59:23 +02:00
David Sherwood	f4122398e7	[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64 I have added a new TTI interface called enableOrderedReductions() that controls whether or not ordered reductions should be enabled for a given target. By default this returns false, whereas for AArch64 it returns true and we rely upon the cost model to make sensible vectorisation choices. It is still possible to override the new TTI interface by setting the command line flag: -force-ordered-reductions=true|false I have added a new RUN line to show that we use ordered reductions by default for SVE and Neon: Transforms/LoopVectorize/AArch64/strict-fadd.ll Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D106653	2021-08-19 09:29:40 +01:00
Peter Collingbourne	6f85225ef3	StackLifetime: Remove asserts for multiple lifetime intrinsics. According to the langref, it is valid to have multiple consecutive lifetime start or end intrinsics on the same object. For llvm.lifetime.start: "If ptr [...] is a stack object that is already alive, it simply fills all bytes of the object with poison." For llvm.lifetime.end: "Calling llvm.lifetime.end on an already dead alloca is no-op." However, we currently fail an assertion in such cases. I've observed the assertion failure when the loop vectorization pass duplicates the intrinsic. We can conservatively handle these intrinsics by ignoring all but the first one, which can be implemented by removing the assertions. Differential Revision: https://reviews.llvm.org/D108337	2021-08-18 18:45:28 -07:00
Arthur Eubanks	7557d6c896	[NFC] Cleanup calls to CallBase::getAttribute()	2021-08-18 09:39:33 -07:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Mark Danial	4018d25da8	LoopNest Analysis expansion to return instructions that prevent a Loop Nest from being perfect Expand LoopNestAnalysis to return the full list of instructions that cause a loop nest to be imperfect. This is useful for other passes to know whether they should continue into the inner loops. Added a new function getInterveningInstructions that returns a small vector with the instructions that prevent a loop nest from being perfect. Also added a couple of helper functions to reduce code duplication. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D107773	2021-08-17 22:25:49 +00:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Paul Robinson	94b4598d77	[PS4] stp[n]cpy not available on PS4	2021-08-16 09:06:52 -07:00
Sanjay Patel	ca637014f1	[Analysis][SimplifyLibCalls] improve function signature check for memcmp This would assert/crash as shown in: https://llvm.org/PR50850 The matching for bcmp/bcopy should probably also be updated, but that's another patch.	2021-08-15 16:11:26 -04:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Arthur Eubanks	a0c42ca56c	[NFC] Remove AttributeList::hasParamAttribute() It's the same as AttributeList::hasParamAttr().	2021-08-13 10:58:21 -07:00
Roman Lebedev	0dc6b597db	Revert "[SCEV] Remove premature assert. PR46786" Since then, the SCEV pointer handling has been improved, so the assertion should now hold. This reverts commit `b96114c1e1`, relanding the assertion from commit `141e845da5`.	2021-08-13 17:50:22 +03:00
Usman Nadeem	a7c4e9b1f7	[InstSimplify] Eliminate vector reverse of a splat vector experimental.vector.reverse(splat(X)) -> splat(X) Differential Revision: https://reviews.llvm.org/D107793 Change-Id: Id29ba88fd669ff8686712e96b1bdc46dda5b853c	2021-08-11 11:27:58 -07:00
Mircea Trofin	510402c2c8	[NFC][MLGO] 'Use' variable used for asserts	2021-08-10 19:55:17 -07:00
Christopher Di Bella	c874dd5362	[llvm][clang][NFC] updates inline licence info Some files still contained the old University of Illinois Open Source Licence header. This patch replaces that with the Apache 2 with LLVM Exception licence. Differential Revision: https://reviews.llvm.org/D107528	2021-08-11 02:48:53 +00:00
Fangrui Song	76093b1739	[InlineAdvisor] Add single quotes around caller/callee names Clang diagnostics refer to identifier names in quotes. This patch makes inline remarks conform to the convention. New behavior: ``` % clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline] int bar(int a) { return foo(a); } ^ ``` Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D107791	2021-08-10 11:51:31 -07:00
Sanjay Patel	e260e10c4a	[InstSimplify] fold min/max with limit constant This is already done within InstCombine: https://alive2.llvm.org/ce/z/MiGE22 ...but leaving it out of analysis makes it harder to avoid infinite loops there.	2021-08-10 10:57:25 -04:00
Sanjay Patel	188832f419	Revert "[InstSimplify] fold min/max with limit constant; NFC" This reverts commit `f43859b437`. This is not NFC, so I'll try again without that mistake in the commit message.	2021-08-10 10:50:09 -04:00
Sanjay Patel	f43859b437	[InstSimplify] fold min/max with limit constant; NFC This is already done within InstCombine: https://alive2.llvm.org/ce/z/MiGE22 ...but leaving it out of analysis makes it harder to avoid infinite loops there.	2021-08-10 10:43:07 -04:00
Dorit Nuzman	67278b8a90	[LV] Support Interleaved Store Group With Gaps Teach LV to use masked-store to support interleave-store-group with gaps (instead of scatters/scalarization). The symmetric case of using masked-load to support interleaved-load-group with gaps was introduced a while ago, by https://reviews.llvm.org/D53668; This patch completes the store-scenario leftover from D53668, and solves PR50566. Reviewed by: Ayal Zaks Differential Revision: https://reviews.llvm.org/D104750	2021-08-08 10:32:02 +03:00
Zheng Chen	30b0c455b1	[LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize Fix an assertion failure due to mismatched types for Numerator and CacheLineSize in the loop cache analysis pass. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D107618	2021-08-06 16:51:09 +00:00
Mircea Trofin	ae1a2a09e4	[NFC][MLGO] Make logging more robust 1) add some self-diagnosis (when asserts are enabled) to check that all features have the same number of entries; 2) avoid storing pointers to mutable fields, because the proto API contract doesn't actually guarantee those stay fixed even if no further mutation of the object occurs. Differential Revision: https://reviews.llvm.org/D107594	2021-08-06 04:44:52 -07:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is a recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in the call of getFastMathFlags (the base type should be FPMathOperator, not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function is currently implemented entirely in clang codegen, which expands the function into a set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. However, it is not suitable if floating point exceptions are tracked. The corresponding IEEE 754 operation and C function must never raise an FP exception, even if the argument is a signaling NaN. Compare instructions usually do not have this property; they raise an 'invalid' exception in such a case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if the argument is an SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, although some targets can do the check in a more efficient way. * The solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target-specific code into IR. Currently only SystemZ implements this hook, and it generates a call to a target-specific intrinsic function. Although these mechanisms allow 'isnan' to be implemented with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang; in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such an option is specified, 'fcmp uno' may be optimized to 'false'. It is a valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption, however, should not be applied to the functions that check FP number properties, including 'isnan'. If such a function returns the expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance-critical code, as it can speed up execution at the expense of manual treatment of corner cases. If 'isnan' returns an assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in a target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where it is lowered in a target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies the requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Ryan Prichard	623cf3dfdf	Mark getc_unlocked as unavailable by default Before D45736, getc_unlocked was available by default, but turned off for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked functions, which were unavailable by default, but it also: * left getc_unlocked enabled by default, * removed the disabling line for Windows, and * added code to enable getc_unlocked for GNU, Android, and OSX. For consistency, make getc_unlocked unavailable by default. Maybe this was the intent of D45736 anyway. Reviewed By: MaskRay, efriedma Differential Revision: https://reviews.llvm.org/D107527	2021-08-05 16:35:02 -07:00
Bardia Mahjour	0e08891ec1	[DA] control compile-time spent by MIV tests Function exploreDirections() in DependenceAnalysis implements a recursive algorithm for refining direction vectors. This algorithm has worst-case complexity of O(3^(n+1)) where n is the number of common loop levels. In this patch I'm adding a threshold to control the amount of time we spend in doing MIV tests (which most of the time end up resulting in overly pessimistic direction vectors anyway). Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D107159	2021-08-05 09:50:11 -04:00
Nathan Lanza	5848166369	Disable LibFuncs for stpcpy and stpncpy for Android < 21 These functions don't exist in Android API levels < 21. A change in llvm-12 (rG6dbf0cfcf789) caused Oz builds to emit this symbol assuming it's available, thus causing link errors. Simply disable it here. Differential Revision: https://reviews.llvm.org/D107509	2021-08-04 22:48:41 -04:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported, mainly in test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function is currently implemented entirely in clang codegen, which expands the function into a set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. However, it is not suitable if floating point exceptions are tracked. The corresponding IEEE 754 operation and C function must never raise an FP exception, even if the argument is a signaling NaN. Compare instructions usually do not have this property; they raise an 'invalid' exception in such a case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if the argument is an SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, although some targets can do the check in a more efficient way. * The solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target-specific code into IR. Currently only SystemZ implements this hook, and it generates a call to a target-specific intrinsic function. Although these mechanisms allow 'isnan' to be implemented with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang; in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such an option is specified, 'fcmp uno' may be optimized to 'false'. It is a valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption, however, should not be applied to the functions that check FP number properties, including 'isnan'. If such a function returns the expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance-critical code, as it can speed up execution at the expense of manual treatment of corner cases. If 'isnan' returns an assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in a target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where it is lowered in a target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies the requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Jacob Hegna	b16c37fa2c	[MLGO] Update the current model url for the Oz inliner model.	2021-08-04 03:09:00 +00:00
Roman Lebedev	6f6e9a867f	[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop I'm not sure this is the best way to approach this, but the situation is hard to detect unless we explicitly call it out when refusing to advise to unroll. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107271	2021-08-03 00:57:26 +03:00
Chang-Sun Lin, Jr	b58eda39eb	[ValueTracking] Fix computeConstantRange to use "may" instead of "always" semantics for llvm.assume ValueTracking should allow for value ranges that may satisfy llvm.assume, instead of restricting the ranges only to values that will always satisfy the condition. Differential Revision: https://reviews.llvm.org/D107298	2021-08-02 22:20:17 +02:00
Sanjay Patel	7f55557765	[Analysis] improve function signature checking for snprintf The check for size_t parameter 1 was already here for snprintf_chk, but it wasn't applied to regular snprintf. This could lead to mismatching and eventually crashing as shown in: https://llvm.org/PR50885	2021-07-31 15:17:20 -04:00
Kerry McLaughlin	9d35594993	Reland "[LV] Use lookThroughAnd with logical reductions" If a reduction Phi has a single user which `AND`s the Phi with a type mask, `lookThroughAnd` will return the user of the Phi and the narrower type represented by the mask. Currently this is only used for arithmetic reductions, whereas loops containing logical reductions will create a reduction intrinsic using the widened type, for example: for.body: %phi = phi i32 [ %and, %for.body ], [ 255, %entry ] %mask = and i32 %phi, 255 %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv %load = load i8, i8* %gep %ext = zext i8 %load to i32 %and = and i32 %mask, %ext ... ^ this will generate an and reduction intrinsic such as the following: call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...) The same example for an add instruction would create an intrinsic of type i8: call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...) This patch changes AddReductionVar to call lookThroughAnd for other integer reductions, allowing loops similar to the example above with reductions such as and, or & xor to vectorize. Reviewed By: david-arm, dmgreen Differential Revision: https://reviews.llvm.org/D105632	2021-07-30 18:04:09 +01:00
Sander de Smalen	84a4caeb84	[InstSimplify] Don't assume parent function when simplifying llvm.vscale. D106850 introduced a simplification for llvm.vscale by looking at the surrounding function's vscale_range attributes. The call that's being simplified may not yet have been inserted into the IR. This happens for example during function cloning. This patch fixes the issue by checking if the instruction is in a parent basic block.	2021-07-29 20:08:08 +01:00
Jun Ma	e2fe26e77b	[NFC][InstSimplify] Use more intuitive variable names.	2021-07-29 13:55:47 +08:00
Wenlei He	1a8087adaf	[ThinLTO] Disallow importing for functions with indir branch to block address We don't allow inlining for functions with blockaddress with uses other than strictly callbr. This is because if the blockaddress escapes the function via a global variable, inlining may lead to an invalid cross-function reference. We check against such cases during inlining, but the check can fail for ThinLTO post-link because CFG simplification can incorrectly remove blocks based on wrong block reachability. When we import a function with blockaddress taken in a global variable but without importing that variable, we won't go through value mapping to reflect the real address-taken-ness of the cloned blocks. For the imported clone, this leads to blocks reachable from indirect branch through global variable being incorrectly treated as unreachable and removed by SimplifyCFG. Since inlining for such cases shouldn't be allowed in the first place, I'm marking them as ineligible for importing during pre-link to avoid the problem of missing address-taken-ness of imported clone as well as bad DCE and inlining. Differential Revision: https://reviews.llvm.org/D106930	2021-07-28 18:02:48 -07:00
Jun Ma	ca0fe3447f	[InstSimplify] Simplify llvm.vscale when vscale_range attribute exists Reduce llvm.vscale to constant based on vscale_range attribute. Differential Revision: https://reviews.llvm.org/D106850	2021-07-28 21:41:52 +08:00
Mircea Trofin	935dea2cb2	[MLGO] fix silly LLVM_DEBUG misuse	2021-07-27 15:10:28 -07:00
Mircea Trofin	eb76ca573d	[NFC][MLGO] Debug messages for what inline advisor is selected We already have an indication (error) if the desired inline advisor cannot be enabled, but we don't have a positive indication. Added LLVM_DEBUG messages for the latter.	2021-07-27 15:05:39 -07:00
Anna Thomas	68ffed12b7	[IVDescriptors] Fix bug in checkOrderedReduction The Exit instruction passed in for checking if it's an ordered reduction need not be an FPAdd operation. We need to bail out at that point instead of assuming it is an FPAdd (and hence has two operands). See added testcase. It crashes without the patch because the Exit instruction is a phi with exactly one operand. This latent bug was exposed by `95346ba` which added support for multi-exit loops for vectorization. Reviewed-By: kmclaughlin Differential Revision: https://reviews.llvm.org/D106843	2021-07-27 09:31:44 -04:00
Johannes Doerfert	75636868e2	[InstSimplify] Expose generic interface for replaced operand simplification Users, especially the Attributor, might replace multiple operands at once. The actual implementation of simplifyWithOpReplaced is able to handle that just fine; the interface simply did not allow replacing more than one operand at a time. This exposes a more generic interface with no intended change for existing code. Differential Revision: https://reviews.llvm.org/D106189	2021-07-27 00:56:12 -05:00
Philip Reames	f82f39b9cf	[SCEV] Add a comment about invariant in howManyLessThans	2021-07-26 16:39:26 -07:00
Eli Friedman	5c486ce04d	[LLVM IR] Allow volatile stores to trap. Proposed alternative to D105338. This is ugly, but short-term I think it's the best way forward: first, let's formalize the hacks into a coherent model. Then we can consider extensions of that model (we could have different flavors of volatile with different rules). Differential Revision: https://reviews.llvm.org/D106309	2021-07-26 10:51:00 -07:00
Florian Hahn	6d753b0751	[LAA] Remove RuntimeCheckingPtrGroup::RtCheck member (NFC). This patch removes RtCheck from RuntimeCheckingPtrGroup to make it possible to construct RuntimeCheckingPtrGroup objects without a RuntimePointerChecking object. This should make it easier to re-use the code to generate runtime checks, e.g. in D102834. RtCheck was only used to access the pointer info for a given index. Instead, the start and end expressions can be passed directly. For code-gen, we also need to know the address space to use. This can also be explicitly passed at construction. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D105481	2021-07-26 17:38:10 +01:00
Nikita Popov	33146857e9	[IR] Consider non-willreturn as side effect (PR50511) This adjusts mayHaveSideEffect() to return true for !willReturn() instructions. Just like other side-effects, non-willreturn calls (aka "divergence") cannot be removed and cannot be reordered relative to other side effects. This fixes a number of bugs where non-willreturn calls are either incorrectly dropped or moved. In particular, it also fixes the last open problem in https://bugs.llvm.org/show_bug.cgi?id=50511. I performed a cursory review of all current mayHaveSideEffect() uses, which convinced me that these are indeed the desired default semantics. Places that do not want to consider non-willreturn as a side effect generally do not want mayHaveSideEffect() semantics at all. I identified two such cases, which are addressed by D106591 and D106742. Finally, there is a use in SCEV for which we don't really have an appropriate API right now -- what it wants is basically "would this be considered forward progress". I've just spelled out the previous semantics there. Differential Revision: https://reviews.llvm.org/D106749	2021-07-26 16:35:14 +02:00
Paul Walker	8a8d01d58c	[NFC] Change VFShape so it contains an ElementCount rather than separate VF and IsScalable properties. Differential Revision: https://reviews.llvm.org/D106750	2021-07-26 12:25:46 +01:00
Philipp Krones	46c0366877	[Inliner] Make the CallPenalty configurable Tests with multiple benchmarks, like Embench [1], showed that the CallPenalty magic number has the most influence on inlining decisions when optimizing for size. On the other hand, there was no good default value for this parameter. Some benchmarks profited strongly from a reduced call penalty. One example is the picojpeg benchmark compiled for RISC-V, which got 6% smaller with a CallPenalty of 10 instead of 12. Other benchmarks increased in size, like matmult. This commit makes the compromise of turning the magic number constant of CallPenalty into a configurable value. This introduces the flag `--inline-call-penalty`. With that flag users can fine tune the inliner to their needs. The CallPenalty constant was also used for loops. This commit replaces the CallPenalty constant with a new LoopPenalty constant that is now used instead. This is a slimmed down version of https://reviews.llvm.org/D30899 [1]: https://github.com/embench/embench-iot Differential Revision: https://reviews.llvm.org/D105976	2021-07-26 12:07:49 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Liqiang Tao	4bdfea2c51	[llvm][Inline] Add interface to return cost-benefit stuff Return cost-benefit stuff which is computed by cost-benefit analysis. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D105349	2021-07-25 20:18:19 +08:00
Philip Reames	ec43def700	Style tweaks for SCEV's computeMaxBECountForLT [NFC]	2021-07-23 17:19:45 -07:00
Philip Reames	4a3dc7dc9a	[SCEV] Fix bug involving zero step and non-invariant RHS in trip count logic Eli pointed out the issue when reviewing D104140. The max trip count logic makes an assumption that the value of IV changes. When the step is zero, the nowrap fact becomes trivial, and thus there's nothing preventing the loop from being nearly infinite. (The "nearly" part is because mustprogress may disallow an infinite loop while still allowing 999999999 iterations before RHS happens to allow an exit.) This is very difficult to see in practice. You need a means to produce a loop varying RHS in a mustprogress loop which doesn't allow the loop to be infinite. In most cases, LICM or SCEV are smart enough to remove the loop varying expressions. Differential Revision: https://reviews.llvm.org/D106327	2021-07-23 15:19:23 -07:00
Mircea Trofin	55e12f7080	[NFC][MLGO] Just use the underlying protobuf object for logging Avoid buffering just to copy the buffered data, in 'development mode', when logging. Instead, just populate the underlying protobuf. Differential Revision: https://reviews.llvm.org/D106592	2021-07-23 10:56:48 -07:00
Serge Pavlov	1c64b5dc5e	[ConstantFolding] Fold constrained arithmetic intrinsics Constfold constrained variants of operations fadd, fsub, fmul, fdiv, frem, fma and fmuladd. The change also adds some support for removing unused constrained intrinsics. They are declared as accessing memory to model interaction with the floating-point environment, so they were not removed, as they have side effects. Now constrained intrinsics that have "fpexcept.ignore" as exception behavior are removed if they have no uses. As for intrinsics that have exception behavior other than "fpexcept.ignore", they can be removed if it is known that they do not raise floating point exceptions. This happens during constant folding: the attributes of such an intrinsic are changed so that it is no longer marked as accessing memory. Differential Revision: https://reviews.llvm.org/D102673	2021-07-23 14:39:51 +07:00
Mircea Trofin	df0066a1c9	[NFC][MLGO] Fix vector sizing The bots only build release mode, and the use of `reserve` instead of `resize`, while not causing invalid memory accesses, is incorrect.	2021-07-22 13:06:00 -07:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Jacob Hegna	cfc4def85d	[NFC] Code cleanups in InlineCost.cpp. - annotate const functions with "const" - replace C-style casts with static_cast Differential Revision: https://reviews.llvm.org/D105362	2021-07-22 00:03:36 +00:00
Kerry McLaughlin	be753b207f	Revert "[LV] Use lookThroughAnd with logical reductions" Reverting patch due to buildbot failures. This reverts commit `e22a599672`.	2021-07-21 15:16:00 +01:00
Rosie Sumpter	44c9adb414	[LoopFlatten][LoopInfo] Use Loop to identify latch compare instruction Make getLatchCmpInst non-static and use it in LoopFlatten as a more robust way of identifying the compare. Differential Revision: https://reviews.llvm.org/D106256	2021-07-21 10:14:18 +01:00
Kerry McLaughlin	e22a599672	[LV] Use lookThroughAnd with logical reductions If a reduction Phi has a single user which `AND`s the Phi with a type mask, `lookThroughAnd` will return the user of the Phi and the narrower type represented by the mask. Currently this is only used for arithmetic reductions, whereas loops containing logical reductions will create a reduction intrinsic using the widened type, for example: for.body: %phi = phi i32 [ %and, %for.body ], [ 255, %entry ] %mask = and i32 %phi, 255 %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv %load = load i8, i8* %gep %ext = zext i8 %load to i32 %and = and i32 %mask, %ext ... ^ this will generate an and reduction intrinsic such as the following: call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...) The same example for an add instruction would create an intrinsic of type i8: call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...) This patch changes AddReductionVar to call lookThroughAnd for other integer reductions, allowing loops similar to the example above with reductions such as and, or & xor to vectorize. Reviewed By: david-arm, dmgreen Differential Revision: https://reviews.llvm.org/D105632	2021-07-21 09:56:00 +01:00
Sanjay Patel	13302c06cd	[ConstantFolding] avoid crashing on a fake math library call https://llvm.org/PR50960	2021-07-20 18:25:21 -04:00
Jacob Hegna	1f3e90e128	Fix Threshold overwrite bug in the Oz inlining model features. Differential Revision: https://reviews.llvm.org/D106336	2021-07-20 18:05:06 +00:00
Eli Friedman	de3ea51be4	[ScalarEvolution] Refine computeMaxBECountForLT to be accurate in more cases. Allow arbitrary strides, and make sure we return the correct result when the backedge-taken count is zero. Differential Revision: https://reviews.llvm.org/D106197	2021-07-19 15:43:30 -07:00
Philip Reames	4402d0d4fb	[SCEV] Add a clarifying comment in howManyLessThans Wrap semantics are subtle when combined with multiple exits. This has caused several rounds of confusion during recent reviews, so try to document the subtle distinction between when wrap flags provide <u and <=u facts.	2021-07-19 15:13:48 -07:00
Arthur Eubanks	6cbb35dd3b	[NewPM] Bail out of devirtualization wrapper if the current SCC is invalidated The specific case that triggered this was when inlining a recursive internal function into itself caused the recursion to go away, allowing the inliner to mark the function as dead. The inliner marks the SCC as invalidated but does not provide a new SCC to continue with. This matches the implementations of ModuleToPostOrderCGSCCPassAdaptor and CGSCCPassManager. Fixes PR50363. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D106306	2021-07-19 15:07:30 -07:00
Mircea Trofin	55e2d2060a	[MLGO] Use binary protobufs for improved training performance. It turns out that during training, the time required to parse the textual protobuf of a training log is about the same as the time it takes to compile the module generating that log. Using binary protobufs instead elides that cost almost completely. Differential Revision: https://reviews.llvm.org/D106157	2021-07-19 13:59:28 -07:00
Mindong Chen	e908e063d1	[LoopUtils] Fix incorrect RT check bounds of loop-invariant mem accesses This fixes the lower and upper bound calculation of a RuntimeCheckingPtrGroup when it has more than one loop-invariant pointer. Resolves PR50686. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D104148	2021-07-19 19:38:24 +08:00
Nikita Popov	2b17c24a03	[SCEV] Fix unused variable warning (NFC)	2021-07-18 23:12:22 +02:00
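The `llvm.isnan` commit (D104854, first row above) notes that under '-ffast-math' a user may have to replace 'isnan' with a check built from integer operations. A minimal sketch of such a replacement in C follows, assuming `float` is IEEE-754 binary32; the name `isnan_bits` is hypothetical and is not an LLVM or libc API. The comparison is done on the raw bits, so it does not depend on an FP compare ('fcmp uno') that a no-NaNs assumption could fold to 'false'.

```c
#include <math.h>    /* NAN, INFINITY (handy for exercising the helper) */
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE-754 binary32, i.e. exactly 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float expected");

/* Hypothetical helper: classify a float as NaN by inspecting its bits.
   A NaN has all eight exponent bits set and a non-zero 23-bit mantissa;
   an infinity has the same exponent but a zero mantissa. */
int isnan_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);               /* well-defined type punning */
    uint32_t exponent = (u >> 23) & 0xFFu;  /* bits 30..23 */
    uint32_t mantissa = u & 0x7FFFFFu;      /* bits 22..0 */
    return exponent == 0xFFu && mantissa != 0;
}
```

For example, `isnan_bits(NAN)` returns 1, while `isnan_bits(INFINITY)` and `isnan_bits(1.0f)` return 0. The sign bit is ignored, so negative NaNs are also reported as NaN.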

11816 Commits
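The `[LV] Use lookThroughAnd with logical reductions` entries (e22a599672, reverted by be753b207f and relanded as 9d35594993) quote IR for an `and` reduction whose i32 phi starts at 255 and is masked with 255 on each iteration. Below is a minimal, hypothetical C transliteration of that IR (the name `and_reduce_masked` is illustrative only). Because the mask means only the low 8 bits of the accumulator can ever be set, the reduction can be performed in i8, which is the narrower type `lookThroughAnd` reports. Whether a compiler emits exactly the quoted IR for this source depends on the front end and earlier passes.

```c
#include <stddef.h>

/* Hypothetical function: a line-by-line transliteration of the IR quoted
   in the lookThroughAnd commit message. */
unsigned int and_reduce_masked(const unsigned char *ptr, size_t n) {
    unsigned int acc = 255;               /* %phi: incoming 255 from %entry */
    for (size_t i = 0; i < n; ++i) {
        unsigned int mask = acc & 255u;   /* %mask = and i32 %phi, 255 */
        unsigned int ext = ptr[i];        /* %ext = zext i8 %load to i32 */
        acc = mask & ext;                 /* %and = and i32 %mask, %ext */
    }
    return acc;                           /* never exceeds 255 */
}
```

For example, for the bytes {0xFF, 0xF0, 0x3C} the function returns 0x30, and for an empty input it returns the initial value 255.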