This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.
The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to
`VF * UF * AccessSize`: checking
`(Sink - Src) <u VF * UF * AccessSize` rules out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is no dependence preventing vectorization, hence the overflow
should not matter and using the ULT comparison is sufficient.
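For illustration, a minimal sketch of the emitted check (names and
values are hypothetical; with VF = 4, UF = 2 and AccessSize = 4 the
bound is 32 bytes):

  %src.int  = ptrtoint ptr %src to i64
  %sink.int = ptrtoint ptr %sink to i64
  %diff     = sub i64 %sink.int, %src.int
  ; conflict if the accesses are closer than VF * UF * AccessSize bytes
  %conflict = icmp ult i64 %diff, 32
  br i1 %conflict, label %scalar.ph, label %vector.ph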
Note that the initial version is restricted in multiple ways:
1. Pointers must only either be read or written, by a single
instruction (this allows re-constructing source/sink for
dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
4. abs(step) == AccessSize.
Most of those restrictions can be relaxed in the future.
See https://github.com/llvm/llvm-project/issues/53590.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119078
I fixed some poison-safety violations on related patterns in InstCombine
and noticed that we missed adding nsw/nuw on them, so this adds clauses
to the underlying analysis for that.
We need the undef input restriction to make this safe according to Alive2:
https://alive2.llvm.org/ce/z/48g9K8
Differential Revision: https://reviews.llvm.org/D125500
This adds two conjugated folds:
* A | B -> B if A implies B (https://alive2.llvm.org/ce/z/R6GU4j)
* A & B -> A if A implies B (https://alive2.llvm.org/ce/z/EGMqyy)
If A and B are icmps themselves, we will usually fold this through
other logic already (though the tests show a couple additional cases
we previously missed). However, isImpliedCond() also supports A
being of the form X & Y, which allows us to handle cases like
(X & Y) | B where X implies B. This addresses the regression from
D125398.
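A minimal IR sketch of the (X & Y) | B case (values are hypothetical):

  ; %c1 (x <u 10) implies %c2 (x <u 20), so %r folds to %c2
  %c1 = icmp ult i8 %x, 10
  %c2 = icmp ult i8 %x, 20
  %a = and i1 %c1, %other
  %r = or i1 %a, %c2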
Something that notably doesn't work yet is the (X | Y) & B case.
This is due to an asymmetry in the isImpliedCondition()
implementation that will have to be addressed separately.
Differential Revision: https://reviews.llvm.org/D125530
Scaffolding support for generating runtime checks for multiple SCEV expressions
per pointer. The initial version just adds support for looking through
a single pointer select.
The more sophisticated logic for analyzing forks is in D108699
Reviewed By: huntergr
Differential Revision: https://reviews.llvm.org/D114487
D98718 caused the order of Values/MemoryLocations we pass to alias() to
be significant due to storing the offset in the PartialAlias case. But
some callers weren't audited and were still passing swapped arguments,
causing the returned PartialAlias offset to be negative in some
cases. For example, the newly added unittests would return -1
instead of 1.
Fixes #55343, a miscompile.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D125328
This issue reproduces in the context of LoopDeletion, because the
bitcast does not get simplified away there. For a plain -inst-simplify
run the bitcast would get folded away first.
Fixes https://github.com/llvm/llvm-project/issues/54615.
When the first commutative instruction in a region used the same value in both operand positions and was compared to a corresponding instruction with two different values, an early check determined that, since the values were new, they acted in the same way structurally. If this was not contradicted later in the program, the regions were marked as similar. This removes that check, so that it is clear that the same value cannot be mapped to two different values.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D124775
This allows the compiler to support more features than those supported by a
model. The only requirement (development mode only) is that the new
features must be appended at the end of the list of features requested
from the model. The support is transparent to compiler code: for
unsupported features, we provide a valid buffer to copy their values;
it's just that this buffer is disconnected from the model, so insofar
as the model is concerned (AOT or development mode), these features don't
exist. The buffers are allocated at setup - meaning, at steady state,
there is no extra allocation (maintaining the current invariant). These
buffers have two roles: first, to keep the compiler code simple; second,
to allow logging their values in development mode. The latter allows retraining
a model supporting the larger feature set starting from traces produced
with the old model.
For release mode (AOT-ed models), this decouples compiler evolution from
model evolution, which we want in scenarios where the toolchain is
frequently rebuilt and redeployed: we can first deploy the new features,
and continue working with the older model, until a new model is made
available, which can then be picked up the next time the compiler is built.
Differential Revision: https://reviews.llvm.org/D124565
Rename the legacy `DOTGraphTraits{Module,}{Viewer,Printer}` to the corresponding `DOTGraphTraits...WrapperPass`, and implement a new `DOTGraphTraitsViewer` with new pass manager.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D123677
We can try to vectorize a number of stores smaller than MinVecRegSize
/ scalar_value_size, if the target allows it. This gives an extra
opportunity for vectorization.
Fixes PR54985.
Differential Revision: https://reviews.llvm.org/D124284
Fold %x umin_seq %y to %x if %x ule %y. This also subsumes the
special handling for constant operands, as if %y is constant this
folds to umin via implied poison reasoning, and if %x is constant
then either %x is not zero and it folds to umin, or it is known
zero, in which case it is ule anything.
Fold %x umin_seq %y to %x umin %y if %x cannot be zero. They only
differ in semantics for %x==0.
More generally, %x *_seq %y folds to %x * %y if %x cannot be the
saturation value (though currently we only have umin_seq).
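A sketch in SCEV notation (the expressions are purely illustrative):

  ((%n /u 2) umin_seq %n)  -->  (%n /u 2)       ; since %n /u 2 ule %n
  (%x umin_seq %y)         -->  (%x umin %y)    ; if %x is known non-zero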
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed at runtime. It happened because constrained
intrinsics usually have a side effect, which is used to model the
interaction with the floating-point environment. In some cases the side
effect is actually absent or can be ignored.
This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it is actually absent.
Differential Revision: https://reviews.llvm.org/D118426
Similar to how we convert logical and/or to bitwise and/or, we should
also convert umin_seq to umin based on implied poison reasoning. In
%x umin_seq %y, if %y being poison implies %x being poison, then we
don't need the sequential evaluation: Having %y contribute towards
the result will never make the result more poisonous. An important
corollary of this is that if %y is never poison, we also don't need
the sequential evaluation.
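A minimal sketch (hypothetical IR): with %y = add i64 %x, 1 (no
nuw/nsw), %y can only be poison if %x is poison, so:

  (%x umin_seq %y)  -->  (%x umin %y)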
This avoids some of the regressions in D124910.
Differential Revision: https://reviews.llvm.org/D124921
The assertion is to check we always get backedge taken count
(`BECount`) of zero when the exit condition is in select form
(`isa<BinaryOperator>(ExitCond)`) and the exit limit for the
first operand is zero (`EL0.ExactNotTaken->isZero()`). However,
the assertion is checking that the exit condition is NOT in
select form. Remove the whole assertion, since we now handle
select form in ScalarEvolution::getSequentialMinMaxExpr.
Reviewed By: reames, nikic
Differential Revision: https://reviews.llvm.org/D122835
Per feedback on D123086 after submit.
Also added a test for vec_malloc et al attribute inference to show it's
doing the right thing.
The new tests exposed a defect, corrected by adding vec_free to the list of
free functions in MemoryBuiltins.cpp, which had been overlooked all the
way back in D94710, over a year ago.
Differential Revision: https://reviews.llvm.org/D124859
This extends haveNoCommonBitsSet() to two additional cases, allowing
the following folds:
* `A + (B & ~A)` --> `A | (B & ~A)`
(https://alive2.llvm.org/ce/z/crxxhN)
* `A + ((A & B) ^ B)` --> `A | ((A & B) ^ B)`
(https://alive2.llvm.org/ce/z/A_wsH_)
These should further fold to just `A | B`, though this currently
only works in the first case.
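A minimal IR sketch of the first fold (names are hypothetical):

  %nota = xor i8 %a, -1
  %masked = and i8 %b, %nota     ; B & ~A shares no bits with A
  %sum = add i8 %a, %masked      ; -> or i8 %a, %masked -> a | b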
The reason why the second fold is necessary is that we consider
this to be the canonical form if B is a constant. (I did check
whether we can change that, but it looks like a number of folds
depend on the current canonicalization, so I ended up adding both
patterns here.)
Differential Revision: https://reviews.llvm.org/D124763
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.
Differential Revision: https://reviews.llvm.org/D110235
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.
The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.
Differential Revision: https://reviews.llvm.org/D124358
ConstantFolding currently converts "getelementptr i8, Ptr, (sub 0, V)"
to "inttoptr (sub (ptrtoint Ptr), V)". This transform is, taken by
itself, correct, but does came with two issues:
1. It unnecessarily broadens provenance by introducing an inttoptr.
We generally prefer not to introduce inttoptr during optimization.
2. For the case where V == ptrtoint Ptr, this folds to inttoptr 0,
which further folds to null. In that case provenance becomes
incorrect. This has been observed as a real-world miscompile with
rustc.
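A sketch of the incorrect case (global chosen for illustration):

  getelementptr (i8, ptr @g, i64 sub (i64 0, i64 ptrtoint (ptr @g to i64)))
  ; old fold: inttoptr (i64 0) -> null, losing the provenance of @g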
We should probably address that incorrect inttoptr 0 fold at some
point, but in either case we should also drop this inttoptr-introducing
fold. Instead, replace it with a fold rooted at
ptrtoint(getelementptr), which seems to cover the original
motivation for this fold (test2 in the changed file).
Differential Revision: https://reviews.llvm.org/D124677
Currently loop cache cost (LCC) cannot analyze fixed-size arrays
since it cannot delinearize them. This patch adds the capability
to delinearize fixed-size arrays to LCC. Most of the code is ported
from DependenceAnalysis.cpp; some refactoring will be done in a
follow-up patch.
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D122857
The result is a data bag; this makes sure it's signaled to the user that
the data can't be mutated when, for example, doing something like:
auto &R = FAM.getResult<FunctionPropertiesAnalysis>(F)
...
R.Uses++
Introduced masks where they were not added before, and improved
target-dependent cost models to avoid returning incorrect cost results
after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
This relands commit 8f550368b1.
The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.
Differential Revision: https://reviews.llvm.org/D120169
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.
Reviewers: @StephenTozer, @djtodoro
Differential Revision: https://reviews.llvm.org/D120169
Before this patch `Args` was used to pass a broadcast's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This is a simple datatype with a few JSON utilities, and is independent
of the underlying executor. The main motivation is to allow taking a
dependency on it on the AOT side, and allow us to build a correctly-sized
buffer in the cases when the requested feature isn't supported by the
model. This, in turn, allows us to grow the feature set supported by the
compiler in a backward-compatible way; and also collect traces exposing
the new features, starting from the older model, and continue
training from those new traces.
Differential Revision: https://reviews.llvm.org/D124417
This patch sets LastCallToStaticBonus based on a check; it has no
noticeable size reduction on an internal workload or the Linux kernel
with Os/Oz.
Differential Revision: https://reviews.llvm.org/D124233
The motivation is twofold:
1) Allow plugging in a different training-time evaluator, e.g.
TFLite-based, etc.
2) Allow using TensorSpec for AOT, too, to support evolution: we start
by extracting a superset of the features currently supported by a
model. For the tensors the model does not support, we just return a
valid, but useless, buffer. This makes using a 'smaller' model (less
supported tensors) transparent to the compiler. The key is to
dimension the buffer appropriately, and we already have TensorSpec
modeling that info.
The only coupling was due to the reliance of a TF internal API for
getting the element size, but for the types we are interested in,
`sizeof` is sufficient.
A subsequent change will yank out TensorSpec in its own module.
Differential Revision: https://reviews.llvm.org/D124045
We can process long shuffles (working across several actual vector
registers) in the best way if we take the actual register representation
into account. We can build a more correct representation of register
shuffles and improve the number of recognized buildvector sequences.
Also, the same function can be used to improve the cost model for
shuffles in future patches.
Part of D100486
Differential Revision: https://reviews.llvm.org/D115653
When constructing canonical relationships between two regions, the first instruction of a basic block from the first region is used to find the corresponding basic block from the second region. However, debug instructions are not included in similarity matching, and therefore do not have a canonical numbering. This patch makes sure to ignore the debug instructions when finding the first instruction in a basic block.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D123903
Issue: https://github.com/llvm/llvm-project/issues/54431
PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D122207
Currently the fsub optimizations in InstSimplify don't know how to fold
-0.0 - (-X) to X when the constrained intrinsics are used. This adds partial
support. The rest of the support will come later with work on the IR
matchers.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D123396
Refactor to use stripPointerCasts() instead of iteratively
calling BitCastInst::getOperand(). This is an improvement
since now we are able to analyze more cases, please refer
to test cases added in this patch.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D123559
This reverts commit e810d55809.
The commit did not take into account the fact that a strdup'ed string
could be modified. Checking whether such modification happens would make
the function very costly, and without a test case in mind it's not worth the effort.
Retain the behavior we get without opaque pointers: A call to a
known function with different function type is considered an
indirect call.
This fixes the crash reported in https://reviews.llvm.org/D123300#3444772.
And thread DSE's ephemeral values to EarliestEscapeInfo.
This allows more precise analysis in DSEState::isReadClobber() via BatchAA.
Followup to D123162.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123342
Rather than checking the rounded type store size, check the type
size in bits. We don't want to forward a store of i1 to a load
of i8 for example, even though they have the same type store size.
The padding bits have unspecified contents.
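A minimal sketch of the case we must not forward (hypothetical IR):

  store i1 true, ptr %p
  %v = load i8, ptr %p   ; bits 1..7 of %v are unspecified padding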
This is a partial fix for the issue reported at
https://reviews.llvm.org/D115924#inline-1179482,
the problem also needs to be addressed more generally in the
constant folding code.
It actually implements support for seeing through loads, using alias analysis to
refine the result.
This is rather limited, but I didn't want to rely on more than available
analysis at that point (to be gentle with compilation time), and it does seem to
catch common scenarios, as showcased by the included tests.
Differential Revision: https://reviews.llvm.org/D122431
Currently, the utility supports lowering of non-atomic memory transfer routines only. This patch adds support for the atomic version of memcpy. This may be useful for targets not supporting atomic memcpy.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D118443
This lines up with other parts of the codebase that only use special
knowledge about allocator functions if they're builtins.
Differential Revision: https://reviews.llvm.org/D123053
This got changed to use hasAttrSomewhere() during review, and I didn't
notice until today when I was writing some tests for another part of
this system that using hasAttrSomewhere only checked the callsite for
allocalign, rather than both the callsite and the definition. This fixes
that by introducing a helper method.
Differential Revision: https://reviews.llvm.org/D121641
This has been true since dba73135c8, but
didn't matter until now because clang wasn't emitting allocalign
attributes.
Differential Revision: https://reviews.llvm.org/D121640
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.
Differential Revision: https://reviews.llvm.org/D123117
The LLVM IR verifier and analysis linter defines and uses several macros in
code that performs validation of IR expectations. Previously, these macros
were named with an 'Assert' prefix. These names were misleading since the
macro definitions are not conditioned on build kind; they are defined
identically in builds that have asserts enabled and those that do not. This
was confusing since an LLVM developer might expect these macros to be
conditionally enabled as 'assert' is. Further confusion was possible since
the LLVM IR verifier is implicitly disabled (in Clang::ConstructJob()) for
builds without asserts enabled, but only for Clang driver invocations; not
for clang -cc1 invocations. This could make it appear that the macros were
not active for builds without asserts enabled, e.g. when investigating
behavior using the Clang driver, and thus lead to surprises when running
tests that exercise the clang -cc1 interface.
This change renames this set of macros as follows:
Assert -> Check
AssertDI -> CheckDI
AssertTBAA -> CheckTBAA
Prior to this change, CallBase::hasFnAttr checked the called function to
see if it had an attribute if it wasn't set on the CallBase, but
getFnAttr didn't do the same delegation, which led to very confusing
behavior. This patch fixes the issue by making CallBase::getFnAttr also
check the function under the same circumstances.
Test changes look (to me) like they're cleaning up redundant attributes
which no longer get specified both on the callee and call. We also clean
up the one ad-hoc implementation of this getter over in InlineCost.cpp.
Differential Revision: https://reviews.llvm.org/D122821
Two interesting omissions:
* When reordering in either direction, reordering two calls which both
contain inf-loops is illegal. This one is possibly a change in behavior
for certain callers (e.g. fixes a latent bug.)
* When moving down, control dependence must be respected by checking the
inverse of isSafeToSpeculativeExecute. Current callers all seem to
handle this case - though admitted, I did not do an exhaustive audit.
Most seem to be only interested in moving upwards within a block. This
is mostly a case of future proofing an API so that it implements what
the comments says, not just what current callers need.
Noticed via inspection. I don't have a test case.
The implementation is just a generalization of the Select handler.
We're not trying to be smart and compute any kind of fixed point.
Differential Revision: https://reviews.llvm.org/D121897
With D107249 I saw huge compile time regressions on a module (150s ->
5700s). This turned out to be due to a huge RefSCC in
the module. As we ran the function simplification pipeline on functions
in the SCCs in the RefSCC, some of those SCCs would be split out to
their RefSCC, a child of the current RefSCC. We'd skip the remaining
SCCs in the huge RefSCC because the current RefSCC is now the RefSCC
just split out, then revisit the original huge RefSCC from the
beginning. This happened many times because many functions in the
RefSCC were optimizable to the point of becoming their own RefSCC.
This patch makes it so we don't skip SCCs not in the current RefSCC so
that we split out all the child RefSCCs on the first iteration of
RefSCC. When we split out a RefSCC, we invalidate the original RefSCC
and add the remainder of the SCCs into a new RefSCC in
RCWorklist. This happens repeatedly until we finish visiting all
SCCs, at which point there is only one valid RefSCC in
RCWorklist from the original RefSCC containing all the SCCs that
were not split out, and we visit that.
For example, in the newly added test cgscc-refscc-mutation-order.ll,
we'd previously run instcombine in this order:
f1, f2, f1, f3, f1, f4, f1
Now it's:
f1, f2, f3, f4, f1
This can cause more passes to be run in some specific cases,
e.g. if f1<->f2 gets optimized to f1<-f2, we'd previously run f1, f2;
now we run f1, f2, f2.
This improves kimwitu++ compile times by a lot (12-15% for various -O3 configs):
https://llvm-compile-time-tracker.com/compare.php?from=2371c5a0e06d22b48da0427cebaf53a5e5c54635&to=00908f1d67400cab1ad7bcd7cacc7558d1672e97&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121953
The patch adds an extra check to only set MinAbsVarIndex if
abs(V * Scale) won't wrap. In the absence of IsNSW, try to use the
bitwidths of the original V and Scale to rule out wrapping.
Attempt to model https://alive2.llvm.org/ce/z/HE8ZKj
The code in the else if below probably needs the same treatment, but I
need to come up with a test first.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121695
Currently some optimizations are disabled because llvm::CannotBeNegativeZero()
does not know how to deal with the constrained intrinsics. This patch fixes
that by extending the existing implementation.
Differential Revision: https://reviews.llvm.org/D121483
This avoids false positive verification failures if the condition
is not literally true/false, but SCEV still makes use of the fact
that a loop is not reachable through more complex reasoning.
Fixes https://github.com/llvm/llvm-project/issues/54434.
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.
This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.
Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).
Differential Revision: https://reviews.llvm.org/D121381
Splat loads are inexpensive on X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated for splat loads.
Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
   %ld = load double, double* %gep
   %ins1 = insertelement <2 x double> poison, double %ld, i32 0
   %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1
- But it can also be a load, an insert and a shuffle:
   %ld = load double, double* %gep
   %ins = insertelement <2 x double> poison, double %ld, i32 0
   %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer
Because of this some of the lit tests contain more IR instructions.
Differential Revision: https://reviews.llvm.org/D121354
Move structural hashing into virtual methods on Pass. This will
allow MachineFunctionPass to override the method to add hashing of
the MachineFunction.
Differential Revision: https://reviews.llvm.org/D120123
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.
This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.
To somewhat minimize the churn, printing still uses faux typed
pointer notation.
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes #53131.
Differential Revision: https://reviews.llvm.org/D121167
We still need the code after stripAndAccumulateConstantOffsets() since
it doesn't handle GEPs of scalable types and non-constant but identical
indexes.
Differential Revision: https://reviews.llvm.org/D120523
DSE assumes that this is the case when forming a calloc from a
malloc + memset pair.
For tests, either update the malloc signature or change the
data layout.
If an instruction is the first legal instruction in the module and the only legal instruction in its basic block, it will be ignored by the outliner due to a length check inherited from the older version of the outliner, which was restricted to outlining within a single basic block. This removes that check, and updates any tests that broke because of it.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D120786
Musttail calls require extra handling to properly propagate the calling convention information and tail call information. The outliner does not currently do this, so we ignore call instructions that utilize the swifttailcc and tailcc calling convention as well as functions marked with the attribute musttail.
Reviewers: paquette, aschwaighofer
Differential Revision: https://reviews.llvm.org/D120733
The logic exposed by this patch via `llvm::DetermineUseCaptureKind` was
part of `llvm::PointerMayBeCaptured`. In the Attributor we want to keep
track of the work list items but still reuse the logic if a use might
capture a value. A follow up for the Attributor removes ~100 lines of
code and complexity while making future handling of simplified values
possible.
Differential Revision: https://reviews.llvm.org/D121272
This patch adds a CL option for avoiding the attribute compatibility
check between caller and callee in TTI. TTI attribute compatibility
checks for target CPU and target features.
In our downstream compiler, this attribute always remains the same
between callee and caller. By avoiding the addition of this attribute to
each of our inline candidates (and then checking them here during inline
cost), we save some compile time.
The option is kept false, so this change is an NFC upstream.
This is a revert of cfcc42bdc. The analysis is wrong as shown by
the minimal tests for instcombine:
https://alive2.llvm.org/ce/z/y9Dp8A
There may be a way to salvage some of the other tests,
but that can be done as follow-ups. This avoids a miscompile
and fixes #54311.
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used to print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
This extends SCEV verification to check not only backedge-taken
counts, but all entries in the IR -> SCEV cache. The restrictions
are the same as for the BECount case, i.e. we ignore expressions
based on undef, we only diagnose constant deltas (there are way
too many false positives otherwise) and we limit to reachable code.
Differential Revision: https://reviews.llvm.org/D121104
Introduce a new attribute "function-inline-cost-multiplier" which
multiplies the inline cost of a call site (or all calls to a callee) by
the multiplier.
When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee. If so, set the "function-inline-cost-multiplier" attribute of
the new call site to double the original call site's attribute value.
This does not happen when the original call site is intra-SCC.
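For illustration, a sketch of the resulting call site (the value "2"
assumes one doubling of the original attribute value):

  call void @callee() #0
  ...
  attributes #0 = { "function-inline-cost-multiplier"="2" }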
This is an alternative to D120584, which marks the call sites as
noinline.
Hopefully fixes PR45253.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D121084
SCEV verification should no longer affect results of subsequent
queries, and our lit tests as well as llvm-test-suite pass with
SCEV verification enabled, so I think we can enable it by default
under EXPENSIVE_CHECKS now.
Differential Revision: https://reviews.llvm.org/D120708
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
When a SCEVUnknown gets RAUWd, we currently drop it from the folding
set, but don't forget memoized values. I believe we should be
treating RAUW the same way as deletion here and invalidate all
caches and dependent expressions.
I don't have any specific cases where this causes issues right now,
but it does address the FIXME in https://reviews.llvm.org/D119488.
Differential Revision: https://reviews.llvm.org/D120033
This ensures the right order in the sink-after map is maintained. If we
re-sink an instruction, it must be sunk after all earlier instructions
have been sunk.
Fixes https://github.com/llvm/llvm-project/issues/54223
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator::new can be inlined but the body of
operator::delete can't, as in this example from jyknight:
  #include <stdlib.h>
  #include <stdio.h>
  int allocs = 0;
  void *operator new(size_t n) {
    allocs++;
    void *mem = malloc(n);
    if (!mem) abort();
    return mem;
  }
  __attribute__((noinline)) void operator delete(void *mem) noexcept {
    allocs--;
    free(mem);
  }
  void deleteit(int*i) { delete i; }
  int main() {
    int*i = new int;
    deleteit(i);
    if (allocs != 0)
      printf("MEMORY LEAK! allocs: %d\n", allocs);
  }
This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.
Differential Revision: https://reviews.llvm.org/D117356
Previous and OtherPrev may not be in the same block. Use DT::dominates
instead of local comesBefore. DT::dominates is already used earlier to
check the order of Previous and SinkCandidate.
Fixes https://github.com/llvm/llvm-project/issues/54195
Per discussion on
https://reviews.llvm.org/D59709#inline-1148734, this seems like the
right course of action. `canBeOmittedFromSymbolTable()` subsumes and
generalizes the previous logic. In addition to handling `linkonce_odr`
`unnamed_addr` globals, we now also internalize `linkonce_odr` +
`local_unnamed_addr` constants.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D120173
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.
I think InstCombine is currently the only user of both.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120754
This patch extends first-order recurrence handling to support cases
where we already sunk an instruction for a different recurrence, but
LastPrev comes before Previous.
To handle those cases correctly, we need to find the earliest entry for
the sink-after chain, because this is references the Previous from the
original recurrence. This is needed to ensure we use the correct
instruction as sink point.
Depends on D118558.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D118642
Instead of passing an ICmpInst * and a bool, just pass the predicate
from the caller.
I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120609
For unreachable loops, any BECount is legal, and since D98706 SCEV
can make use of this for loops that are unreachable due to constant
branches. To avoid false positives, adjust SCEV verification to only
check BECounts in reachable loops.
Fixes https://github.com/llvm/llvm-project/issues/50523.
Differential Revision: https://reviews.llvm.org/D120651
The change fixes treatment of constrained compare intrinsics if
compared values are of vector type.
Differential revision: https://reviews.llvm.org/D110322
SCEVs ExprValueMap currently tracks not only which IR Values
correspond to a given SCEV expression, but additionally stores that
it may be expanded in the form X+Offset. In theory, this allows
reusing existing IR Values in more cases.
In practice, this doesn't seem to be particularly useful (the test
changes are rather underwhelming) and adds a good bit of complexity.
Per https://github.com/llvm/llvm-project/issues/53905, we have an
invalidation issue with these offset expressions.
Differential Revision: https://reviews.llvm.org/D120311
D118090 causes a pretty significant (19%) regression in some Eigen
benchmarks. Investigating is a bit time consuming as the compilation
unit where this occurs is large. Rather than revert, this patch adds a
flag controlling that behavior (enabled by default).
Adds new optimization remarks when loop vectorization fails due to
the compiler being unable to find the bound of an array access inside
a loop
Differential Revision: https://reviews.llvm.org/D115873
In D111530, I suggested that we add some relatively basic pattern-matching
folds for shifts and funnel shifts and avoid a more specialized solution
if possible.
We can start by implementing at least one of these in IR because it's
easier to write the code and verify with Alive2:
https://alive2.llvm.org/ce/z/qHpmNn
This will need to be adapted/extended for SDAG to handle the motivating
bug ( #49541 ) because the patterns only appear later with that example
(added some tests: bb850d422b)
This can be extended within InstSimplify to handle cases where we 'and'
with a shift too (in that case, kill the funnel shift).
We could also handle patterns where the shift and funnel shift directions
are inverted, but I think it's better to canonicalize that instead to
avoid pattern-match case explosion.
Differential Revision: https://reviews.llvm.org/D120253
This is the same special logic we apply for SPF signed clamps
when computing the number of sign bits, just for intrinsics.
This just uses the same logic as the select case, but there are
multiple directions in which this could be improved: We could also use
the num sign bits from the clamped value, we could do this during
constant range calculation, and there's probably unsigned analogues
for the constant range case at least.
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.
Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D117580
This patch fixes a logical error in how we work with `LoopUsers` map.
It maps a loop onto a set of AddRecs that depend on it. The AddRecs
are added to this map only once when they are created and put to
the `UniqueSCEVs` map.
The only purpose of this map is to make sure that, whenever we forget
a loop, all (directly or indirectly) dependent SCEVs get forgotten too.
Current code erases SCEVs from dependent set of a given loop whenever
we forget this loop. This is not a correct behavior due to the following scenario:
1. We have a loop `L` and an AddRec `AR` that depends on it;
2. We modify something in the loop, but don't destroy it. We still call forgetLoop on it;
3. `AR` is no longer dependent on `L` according to `LoopUsers`. It is erased from
`ValueExprMap` and `ExprValueMap`, but still exists in `UniqueSCEVs`;
4. We can later request the very same AddRec for the very same loop again, and get existing
SCEV `AR`.
5. Now, `AR` exists and is used again, but its notion that it depends on `L` is lost;
6. Then we decide to delete `L`. `AR` will not be forgotten because we have lost it;
7. Sooner or later we run into a dangling pointer problem, or some other kind of problem,
because an active SCEV is now referencing a non-existent loop.
The solution to this is to stop erasing values from `LoopUsers`. Yes, we may forget
something that is already unused, but it's cheap.
This fixes a functional bug and potentially may have negative compile time impact on methods with
huge or numerous loops.
Differential Revision: https://reviews.llvm.org/D120303
Reviewed By: nikic
This patch fixes an invalid TypeSize->uint64_t implicit conversion in
FoldReinterpretLoadFromConst. If the size of the constant is scalable
we bail out of the optimisation for now.
Tests added here:
Transforms/InstCombine/load-store-forward.ll
Differential Revision: https://reviews.llvm.org/D120240
The problem can be shown from the newly added test case.
There are two invocations to MemorySSAUpdater::moveToPlace, and the
internal data structure VisitedBlocks is changed in the first
invocation, and reused in the second invocation. In between the two
invocations, there is a change to the CFG, and MemorySSAUpdater is
notified about the change.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D119898
Make the condition for restricting re-ordering simpler to read.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D120005
The code was using exact sizing only, but since what we really need is just to make sure the offsets are in bounds, a minimum bound on the object size is sufficient.
To demonstrate the difference, support computing minimum sizes from objects of scalable vector type.
Remove some code which tried to handle the case of comparing two allocas where an object size could not be precisely computed. This code had zero coverage in tree, and at least one nasty bug.
The bug comes from the fact that the code uses the size of the result pointer as a proxy for whether the alloca can be of size zero. Since the result of an alloca is *always* a pointer type, and a pointer type can *never* be empty, this check was a nop. As a result, we blindly consider a zero offset from two allocas to never be equal. They can in fact be equal when one or more of the allocas is zero sized.
This is particularly ugly because instcombine contains the exact opposite rule. If instcombine reaches the allocas first, it combines them into one (making them equal). If instsimplify reaches the compare first, it would consider them not equal. This creates all kinds of fun scenarios for order of optimization reaching different and contradictory conclusions.
Our current strategy of computing ranges of SCEVUnknown Phis was to simply
compute the union of ranges of all its inputs. In order to avoid infinite recursion,
we mark Phis as pending and conservatively return the full set for them. As a result,
even the simplest patterns of cycled phis always have a range of the full set.
This patch makes this logic a bit smarter. We basically do the same, but instead
of taking inputs of single Phi we find its strongly connected component (SCC)
and compute the union of all inputs that come into this SCC from outside.
Processing entire SCC together has one more advantage: we can set range for all
of them at once, because the only thing that happens to them is the same value is
being passed between those Phis. So, although we spend more time analyzing a
single Phi, overall we may save time by not processing other SCC members, so the
amortized compile time spent should be approximately the same.
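A minimal sketch of such a phi SCC (hypothetical IR): %p1 and %p2 only
pass each other's values around, and the sole input entering the SCC
from outside is %x, so both phis now get range(%x) instead of the full
set:

  header:
    %p1 = phi i32 [ %x, %entry ], [ %p2, %latch ]
    br label %latch
  latch:
    %p2 = phi i32 [ %p1, %header ]
    br label %header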
Differential Revision: https://reviews.llvm.org/D110620
Reviewed By: reames
An atomic store with Release semantics allows re-ordering of unordered load/store before the store.
Implement it.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119844
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
While it is not a very likely scenario, we probably should not expect
that instcombine already dropped such a redundant zext,
but should handle it directly. Moreover, perhaps there was no ZExtInst,
and SCEV somehow managed to pull said zext out of the SCEV expression.
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
Extra leading zeros do not affect the result of comparison with zero,
nor do they matter for the unsigned min/max,
so we should not be dissuaded when we find a zero-extension,
but instead we should just skip it.
In a prior review I was asked to move the helper function canIgnoreSNaN()
out to FPEnv.h. This wasn't possible at the time because that function
needs the fast math flags, and including them includes lots of other stuff
that isn't needed.
This patch moves the fast math flags out into a new FMF.h file unchanged,
and moves the helper function out to FPEnv.h also unchanged. This ticket
only moves code around.
Differential Revision: https://reviews.llvm.org/D119752
Currently the fsub optimizations in InstSimplify don't know how to fold
X - -0.0 to X when we know X is not zero and the constrained intrinsics
are used. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D119746
This one tries to fix:
https://github.com/llvm/llvm-project/issues/53357.
Simply put, this patch checks (x & y) and ~(x | y) in
haveNoCommonBitsSet. They cannot have common bits (which can be
verified by enumerating the cases), so we can convert
(x & y) + ~(x | y) to (x & y) | ~(x | y). Then the compiler can
handle it in InstCombineAndOrXor.
Furthermore, since ((x & y) + (~x & ~y)) would be converted to
((x & y) + ~(x | y)), this patch fixes that case too.
https://alive2.llvm.org/ce/z/qsKzRS
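A minimal IR sketch (names are hypothetical):

  %and = and i8 %x, %y
  %or = or i8 %x, %y
  %not = xor i8 %or, -1      ; ~(x | y)
  %r = add i8 %and, %not     ; no common bits -> or i8 %and, %not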
Reviewed By: spatel, xbolva00, RKSimon, lebedev.ri
Differential Revision: https://reviews.llvm.org/D118094
A volatile store does not provide any special rules for reordering with
atomics. The usual must-alias analysis is enough here.
This makes the behavior similar to how volatile loads are handled.
Reviewers: reames, nikic
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119818
A more general enhancement needs to add tests and make sure
that intrinsics that return structs are correct. There are also
target-specific intrinsics, and I'm not sure what behavior is
expected for those.
Instead of doing an inbounds strip first and another non-inbounds
strip afterward for equality comparisons, directly do a single
inbounds or non-inbounds strip based on whether we have an equality
predicate or not.
This is NFC-ish in that the alloca equality codepath is the only
part that sees additional non-inbounds offsets now, and for that
codepath it doesn't matter whether or not the GEP is inbounds, as
it does a stronger check itself. InstCombine would infer inbounds
for such GEPs.
Currently the fsub optimizations in InstSimplify don't know how to fold X
- +0.0 to X when using the constrained intrinsics. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D118928
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119652
This mechanism was used for a couple of purposes, but the primary one was keeping track of which predicates in a union might apply to an expression. As these sets are small and aggressively deduped, this has little value.
Even if the search is marked as terminated after only looking at
the first operand, we'd still look at the remaining operands
before actually ending the search.
This seems pointless and wasteful, let's not do that.
Since we don't greedily flatten `umin_seq(a, umin(b, c))` into `umin_seq(a, b, c)`,
just looking at the operands of the outer-level `umin` is not sufficient,
and we need to recurse into all same-typed `umin`'s.
This lets us avoid redundant implication work in the constructor of SCEVUnionPredicate which simplifies an upcoming change. If we're actually building a predicate via PSE, that goes through addPredicate which does include the implication check.
SCCP requires that the load/store type and global type are the
same (it does not support bitcasts of tracked globals). With
typed pointers this was implicitly enforced.
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization, due to a change from checking whether predicates exist to checking whether the predicate is possibly false.
We'd catch the tautological select pattern later anyways
due to constant folding, so that leaves PHI-like select,
but it does not appear to fire there.
Currently `createNodeForSelectOrPHI()` takes an Instruction,
and only works on the Cond that is an ICmpInst,
but that can be relaxed somewhat.
For now, simply rename the existing function,
and add a thin wrapper ontop that still does
the same thing as it used to.
https://alive2.llvm.org/ce/z/ULuZxB
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `add`
bit/element-wise, but the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/aKAr94
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umin`
bit/element-wise, but the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/SMEaoc
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umax`
bit/element-wise, but the expression will be rather large,
so let's not do that for now.
The code was relying upon the implicit conversion of TypeSize to
uint64_t and assuming the type in question was always fixed. However,
I discovered an issue when running the canon-freeze pass with some
IR loops that contains scalable vector types. I've changed the code
to bail out if the size is unknown at compile time, since we cannot
compute whether the step is a multiple of the type size or not.
I added a test here:
Transforms/CanonicalizeFreezeInLoops/phis.ll
Differential Revision: https://reviews.llvm.org/D118696
This is the last major stepping stone before being able to allocate the node via the folding set allocator. That will in turn allow more general SCEV predicate expression trees.
For those curious, the whole reason for tracking the predicate set separately, as opposed to just immediately registering the dependencies, appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable whether this justifies the complexity, but since we can preserve it with local ugliness, I did so.
Previously we relied on the pointee type to determine what type we need
to do runtime pointer access checks.
With opaque pointers, we can access a pointer with more than one type,
so now we keep track of all the types we're accessing a pointer's
memory with.
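For example (sketch), the same pointer may now be accessed with several
types, each of which must be tracked for the runtime checks:

  %v0 = load i32, ptr %p
  %v1 = load float, ptr %p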
Also some other minor getPointerElementType() removals.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119047
PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent A <pred> B.
This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
D108992 added KnownBits handling for 'Quadratic Reciprocity' self-multiplication patterns (bit[1] == 0), which can be used for non-undef values (poison is OK).
This patch adds noundef self-multiply handling to value tracking so demanded bits patterns can make use of it.
Differential Revision: https://reviews.llvm.org/D117995
Use existing functionality to strip constant offsets that works well
with AS casts and avoids the code duplication.
Since we strip AS casts during the computation of the offset we also
need to adjust the APInt properly to avoid mismatches in the bit width.
This code ensures the caller of `compute` sees APInts that match the
index type size of the value passed to `compute`, not the value result
of the strip pointer cast.
Fixes #53559.
Differential Revision: https://reviews.llvm.org/D118727
This header is very large (3M lines once expanded) and was included in locations
where DWARF-specific information was not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of this patch, downstream users may need to manually include some
extra headers:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, code may be relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h; this hidden
dependency now needs to be made explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
This is in anticipation of my next patch, where I need to store more information about free functions than just their argument count. It felt invasive enough on this function that it seemed worthwhile to just extract this as its own commit that makes no functional changes.
Differential Revision: https://reviews.llvm.org/D117350
The change implements constant folding of ‘llvm.experimental.constrained.fcmp’
and ‘llvm.experimental.constrained.fcmps’ intrinsics.
Differential Revision: https://reviews.llvm.org/D110322
Created to fix: https://github.com/llvm/llvm-project/issues/53537
Some intrinsic functions are considered commutative since they perform operations like addition or multiplication. Some of these take extra parameters that provide information which is not part of the operation itself and is not commutative. This makes sure that an instruction that is an intrinsic takes the non-commutative path to handle this case.
Reviewer: paquette
Closes Issue #53537
Differential Revision: https://reviews.llvm.org/D118807
Adds new optimization remarks when vectorization fails.
More specifically, new remarks are added for the following four cases:
- Backward dependency
- Backward dependency that prevents Store-to-load forwarding
- Forward dependency that prevents Store-to-load forwarding
- Unknown dependency
It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.
A regression test has been added to test the following cases:
a) Loop can be vectorized: No optimization remark is emitted
b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D108371
When upgrading a loop of loads/stores to a memcpy, the existing pass does not keep the existing aliasing information. This patch allows the existing aliasing information to be kept.
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D108221
Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs - 1.
In the case of lhs <= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop.
In the case of lhs >= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop.
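A minimal sketch of the kind of loop this enables (illustrative only, not
from the patch):
```
// If this loop is known finite, n cannot be INT_MAX: i <= n would then
// never become false. Hence i <= n can be treated as i < n + 1 without
// wrap concerns, which SCEV already knows how to analyze.
void f(int n, int *a) {
  for (int i = 0; i <= n; ++i)
    a[i] = i;
}
```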
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D118090
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs and the intrinsic names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds a command line debug flag to disable outlining of intrinsics.
Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
Currently, basic AA has special support for llvm.memcpy.* intrinsics. This change extends this support to any memory transfer operation, in particular the llvm.memmove.* intrinsics.
Reviewed By: reames, nikic
Differential Revision: https://reviews.llvm.org/D117095
This is a bugfix in IVDescriptor.cpp.
The helper function `RecurrenceDescriptor::getExactFPMathInst()`
is supposed to return the 1st FP instruction that does not allow
reordering. However, when constructing the RecurrenceDescriptor,
we trace the use-def chain starting from a PHI node and for each
instruction in the use-def chain, its descriptor overrides the
previous one. Therefore in the final RecurrenceDescriptor we
construct, we lose previous FP instructions that do not allow
reordering.
Reviewed By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D118073
This extracts a common isNotVisibleOnUnwind() helper into
AliasAnalysis, which handles allocas, byval arguments and noalias
calls. After D116998 this could also handle sret arguments. We
have similar logic in DSE and MemCpyOpt, which will be switched
to use this helper as well.
The noalias call case is a bit different from the others, because
it also requires that the object is not captured. The caller is
responsible for doing the appropriate check.
Differential Revision: https://reviews.llvm.org/D117000
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow non-entry blocks within the extracted region to have predecessors, so there are no conflicts to handle with respect to predecessors no longer contained in the function.
Recommit of 515eec3553
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106997
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs and the intrinsic names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds a command line debug flag to disable outlining of intrinsics.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
The outliner currently requires that function calls not be indirect calls, and that the function name and function type match, as well as other attributes such as calling conventions. This patch treats called functions as values, and thus just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions.
There are also debugging flags added to enforce the old behaviors, where indirect calls are not allowed and function call names must also match.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109448
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
This patch adds support for implication inference logic for the
following pattern:
```
lhs < (y >> z) <= y, y <= rhs --> lhs < rhs
```
We should be able to use the fact that a value shifted to the right is
not greater than the original value (provided the original is non-negative).
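A small sanity check of the implication (a sketch, assuming unsigned
values; the helper name is made up):
```
#include <cassert>

// lhs < (y >> z) and y <= rhs together imply lhs < rhs, because
// (y >> z) <= y <= rhs holds for unsigned y.
void checkImplication(unsigned lhs, unsigned y, unsigned z, unsigned rhs) {
  if (z < 32 && lhs < (y >> z) && y <= rhs)
    assert(lhs < rhs);
}
```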
Differential Revision: https://reviews.llvm.org/D116150
Reviewed-By: apilipenko
This matches the actual runtime function more closely.
I considered also renaming both RetainRV/UnsafeClaimRV to end with
"ARV", for AutoreleasedReturnValue, but there's less potential
for confusion there.
Presence of operand bundles changes semantics with respect to ModRef. In particular, the spec says: "From the compiler's perspective, deoptimization operand bundles make the call sites they're attached to at least readonly. They read through all of their pointer typed operands (even if they're not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame". Fix handling of llvm.memcpy.* according to the spec.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D118033
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.
Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.
Differential Revision: https://reviews.llvm.org/D117912
Peculiarly, the necessary code to handle pointers (including the
check for non-integral address spaces) is already in place,
because we were already allowing vectors of pointers here, just
not plain pointers.
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.
To simplify such scenarios, given that we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.
This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.
The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.
Differential Revision: https://reviews.llvm.org/D117752
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.
This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.
Differential Revision: https://reviews.llvm.org/D117147
The global state refers to the number of nodes currently in the
module, and the number of direct calls between nodes, across the
module.
Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.
This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable (they do not get deleted while cgscc passes are
run over the module), as well as cgscc pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch does just that for llvm.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D117121
Integrate intersection with assumes into getBlockValue(), to ensure
that it is consistently performed.
We were doing it in nearly all places, but for example missed it
for select inputs.
Following up on 1470f94d71 (r63981173):
The result here (probably) depends on endianness. Don't bother
trying to handle this exotic case, just bail out.
Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function), which is checked elsewhere, so this check is no longer needed.
Differential Revision: https://reviews.llvm.org/D117180
Since we don't merge/expand non-sequential umin exprs into umin_seq exprs,
we may have a umin_seq(umin(umin_seq())) chain, and the innermost umin_seq
can still have duplicate operands.
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder the inputs are available as separate pointer value and indices
ArrayRef.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D117038
The reinterpret load code will convert undef values into zero.
Check the uniform value case before it to produce a better result
for all-undef initializers.
However, the uniform value handling will return the uniform value
even if the access is out of bounds, while the reinterpret load
code will return undef. Add an explicit check to retain the
previous result in this case.
The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during its abstract interpretation, and allows us to have one copy of the code.
Note this is not NFC for two reasons:
* The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space.
* I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test-wise contradicts that.
Differential Revision: https://reviews.llvm.org/D117241
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This led to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).
```
extern void h(int* p);

inline __attribute__((always_inline)) int g() {
  return 0;
}

int __attribute__((__no_stack_protector__)) f() {
  int a[1];
  h(a);
  return g();
}
```
LLVM will inline g() into f(), and f() would get a stack protector, against the
user's explicit wishes, potentially breaking the program, e.g. if h() changes the
value of the stack cookie. That's a miscompile.
More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.
One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.
However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear, it
seems counter-intuitive, and causes real problems as we've seen.
This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.
Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):
```
extern void h(int* p);

inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
  return 0;
}

int f() {
  int a[1];
  h(a);
  return g();
}
```
I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.
See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722
Differential revision: https://reviews.llvm.org/D116589
We could use knownbits on both operands for even more folds (and there are
already tests in place for that), but this is enough to recover the example
from:
https://github.com/llvm/llvm-project/issues/51934
(the tests are derived from the code in that example)
I am assuming no noticeable compile-time impact from this because udiv/urem
are rare opcodes.
Differential Revision: https://reviews.llvm.org/D116616
This is required to query the legality more precisely in the LoopVectorizer.
This adds another TTI function, named 'forceScalarizeMaskedGather/Scatter',
to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.
Differential Revision: https://reviews.llvm.org/D115329
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the module-wide run of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)
Differential Revision: https://reviews.llvm.org/D116964
This happens in e.g. regalloc, where we trace decisions per function,
but wouldn't want to spew N log files (i.e. one per function). So we
output a key-value association, where the key is an ID for the
sub-module object, and the value is the tensorflow::SequenceExample.
The current relation with protobuf is tenuous, so we're avoiding a
custom message type in favor of using the `Struct` message, but that
requires the values be wire-able strings, hence base64 encoding.
We plan on resolving the protobuf situation shortly, and improve the
encoding of such logs, but this is sufficient for now for setting up
regalloc training.
Differential Revision: https://reviews.llvm.org/D116985
Alternative to D116817.
This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.
This is then used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116935
Extend the existing malloc-family specific optimization to all noalias calls. This allows us to handle allocation wrappers, and removes a dependency on a lib-func check in favor of generic attribute usage.
Differential Revision: https://reviews.llvm.org/D116980
D92270 updated constant expression folding to fold inbounds GEP to
poison if the base is undef. Apply the same logic to SimplifyGEPInst.
The justification is that we can choose an out-of-bounds pointer as base
pointer.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D117015
We could just merge all umin into umin_seq, but that is likely
a pessimization, so don't do that, but pretend that we did
for the purpose of deduplication.
Having the same operand more than once doesn't change the outcome here,
neither reduction-wise nor poison-wise.
We must keep the first instance specifically though.
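A sketch of the deduplication rule (illustrative only; the helper is made
up): repeated operands are dropped, but the first occurrence is kept, since
operand order matters for poison propagation in the sequential variant.
```
#include <set>
#include <vector>

// Drop duplicate operands of a umin_seq-like expression, keeping the
// first occurrence of each operand.
std::vector<int> dedupKeepFirst(const std::vector<int> &Ops) {
  std::set<int> Seen;
  std::vector<int> Out;
  for (int Op : Ops)
    if (Seen.insert(Op).second)
      Out.push_back(Op);
  return Out;
}
```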
Two crashes have been reported. This change disables the new logic while leaving the new node in tree. Hopefully, that's enough to allow investigation without breakage while avoiding massive churn.
Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.
As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692,
SCEV is forbidden from reasoning about 'backedge taken count'
if the branch condition is a poison-safe logical operation,
which is conservatively correct, but is severely limiting.
Instead, we should have a way to express those
poison blocking properties in SCEV expressions.
The proposed semantics is:
```
Sequential/in-order min/max SCEV expressions are non-commutative variants
of commutative min/max SCEV expressions. If none of their operands
are poison, then they are functionally equivalent, otherwise,
if the operand that represents the saturation point* of the given expression
comes before the first poison operand, then the whole expression is not poison,
but is the said saturation point.
```
* saturation point - the maximal/minimal possible integer value for the given type
The lowering is straight-forward:
```
compare each operand to the saturation point,
perform sequential in-order logical-or (poison-safe!) ordered reduction
over those checks, and if reduction returned true then return
saturation point else return the naive min/max reduction over the operands
```
https://alive2.llvm.org/ce/z/Q7jxvH (2 ops)
https://alive2.llvm.org/ce/z/QCRrhk (3 ops)
Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS
Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97
That allows us to handle the patterns in question.
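For unsigned min, where the saturation point is 0, the lowering above might
look as follows (a minimal sketch for three operands, not the actual SCEV
expansion code):
```
#include <algorithm>
#include <cstdint>

// umin_seq(A, B, C): compare all operands but the last to the saturation
// point (0 for umin) with a short-circuiting, in-order logical or; on a
// hit return the saturation point, otherwise fall back to the naive min
// reduction over the operands.
uint64_t uminSeq(uint64_t A, uint64_t B, uint64_t C) {
  if (A == 0 || B == 0)
    return 0;
  return std::min({A, B, C});
}
```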
Reviewed By: nikic, reames
Differential Revision: https://reviews.llvm.org/D116766
(Split from original patch to separate non-NFC part and add coverage. I typoed when adding the new test, so this change includes the typo fix to let libfunc recognize the signature. Didn't figure it was worth another separate commit.)
Differential Revision: https://reviews.llvm.org/D116851 (part 2 of 2)
There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically.
This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for the purpose of having the large change be NFC. The latter will be reviewed separately.
Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.
We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.
Differential Revision: https://reviews.llvm.org/D116800
We're testing that the RegionLoop pointer is null in the first part of the check, so we need to check that it's non-null before dereferencing it in a later part of the check.
strdup/strndup are already partially implemented, move remaining comment to relevant place. Remaining named routines are copy routines and mostly handled via intrinsics already - they do not allocate new memory.
This is in preparation for D115545 which attempts to delete discardable functions if they are unused. With that change, shifting RefSCCs becomes noticeable in compile time. This change makes the LCG update negligible again.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D116776
This is a recurring pattern, we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike.
The original commit (which was quickly reverted) didn't account for the fact that the allocation function could be an invoke; test coverage for that case is added in this commit.
This patch adds a couple of NewPM function passes (dot-dom and
dot-dom-only) that dump DomTree into .dot files.
Reviewed-By: aeubanks
Differential Revision: https://reviews.llvm.org/D116629
This reverts commit 640beb38e7.
That commit caused performance degradation in the Quicksilver test QS:sGPU and a functional test failure in rocPRIM (rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup
Change-Id: Ibf8e397df94001f248fba609f072088a46abae08
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D115960
Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consistent with onlyReadsMemory, which we use for the corresponding readonly case as well.
In particular, this also preserves undef when loading from padding,
rather than converting it to zero through a different codepath.
This is the remaining part of D115924.
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.
We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.
This is part of D115924.
This was noted in post-commit review for D116322 / 0edf99950e .
I am not seeing how to expose the bug in a test though because
we don't pass an assumption cache into this analysis from there.
This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them.
Motivation comes from compiling a simple example with -ftrapv.
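For reference, a sketch of the kind of source that motivates this (with
-ftrapv, signed arithmetic is emitted via overflow intrinsics plus a trap):
```
// With -ftrapv, the i++ below is lowered using llvm.sadd.with.overflow
// followed by a branch to a trap. Proving the loop's trip count lets
// SCEV-based logic discharge that overflow check.
void fill(int n, int *a) {
  for (int i = 0; i < n; ++i)
    a[i] = 0;
}
```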
Differential Revision: https://reviews.llvm.org/D116499
0a00d64 turned an early exit here into an assertion, but the assertion
can be triggered, as PR52920 shows.
The later code is agnostic to the accessed type, so just drop the
assert. The patch also adds tests for LAA directly and
loop-load-elimination to show the behavior is sane.
For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example,
```
float s = 0;
for (int i = 0; i < N; ++i)
  s += (float) i*i;
```
which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.
In this patch, for the case where ElementTypesInLoop is empty, the widest
type is determined by finding the smallest type used by recurrences in
the loop instead of falling back to a default value of 8 bits. This
results in the cost model choosing a more sensible VF for loops like
the one above.
Differential Revision: https://reviews.llvm.org/D113973
This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.
Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.
Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.
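A quick illustration of the renamed query (a sketch, relying on the usual
APInt semantics, where the result is BitWidth - NumSignBits + 1):
```
#include "llvm/ADT/APInt.h"
using llvm::APInt;

void significantBitsExamples() {
  // All 32 bits of -1 are sign bits, so a single bit suffices.
  APInt MinusOne(32, -1, /*isSigned=*/true);
  unsigned A = MinusOne.getSignificantBits(); // 1

  // The value 1 needs one value bit plus the sign bit.
  APInt One(32, 1);
  unsigned B = One.getSignificantBits(); // 2
  (void)A;
  (void)B;
}
```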
Reviewed By: lebedev.ri, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D116522
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
We can fold an equality or unsigned icmp between base+offset1 and
base+offset2 with inbounds offsets by comparing the offsets directly.
This replaces a pair of specialized folds that tried to reason
based on the GEP structure instead. One of those folds was plain
wrong (because it does not account for negative offsets), while
the other is unnecessarily complicated and limited (e.g. it will
fail with bitcasts involved).
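In source terms, the fold amounts to something like the following
illustrative C analogue (not the InstCombine code itself):
```
// With a common base and inbounds offsets, the pointer comparison can
// be decided by comparing the offsets directly.
bool samePlace(int *base, long i, long j) {
  return &base[i] == &base[j]; // folds to i == j
}
```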
The disadvantage of this change is that it requires data layout,
so the fold is no longer performed by datalayout-independent
constant folding. I don't think this is a loss in practice, but
it does regress the ConstantExprFold.ll test, which checks folding
without running any passes.
Differential Revision: https://reviews.llvm.org/D116332
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.
This patch is modeled on a similar construct that was added with
D59386.
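In ConstantRange terms, the idea is roughly the following (a sketch under
assumed ConstantRange/Operator APIs, not the patch's code):
```
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// With both nuw and nsw present, the result range is the intersection of
// the two no-wrap regions, at least as tight as either flag alone.
ConstantRange addWithBothFlags(const ConstantRange &L,
                               const ConstantRange &R) {
  return L.addWithNoWrap(R, OverflowingBinaryOperator::NoSignedWrap |
                                OverflowingBinaryOperator::NoUnsignedWrap);
}
```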
I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).
InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.
Fixes#52884
Differential Revision: https://reviews.llvm.org/D116322
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.
This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.
There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.
Differential Revision: https://reviews.llvm.org/D116210
Adding the following fold opportunity:
((A | B) ^ A) & ((A | B) ^ B) --> 0
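The identity is easy to verify exhaustively for small bit widths (a
standalone check, not part of the patch):
```
#include <cassert>
#include <cstdint>

int main() {
  // (A | B) ^ A == B & ~A and (A | B) ^ B == A & ~B; the two results
  // share no bits, so their 'and' is always 0.
  for (unsigned I = 0; I < 256; ++I)
    for (unsigned J = 0; J < 256; ++J) {
      uint8_t A = I, B = J;
      assert((uint8_t)(((A | B) ^ A) & ((A | B) ^ B)) == 0);
    }
  return 0;
}
```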
Reviewed By: spatel, rampitec
Differential Revision: https://reviews.llvm.org/D115755
When looking at building the generator for regalloc, we realized we'd
need quite a bit of custom logic, and that perhaps it'd be easier to
just have each usecase (each kind of mlgo policy) have its own
stand-alone test generator.
This patch just consolidates the old `config.py` and
`generate_mock_model.py` into one file, and does away with
subdirectories under Analysis/models.
As reames mentioned on related reviews, we don't need the nocapture
requirement here. First of all, from an API perspective, this is
not something that MemoryLocation::getForDest() should be checking
in the first place, because it does not affect which memory this
particular call can access; it's an orthogonal concern that should
be handled by the caller if necessary.
However, for both of the motivating users in DSE and InstCombine,
we don't need the nocapture requirement, because the capture can
either be purely local to the call (a pointer identity check that
is irrelevant to us), be part of the return value (which we check
is unused), or be written in the dest location, which we have
determined to be dead.
This allows us to remove the special handling for libcalls as well.
Differential Revision: https://reviews.llvm.org/D116148
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.
This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.
This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).
Differential Revision: https://reviews.llvm.org/D116031
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
This is a reapply of a8a51fe5, which was reverted in 1ba99e due to a failing compiler-rt test. That test was a false positive because it was checking asan failures without accounting for the fact that the call could be validly optimized out. I hopefully managed to stabilize that test in 9b955f. (That's a speculative fix, as the disk consumption needed to build compiler-rt tests locally is absurd.)
Original commit message follows..
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.
Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).
In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.
Differential Revision: https://reviews.llvm.org/D115904
The availability of SVE should be sufficient to enable scalable
auto-vectorization.
This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.
Differential Revision: https://reviews.llvm.org/D115651
This reverts commit 9fd4f80e33.
This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
Before this change, AAResults::getModRefInfo() was missing a case for
callbr instructions (asm goto), which may read/write memory. In PR52735,
this led to a miscompile where a load was incorrectly eliminated.
Add this missing case, as well as an assert verifying that all
memory-accessing instructions are handled properly.
Fixes#52735.
Differential Revision: https://reviews.llvm.org/D115992
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.
Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).
In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.
Differential Revision: https://reviews.llvm.org/D115904
Pull out an explicit check rather than relying on the fact that the callee operand is not a data operand. The only real value is it gives us a clear place to move the comment, and makes the code slightly more understandable.
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.
We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.
Differential Revision: https://reviews.llvm.org/D115924
Preserve the invariant that the offset reported in the case of a
`PartialAlias` between `Loc1` and `Loc2` is such that
`Loc1 + Offset = Loc2`, where `Loc1` and `Loc2` are the first and
the second argument, respectively, in alias queries.
Differential Revision: https://reviews.llvm.org/D115927
This patch updates applyLoopGuards to first collect all conditions and
then applies them in reverse order. This ensures the SCEVs with the
shortest dependency chains are constructed first, limiting the required
stack size.
This fixes a crash reported in D113578.
Note that the order conditions are applied can impact the accuracy of
the result, mostly due to missing min/max simplifications when
constructing SCEVs.
The changed test highlights the impact of the evaluation order. I will
follow up with a SCEV patch to improve min/max simplifications to get
the same results for both orders.
After the switch to the new pass manager, we have observed multiple
instances of catastrophic inlining, where the inliner produces huge
functions with many hundreds of thousands of instructions from small
input IR. We were forced to back out the switch to the new pass
manager for this reason. This patch fixes at least one of the root
cause issues.
LLVM uses a bottom-up inliner, and the fact that functions are processed
bottom-up is not just a question of optimality -- it is an important
requirement to prevent runaway inlining. The premise of the current
inlining approach and cost model is that after all calls inside a function
have been inlined, it may get large enough that inlining it into its
callers is no longer considered profitable. This safeguard does not
exist if inlining doesn't happen bottom-up, as inlining the callees,
and their callees, and their callees etc. will always seem individually
profitable, and the inliner can easily flatten the whole call tree.
There are instances where we necessarily have to deviate from bottom-up
inlining: When inlining in an SCC there is no natural "bottom", so
inlining effectively happens top-down. This requires special care,
and the inliner avoids exponential blowup by ensuring that functions
in the SCC grow in a balanced way and will eventually hit the threshold.
However, there is one instance where the inlining advisor explicitly
violates the bottom-up principle: Deferred inlining tries to "defer"
inlining a call if it determines that inlining the caller into all
its call-sites would be more profitable. Something very important to
understand about deferred inlining is that it doesn't make one inlining
choice in place of another -- it effectively chooses to do both. If we
have a call chain A -> B -> C and cost modelling tells us that inlining
B -> C is profitable, but we defer this and instead inline A -> B first,
then we'll now have a call A -> C, and the cost model will (a few special
cases notwithstanding) still tell us that this is profitable. So the end
result is that we inlined *both* B and C, even though under the usual
cost model function B would have been too large to further inline after
C has been integrated into it.
Because deferred inlining violates the bottom-up invariant of the inliner,
it can result in exponential inlining. The exponential-deferred-inlining.ll
test case illustrates this on a simple example (see
https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a
much more catastrophic case with about 5000x size blowup). If the call
chain A -> B -> C is not a chain but a tree of calls, then we end up
deferring inlining across the tree and end up flattening everything into
the root node.
This patch proposes to address this by disabling deferred inlining
entirely (currently still behind an option). Beyond the issue of
exponential inlining, I don't think that the whole concept makes sense,
at least as long as deferred inlining still ends up inlining both call
edges.
I believe the motivation for having deferred inlining in the first place
is that you might have a small wrapper function with local linkage that
could be eliminated if inlined. This would automatically happen if there
was a single caller, due to the large "last call to local" bonus. However,
this bonus is not extended if there are multiple callers, even if we
would eventually end up inlining into all of them (if the bonus were
extended).
Now, unlike the normal inlining cost model, the deferred inlining cost
model does look at all callers, and will extend the "last call to local"
bonus if it determines that we could inline all of them as long as we
defer the current inlining decision. This makes very little sense.
The "last call to local" bonus doesn't really cost model anything.
It's basically an "infinite" bonus that ensures we always inline the
last call to a local. The fact that it's not literally infinite just
prevents inlining of huge functions, which can easily result in
scalability issues. I very much doubt that it was an intentional
cost-modelling choice to say that getting rid of a small local function
is worth adding 15000 instructions elsewhere, yet this is exactly how
this value is getting used here.
The main alternative I see to complete removal is to change deferred
inlining to an actual either/or decision. That is, to mark deferred
calls as noinline so we're actually trading off one inlining decision
against another, and not just adding a side-channel to the cost model
to do both.
Apart from fixing the catastrophic inlining case, the effect on rustc
is a modest compile-time improvement on average (up to 8% for a
parsing-type crate, where tree-like calls are expected) and pretty
neutral where run-time performance is concerned (mix of small wins
and losses, usually in the sub-1% category).
Differential Revision: https://reviews.llvm.org/D115497
A well-formed IR function definition must have an entry basic block, and
a well-formed IR basic block must have exactly one terminator, so the
emptiness check can be simplified.
Also simplify the test a bit.
Reviewed By: luna
Differential Revision: https://reviews.llvm.org/D115780
These are deprecated and should be replaced with getAlign().
Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
These flags are documented as generating poison values for particular input values. As such, we should really be consistent about their handling with how we handle nsw/nuw/exact/inbounds.
Differential Revision: https://reviews.llvm.org/D115460
This reverts commit ac60263ad1.
It looks like the test fails on certain non-Darwin system, even though
the triple is explicitly set to macos. Revert while I investigate.
memset_pattern{4,8,16} writes to the first argument. Use getForDest
to return the corresponding MemoryLocation.
Reviewed By: ab
Differential Revision: https://reviews.llvm.org/D114906
Usually the case where the types are the same ends up being handled
fine because it's legal to do a trivial bitcast to the same type.
However, this is not true for aggregate types. Short-circuit the
whole code if the types match exactly to account for this.
Reverts 02940d6d22. Fixes breakage in the modules build.
LLVM loops cannot represent irreducible structures in the CFG. This
change introduces the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instantiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
Refer to https://llvm.org/PR52546.
Simplifies the following cases:
not(X) == 0 -> X != 0 -> X
not(X) <=u 0 -> X >u 0 -> X
not(X) >=s 0 -> X <s 0 -> X
not(X) != 1 -> X == 1 -> X
not(X) <=u 1 -> X >=u 1 -> X
not(X) >s 1 -> X <=s -1 -> X
Differential Revision: https://reviews.llvm.org/D114666
This prepares it for the regalloc work. Part of it is making model
evaluation across 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.
Differential Revision: https://reviews.llvm.org/D115306
memset_pattern{4,8} behave as memset_pattern16, with the only difference
being the size of the pattern location.
Reviewed By: ab
Differential Revision: https://reviews.llvm.org/D114905
In the isDependence function the code does not try hard enough
to determine the dependence between types. If the types are
different it simply gives up, whereas in fact what we really
care about are the type sizes. I've changed the code to compare
sizes instead of types.
Reviewed By: fhahn, sdesmalen
Differential Revision: https://reviews.llvm.org/D108763
LLVM loops cannot represent irreducible structures in the CFG. This
change introduces the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instantiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696