llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	b07aab8fc1	[GlobalOpt] Iterate over replaced values deterministically to constprop If there are pre-existing dead instructions, the order we visit replaced values can cause us sometimes to not delete dead instructions. The added test non-deterministically failed without the change.	2022-05-02 09:43:20 -07:00
Phoebe Wang	7c04454227	[ArgPromotion][Attributor] Update min-legal-vector-width when do promotion X86 codegen uses function attribute `min-legal-vector-width` to select the proper ABI. The intention of the attribute is to reflect user's requirement when they passing or returning vector arguments. So Clang front-end will iterate the vector arguments and set `min-legal-vector-width` to the width of the maximum for both caller and callee. It is assumed any middle end optimizations won't care of the attribute expect inlining and argument promotion. - For inlining, we will propagate the attribute of inlined functions because the inlining functions become the newer caller. - For argument promotion, we check the `min-legal-vector-width` of the caller and callee and refuse to promote when they don't match. The problem comes from the optimizations' combination, as shown by https://godbolt.org/z/zo3hba8xW. The caller `foo` has two callees `bar` and `baz`. When doing argument promotion, both `foo` and `bar` has the same `min-legal-vector-width`. So the argument was promoted to vector. Then the inlining inlines `baz` to `foo` and updates `min-legal-vector-width`, which results in ABI mismatch between `foo` and `bar`. This patch fixes the problem by expanding the concept of `min-legal-vector-width` to indicator of functions arguments. That says, any passes touch functions arguments have to set `min-legal-vector-width` to the value reflects the width of vector arguments. It makes sense to me because any arguments modifications are ABI related and should response for the ABI compatibility. Differential Revision: https://reviews.llvm.org/D123284	2022-05-02 14:13:05 +08:00
Hongtao Yu	bdb8c50a1c	[CSSPGO] Turn on priority inlining for probe-only profile We have seen that the prioirty inliner delivered on-par performance with the old inliner for probe-only CSSPGO profile, as long as without a size budget. I'm turning on the priority inliner for probe-only profile by default. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124632	2022-04-29 17:31:56 -07:00
Hongtao Yu	e36786d15f	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602	2022-04-29 17:03:52 -07:00
Florian Hahn	fb4113ef0c	[Passes] Remove legacy LoopUnswitch pass. The legacy LoopUnswitch pass is only used in the legacy pass manager pipeline, which is deprecated. The NewPM replacement is SimpleLoopUnswitch and I think it is time to remove the legacy LoopUnswitch code. Fixes #31000. Reviewed By: aeubanks, Meinersbur, asbirlea Differential Revision: https://reviews.llvm.org/D124376	2022-04-29 10:30:49 +01:00
Pavel Samolysov	9197959e13	[ArgPromotion] Move ArgPart and OffsetAndArgPart to anonymous namespace The structure ArgPart and alias OffsetAndArgPart have been moved into the anonymous namespace. NFC. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124617	2022-04-28 09:51:46 -07:00
Pavel Samolysov	6b825e50f7	[ArgPromotion] Change the condition to check the promotion limit The condition should be 'ArgParts.size() > MaxElements', so that if we have exactly 3 elements in the 'ArgParts' vector, the promotion should be allowed because the 'MaxElement' threshold is not exceeded yet. The default value for 'MaxElement' has been decreased to 2 in order to avoid an actual change in argument promoting behavior. However, this changes byval argument transformation behavior by allowing adding not more than 2 arguments to the function instead of 3 allowed before. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124178	2022-04-28 09:42:58 -07:00
Pavel Samolysov	744a837838	[ArgPromotion] Rename variables according to the code style. NFC Some loop counters ('i', 'e') and variables ('type') were named not in accordance with the code style and clang-tidy issues warnings about the using of such variables. This patch renames the variables and fixes some typos in the comments within the source file. Differential Revision: https://reviews.llvm.org/D123662	2022-04-28 15:32:05 +02:00
Michael Kruse	ff289feeba	[OpenMPIRBuilder] Remove ContinuationBB argument from Body callback. The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems: 1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block. 2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator. 3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again. 4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch. With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback. Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block. Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later. Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D118409	2022-04-26 16:35:01 -05:00
Liqiang Tao	b9fc18f89a	[llvm][Inline] Remove PriorityInlineOrder in SCC inliner Since the size of most of SCC's is 1, the PriorityInlineOrder would not change the inline order in SCC inliner. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D123608	2022-04-26 20:20:10 +08:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Paul Kirth	4683a2effa	[llvm][misexpect] Avoid division by 0 when using sample profiling MisExpect diagnostics should not prevent compilation from succeeding, and the assertion is insufficient to prevent division by zero in release builds. This patch addresses that by replacing the assert with an early return. Additionally, it disables MisExpect diagnostics when using sample profiling, since this is the only known case where this error has manifested. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124302	2022-04-22 22:48:00 +00:00
Fangrui Song	14d9390721	Revert D123198 "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls." test/Transforms/InstCombine/pr39177.ll failed in a -DLLVM_USE_SANITIZER=Undefined build. ``` lib/Transforms/Utils/BuildLibCalls.cpp:1217:17: runtime error: reference binding to null pointer of type 'llvm::Function' ``` `Function &F = *M->getFunction(Name);` This reverts commit `0f8c626723`.	2022-04-19 22:26:10 -07:00
Paul Kirth	bac6cd5bf8	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-04-19 21:23:48 +00:00
Jonas Paulsson	0f8c626723	[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls. A new set of overloaded functions named getOrInsertLibFunc() are now supposed to be used instead of getOrInsertFunction() when building a libcall from within an LLVM optimizer(). The idea is that this new function also makes sure that any mandatory argument attributes are added to the function prototype (after calling getOrInsertFunction()). inferLibFuncAttributes() is renamed to inferNonMandatoryLibFuncAttrs() as it only adds attributes that are not necessary for correctness but merely helping with later optimizations. Generally, the front end is responsible for building a correct function prototype with the needed argument attributes. If the middle end however is the one creating the call, e.g. when replacing one libcall with another, it then must take this responsibility. This continues the work of properly handling argument extension if required by the target ABI when building a lib call. getOrInsertLibFunc() now does this for all libcalls currently built by any LLVM optimizer. It is expected that when in the future a new optimization builds a new libcall with an integer argument it is to be added to getOrInsertLibFunc() with the proper handling. Note that not all targets have it in their ABI to sign/zero extend integer arguments to the full register width, but this will be done selectively as determined by getExtAttrForI32Param(). Review: Eli Friedman, Nikita Popov, Dávid Bolvanský Differential Revision: https://reviews.llvm.org/D123198	2022-04-19 21:22:07 +02:00
Michael Kruse	2d92ee97f1	Reapply "[OpenMP] Refactor OMPScheduleType enum." This reverts commit `af0285122f`. The test "libomp::loop_dispatch.c" on builder openmp-gcc-x86_64-linux-debian fails from time-to-time. See #54969. This patch is unrelated.	2022-04-18 21:56:47 -05:00
Michael Kruse	af0285122f	Revert "[OpenMP] Refactor OMPScheduleType enum." This reverts commit `9ec501da76`. It may have caused the openmp-gcc-x86_64-linux-debian buildbot to fail. https://lab.llvm.org/buildbot/#/builders/4/builds/20377	2022-04-18 14:38:31 -05:00
Michael Kruse	9ec501da76	[OpenMP] Refactor OMPScheduleType enum. The OMPScheduleType enum stores the constants from libomp's internal sched_type in kmp.h and are used by several kmp API functions. The enum values have an internal structure, namely each scheduling algorithm (e.g.) exists in four variants: unordered, orderend, normerge unordered, and nomerge ordered. This patch (basically a followup to D114940) splits the "ordered" and "nomerge" bits into separate flags, as was already done for the "monotonic" and "nonmonotonic", so we can apply bit flags operations on them. It also now contains all possible combinations according to kmp's sched_type. Deriving of the OMPScheduleType enum from clause parameters has been moved form MLIR's OpenMPToLLVMIRTranslation.cpp to OpenMPIRBuilder to make available for clang as well. Since the primary purpose of the flag is the binary interface to libomp, it has been made more private to LLVMFrontend. The primary interface for generating worksharing-loop using OpenMPIRBuilder code becomes `applyWorkshareLoop` which derives the OMPScheduleType automatically and calls the appropriate emitter function. While this is mostly a NFC refactor, it still applies the following functional changes: * The logic from OpenMPToLLVMIRTranslation to derive the OMPScheduleType also applies to clang. Most notably, it now applies the nonmonotonic flag for non-static schedules by default. * In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was previously not applied if the simd modifier was used. I assume this was a bug, since the effect was due to `loop.schedule_modifier()` returning `mlir::omp::ScheduleModifier::none` instead of `llvm::Optional::None`. * In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was set even if ordered was specified, in breach to what the comment before citing the OpenMP specification says. I assume this was an oversight. The ordered flag with parameter was not considered in this patch. Changes will need to be made (e.g. adding/modifying function parameters) when support for it is added. The lengthy names of the enum values can be discussed, for the moment this is avoiding reusing previously existing enum value names such as `StaticChunked` to avoid confusion. Reviewed By: peixin Differential Revision: https://reviews.llvm.org/D123403	2022-04-18 14:03:17 -05:00
Arthur Eubanks	2e6ac54cf4	[LegacyPM] Remove ThinLTO/LTO pipelines Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove the (Thin)LTO pipelines. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D123882	2022-04-18 10:09:41 -07:00
Zarko Todorovski	ce87133120	[llvm][IPO] Inclusive language: Rename mergefunc-sanity to mergefunc-verify and remove other instances of sanity in MergeFunctions.cpp This patch renames the mergefunc-sanity to mergefunc-verify and renames the related functions to use more inclusive language Reviewed By: cebowleratibm Differential Revision: https://reviews.llvm.org/D114374	2022-04-18 11:50:08 -04:00
Aaron Ballman	86cdb2929c	Silence a "not all control paths return a value" warning; NFC	2022-04-18 08:54:08 -04:00
Johannes Doerfert	e87f10a771	[Attributor] CGSCC pass should not recompute results outside the SCC (reapply) When we run the CGSCC pass we should only invest time on the SCC. We can initialize AAs with information from the module slice but we should not update those AAs. We make an exception for are call site of the SCC as they are helpful providing information for the SCC. Minor modifications to pointer privatization allow us to perform it even in the CGSCC pass, similar to ArgumentPromotion.	2022-04-17 12:48:49 -05:00
Andrew Litteken	d7c56a076e	[IROutliner] Ensure that phi values that are passed in as arguments are remapped as arguments Issue: https://github.com/llvm/llvm-project/issues/54430 For incoming values of phi nodes added to an outlined function to accommodate different exit paths in the function, when a value is a constant that is passed into the outlined function as an argument, we find the corresponding value in the first extracted function used to fill the overall outlined function. When this value is an argument, the corresponding value used will be the old value, prior to outlining. This patch maintains a mapping from these values to arguments, and uses this mapping to update the added phi node accordingly. Reviewers: paquette Recommit of `d6eb480afb` Differential Revision: https://reviews.llvm.org/D122206	2022-04-16 15:47:52 -05:00
Johannes Doerfert	3be3b40188	[Attributor][NFCI] Introduce AttributorConfig to bundle all options Instead of lengthy constructors we can now set the members of a read-only struct before the Attributor is created. Should make it clearer what is configurable and also help introducing new options in the future. This actually added IsModulePass and avoids deduction through the Function set size. No functional change was intended.	2022-04-15 18:17:19 -05:00
Johannes Doerfert	04f3a224bc	[Attributor][NFC] Introduce a flag to distinguish the scope of a query	2022-04-15 14:56:10 -05:00
Johannes Doerfert	bd72acf4d8	[Attributor][NFC] Code cleanup to minimize follow up changes	2022-04-15 14:56:09 -05:00
Johannes Doerfert	2d8e7834b0	[Attributor][NFC] Rename AAPotentialValues to AAPotentialConstantValues	2022-04-15 14:56:09 -05:00
Fangrui Song	04e094a336	[PGO] Remove legacy PM passes Legacy PM for optimization pipeline was deprecated in 13.0.0 and Clang dropped legacy PM support in D123609. This change removes legacy PM passes for PGO so that downstream projects won't be able to use it. It seems appropriate to start removing such "add-on" features like instrumentations, before we remove more stuff after 15.x is branched. I have checked many LLVM users and only ldc[1] uses the legacy PGO pass. [1]: https://github.com/ldc-developers/ldc/issues/3961 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D123834	2022-04-15 10:26:43 -07:00
Andrew Litteken	6f8eba06c2	Revert "[IROutliner] Ensure that phi values that are passed in as arguments are remapped as arguments" Failing test due to typo This reverts commit `d6eb480afb`.	2022-04-14 12:23:33 -05:00
Andrew Litteken	d6eb480afb	[IROutliner] Ensure that phi values that are passed in as arguments are remapped as arguments Issue: https://github.com/llvm/llvm-project/issues/54430 For incoming values of phi nodes added to an outlined function to accommodate different exit paths in the function, when a value is a constant that is passed into the outlined function as an argument, we find the corresponding value in the first extracted function used to fill the overall outlined function. When this value is an argument, the corresponding value used will be the old value, prior to outlining. This patch maintains a mapping from these values to arguments, and uses this mapping to update the added phi node accordingly. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D122206	2022-04-14 12:16:23 -05:00
Andrew Litteken	a919d3d888	[IROutliner] Ensure that incoming blocks of PHINodes are included in the unique numbering gneration for phi nodes for each exit path Issue: https://github.com/llvm/llvm-project/issues/54431 PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D122207	2022-04-14 12:13:17 -05:00
serge-sans-paille	fa5a4e1b95	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `a96638e50e` detected a few regressions, fixing them.	2022-04-13 20:53:19 +02:00
Arthur Eubanks	51561b5e80	[ArgPromo][OpaquePointer] Don't promote mismatched function types Mismatched call/callee function types is considered an indirect call. Fixes crash in https://reviews.llvm.org/D123300#3446023.	2022-04-12 15:17:45 -07:00
Johannes Doerfert	af30de7788	[Attributor] Introduce AAInstanceInfo The Attributor, as many other parts in LLVM, uses pointer equivalence for `llvm::Value`s. This only works as long as `llvm::Value`s are dynamically unique, or, to be exact, we will never end up with the same `llvm::Value` representing two dynamic instances. We already provided a helper to check the former, namely `AA::isDynamicallyUnique`, however we could not check the latter. In this patch we move the logic into a separate AA which helps with the growing complexity and use cases. We also extend the interface to answer the second question rather than the first. So we do not determine dynamically uniqueness but if we might end up with the `llvm::Value` describing a different dynamic instance. Note that the latter is very much tied to the Attributor capabilities to look through memory, recursion, etc. so we need to update the logic as we go.	2022-04-05 23:07:13 -05:00
Johannes Doerfert	c42aa1be74	[Attributor] Keep loads feeding in `llvm.assume` if stores stays If a load is only used by an `llvm.assume` and the stores feeding into the load are not removable, keep the load.	2022-04-05 23:07:12 -05:00
Johannes Doerfert	857bf306d7	[Attributor] Remove broken and duplicated load simplification We look through loads in the "generic value traversal" and we consequently don't need to look through them again in AAValueSimplify*. The test changes stem from the fact that we allowed any simplified value, incl. non-dynamically unique ones, as long as the underlying memory was an alloca. This doesn't seem to make sense as allocas do not protect against dynamically non-unique values. We need to make the unique check better rather than excluding allocas. That in mind, we can remove a lot of code by simply relying on the generic value traversal load look through. To soften the blow some minor adjustments have been made that allow more simplification through the now used scheme and some tests have been given a `norecurse` for now.	2022-04-05 20:49:03 -05:00
Johannes Doerfert	a8610d7523	[Attributor] Move recursion reasoning into `AA::isPotentiallyReachable` With D106397 we ensured that `AAReachability` will not answer queries for potentially recursive functions. This was necessary as we did not treat recursion explicitly otherwise. Now that we have `AA::isPotentiallyReachable` we can make `AAReachability` a purely intra-procedural AA which does not care about recursion. `AA::isPotentiallyReachable`, however, does already deal with "going back" the call graph and can now do so for potentially recursive functions.	2022-04-05 20:49:03 -05:00
Teresa Johnson	ced9a795fd	[WPD] Add statistics Add statistics to count overall devirtualized targets as well as the various types of devirtualizations applied at callsites. Differential Revision: https://reviews.llvm.org/D123152	2022-04-05 18:48:23 -07:00
Johannes Doerfert	3e8c4366e2	[Attributor] Visit droppable uses in AAIsDead If we ignore droppable users everything only used in llvm.assume (among other things) is going to be deleted as dead. This is not helpful. Instead we want to only delete things we actually don't need anymore. A follow up will deal with loads in a smarter way.	2022-04-05 18:20:45 -05:00
Johannes Doerfert	79962df386	[Attributor] Allow to reproduce instructions for simplification When simplify values we might end up with an instruction from a different scope or just one that does not dominate the use. If the instruction can be reproduced without side-effect (incl. UB) we can now do that. For now this is mostly used for speculatable (intrinsic) calls but as we learn to make things like arguments or loads available this will become more powerful. This will also allow us to remove dead stores more easily in a follow up.	2022-04-04 12:28:08 -05:00
Augie Fackler	603ae73146	AttributorAttributes: guard against TLI being nullptr I didn't dig into this very much because it appears to be totally valid (especially once these properties can come from attributes instead of only from hard-coded library functions) for TLI to not be defined, and nothing broke when I added this check, including with all my other patches applied. Differential Revision: https://reviews.llvm.org/D122917	2022-04-03 23:19:23 -04:00
Alexandros Lamprineas	f364278c45	[FuncSpec][NFC] Cache code metrics for analyzed functions. This isn't expected to reduce compilation times as 'max-iters' is set to one by default, but it helps with recursive functions that require higher iteration counts. Differential Revision: https://reviews.llvm.org/D122819	2022-04-01 10:58:26 +01:00
Jorge Gorbe Moya	fc7573f29c	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `46774df307`.	2022-03-31 14:54:41 -07:00
Paul Kirth	46774df307	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-03-31 17:38:21 +00:00
Alexandros Lamprineas	b4417075dc	[FuncSpec] Constant propagate multiple arguments for recursive functions. This fixes a TODO in constantArgPropagation() to make it feature complete. However, I do find myself in agreement with the review comments in https://reviews.llvm.org/D106426. I don't think we should pursue specializing such recursive functions as the code size increase becomes linear to 'max-iters'. Compiling the modified test just with -O3 (no function specialization) generates the same code. Differential Revision: https://reviews.llvm.org/D122755	2022-03-31 13:00:08 +01:00
serge-sans-paille	01be9be2f2	Cleanup includes: final pass Cleanup a few extra files, this closes the work on libLLVM dependencies on my side. Impact on libLLVM preprocessed output: -35876 lines Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122576	2022-03-29 09:00:21 +02:00
Paul Kirth	90cb325abd	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `2add3fbd97`.	2022-03-29 06:20:30 +00:00
Johannes Doerfert	7df2eba7fa	[Attributor][OpenMP] Add assumption for non-call assembly instructions Inline assembly is scary but we need to support it for the OpenMP GPU device runtime. The new assumption expresses the fact that it may not have call semantics, that is, it will not call another function but simply perform an operation or side-effect. This is important for reachability in the presence of inline assembly. Differential Revision: https://reviews.llvm.org/D109986	2022-03-28 20:57:52 -05:00
Paul Kirth	2add3fbd97	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-03-28 23:30:04 +00:00
Nikita Popov	db561064f6	[GlobalOpt] Handle non-instruction MTI source (PR54572) This was reusing a cast to GlobalVariable to check for an Instruction, which means we'll try to dereference a null pointer if it's not actually a GlobalVariable. We should be casting MTI->getSource() instead. I don't think this problem is really specific to opaque pointers, but it certainly makes it a lot easier to reproduce. Fixes https://github.com/llvm/llvm-project/issues/54572.	2022-03-28 14:28:47 +02:00
Alexandros Lamprineas	8045bf9d0d	[FuncSpec] Support function specialization across multiple arguments. The current implementation of Function Specialization does not allow specializing more than one arguments per function call, which is a limitation I am lifting with this patch. My main challenge was to choose the most suitable ADT for storing the specializations. We need an associative container for binding all the actual arguments of a specialization to the function call. We also need a consistent iteration order across executions. Lastly we want to be able to sort the entries by Gain and reject the least profitable ones. MapVector fits the bill but not quite; erasing elements is expensive and using stable_sort messes up the indices to the underlying vector. I am therefore using the underlying vector directly after calculating the Gain. Differential Revision: https://reviews.llvm.org/D119880	2022-03-28 12:01:53 +01:00
Hongtao Yu	7a316c0a1f	[CSSPGO] Turn on profi and ext-tsp when using probe-based profile. Probe-based profile leads to a better performance when combined with profi and ext-tsp block layout. I'm turning them on by default. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122442	2022-03-25 09:09:21 -07:00
Johannes Doerfert	a81fff8afd	Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit `c5f789050d` and reapplies `7aea3ea8c3` with additional test changes.	2022-03-25 09:36:50 -05:00
Johannes Doerfert	c5f789050d	Revert "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit `7aea3ea8c3` as it breaks the buildbots. I didn't see these failures in the pre-merge checks, looking into it.	2022-03-24 14:04:41 -05:00
Johannes Doerfert	7aea3ea8c3	[Intrinsics] Add `nocallback` to the default intrinsic attributes Most intrinsics, especially "default" ones, will not call back into the IR module. `nocallback` encodes this nicely. As it was not used before, this patch also makes use of `nocallback` in the Attributor which results in many more `norecurse` deductions. Tablegen part is mechanical, test updates by script. Differential Revision: https://reviews.llvm.org/D118680	2022-03-24 13:50:54 -05:00
Johannes Doerfert	ee94a4a3d0	[Attributor][FIX] Avoid endless recursion, simple case There is potential for endless recursion if we try to determine the underlying objects of a load, just to end up with the load as underlying object. A proper solution will require us to pass a visited set around. This will happen as we cleanup genericValueTraversal soon.	2022-03-23 15:55:32 -05:00
minglotus-6	e2074de6a8	[ProfSampleLoader] When disable-sample-loader-inlining is true, merge profiles of inlined instances to outlining versions. When --disable-sample-loader-inlining is true, skip inline transformation, but merge profiles of inlined instances to outlining versions. Differential Revision: https://reviews.llvm.org/D121862	2022-03-23 13:02:48 -07:00
Alexandros Lamprineas	a687f96b0f	[FuncSpec][NFC] Clang-format the source code and fix debug typo.	2022-03-23 14:39:58 +00:00
serge-sans-paille	a53b689f0c	Fix missing include under -DEXPENSIVE_CHECK Regression introduced by `f1985a3f85`	2022-03-22 10:37:56 +01:00
serge-sans-paille	f1985a3f85	Cleanup includes: Transforms/IPO Preprocessor output diff: -238205 lines Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122183	2022-03-22 10:06:28 +01:00
Hirochika Matsumoto	86f970e595	[IROutliner][NFC] Fix typo in doc of findOrCreatePHIInBlock Typo Fix in Documentation Author: hkmatsumoto Reviewers: AndrewLitteken Differential Revision: https://reviews.llvm.org/D121627	2022-03-21 12:34:20 -05:00
Andrew Litteken	4e500df89e	[IROutliner] Fix phi nodes when self referential within block but doesn't contain branch When outlining a phi node, if the the incoming branch is a block contained in the region and the branch from that block is not outlined, we create broken code. The fix is to recognize when that branch from the included incoming block is not contained, and ignore the region. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D121311	2022-03-21 11:05:15 -05:00
Andrew Litteken	38e8880e93	[IROutliner] Do not outlined from functions with optnone Since the IROutliner is performing an optimization, it should not outline from functions explicitly marked with optnone. This adds an extra check and test to make sure this does not occur. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D121567	2022-03-20 23:39:23 -05:00
Kazu Hirata	bce1bf0ee2	[Transform] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC)	2022-03-20 10:41:22 -07:00
Johannes Doerfert	4166738c38	[OpenMP][FIX] Do not crash when kernels are debug wrapper functions With debug information enabled (-g) Clang will wrap the actual target region into a new function which is called from the "kernel". The problem is that the "kernel" is now basically a wrapper without all the things we expect. More importantly, if we end up asking for an AAKernelInfo for the "target region function" we might try to turn it into SPMD mode. That used to cause an assertion as that function doesn't have an appropriately named `_exec_mode` global. While the global is going away soon we still need to make sure to properly handle this case, e.g., perform optimizations reliably. Differential Revision: https://reviews.llvm.org/D122043	2022-03-19 14:15:55 -05:00
Fangrui Song	c6692f819e	[GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible Generalize D99629 for ELF. A default visibility non-local symbol is preemptible in a -shared link. `isInterposable` is an insufficient condition. Moreover, a non-preemptible alias may be referenced in a sub constant expression which intends to lower to a PC-relative relocation. Replacing the alias with a preemptible aliasee may introduce a linker error. Respect dso_preemptable and suppress optimization to fix the abose issues. With the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic` compile. ``` int aliasee; extern int alias __attribute__((alias("aliasee"), visibility("hidden"))); void foo() { alias = 345; } // intended to access the local copy ``` While here, refine the condition for the alias as well. For some binary formats like COFF, `isInterposable` is a sufficient condition. But I think canonicalization for the changed case has little advantage, so I don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or `getPICLevel/getPIELevel` complexity. For instrumentations, it's recommended not to create aliases that refer to globals that have a weak linkage or is preemptible. However, the following is supported and the IR needs to handle such cases. ``` int aliasee __attribute__((weak)); extern int alias __attribute__((alias("aliasee"))); ``` There are other places where GlobalAlias isInterposable usage may need to be fixed. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107249	2022-03-18 14:17:05 -07:00
Paul Kirth	964398ccb1	Revert "Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics""" This reverts commit `6cf560d69a`.	2022-03-18 00:21:33 +00:00
Paul Kirth	6cf560d69a	Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics"" I mistakenly reverted my commit, so I'm relanding it. This reverts commit `10866a1df4`.	2022-03-18 00:04:22 +00:00
Paul Kirth	10866a1df4	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `e7749d4713`.	2022-03-17 23:54:26 +00:00
Paul Kirth	e7749d4713	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Differential Revision: https://reviews.llvm.org/D115907	2022-03-17 23:46:23 +00:00
Johannes Doerfert	4308fdf83b	[Attributor] Remove more non-deterministic behavior and debug output	2022-03-17 17:42:32 -05:00
Johannes Doerfert	59a6b668ab	[OpenMP][FIX] Initialize member to avoid undefined value in debug output	2022-03-17 17:42:32 -05:00
Johannes Doerfert	88ea86c369	[Attributor][FIX] Remove reference into map that might dangle The reference was taken and the map was modified after. This can (and did) lead to dangling pointers and all sorts of problems afterwards.	2022-03-17 17:42:32 -05:00
Ellis Hoag	f6b5142ac2	[AlwaysInliner] Emit inline remark only when successful Failures in `InlineFunction()` are caught after D121722, but `emitInlinedIntoBasedOnCost()` should only be called when inlining is successful. This also removes an unnecessary call to `shouldInline()` which always returned `InlineCost::getAlways()`. Reviewed By: kyulee, nikic Differential Revision: https://reviews.llvm.org/D121946	2022-03-17 15:40:24 -07:00
Andrew Litteken	f7d90ad57b	[IROutliner] Make sure that loop debug info is stripped. As pointed out in https://github.com/llvm/llvm-project/issues/54155#issuecomment-1057465479, there was a crash when loop info was being outlined. It was not being properly stripped and adjusted, so would point to the wrong location. This uses similar logic found in the CodeExtractor to adjust the loop debug info. Reviewer: fhahn, paquette Differential Revision: https://reviews.llvm.org/D120869	2022-03-17 14:41:53 -06:00
Ellis Hoag	84c6689b15	[AlwaysInliner] Check inliner errors even without assserts When we build clang without asserts we should still check the result of `InlineFunction()` to be sure there wasn't an error. Otherwise we could incorrectly merge attributes in the next line. This also removes a redundent call to `getCaller()`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121722	2022-03-17 10:16:23 -07:00
Florian Hahn	e5822ded56	[FunctionAttrs] Infer argmemonly . This patch adds initial argmemonly inference, by checking the underlying objects of locations returned by MemoryLocation. I think this should cover most cases, except function calls to other argmemonly functions. I'm not sure if there's a reason why we don't infer those yet. Additional argmemonly can improve codegen in some cases. It also makes it easier to come up with a C reproducer for `7662d1687b` (already fixed, but I'm trying to see if C/C++ fuzzing could help to uncover similar issues.) Compile-time impact: NewPM-O3: +0.01% NewPM-ReleaseThinLTO: +0.03% NewPM-ReleaseLTO+g: +0.05% https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121415	2022-03-16 10:24:33 +00:00
Florian Hahn	014f5bcf7a	[FunctionAttrs] Replace MemoryAccessKind with FMRB. Update FunctionAttrs to use FunctionModRefBehavior instead MemoryAccessKind. This allows for adding support for inferring argmemonly and others, see D121415. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121460	2022-03-15 19:35:54 +00:00
Nikita Popov	875782bd9e	[OpenMPOpt] Avoid pointer element type access during region merging Hardcode the function type as ParallelTask, which is the guaranteed pointee type of this runtime function argument (if pointee types exist). The elimination of the callee bitcast is left for InstCombine. Differential Revision: https://reviews.llvm.org/D120885	2022-03-15 09:52:46 +01:00
Andrew Litteken	228cc2c38b	[IROutliner] Ensure merged PHINodes respect order and incoming blocks, not just incoming values When matching PHINodes when margining functions the IROutliner only checks that an incoming value exists in phi node in overall function. It doesn't check the length, the order, or that the incoming block also matches. In the given example, we see that both phi nodes have the same incoming values, but from different blocks. The fix is to to enforce stricter a match of the incoming value, and the incoming block as well when matching the created phi nodes. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D121310	2022-03-14 16:48:21 -05:00
Nick Desaulniers	236695e70c	[IRLinker] make IRLinker::AddLazyFor optional (llvm::unique_function). NFC 2 of the 3 callsite of IRMover::move() pass empty lambda functions. Just make this parameter llvm::unique_function. Came about via discussion in D120781. Probably worth making this change regardless of the resolution of D120781. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D121630	2022-03-14 14:37:34 -07:00
Andrew Litteken	c79ab1065e	[IROutliner] Separate split PHI nodes from multiple exits by different outlinable regions. The IR Outliner is supposed to extract the outputs contained in an external phi node and place them into a phi node contained within the outlined function. However, when the output values of two outlined functions with two different output sets are contained within the same phi node, they are counted as the same exit path when first analyzed. In reality, these create two different phi nodes, creating an inconsistency, resulting in a mismatch in the expected number of output paths and a crash. This fixes that counting when analyzing the outputs by also analyzing the incoming blocks rather than just the incoming values. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D121313	2022-03-14 14:56:59 -05:00
Teresa Johnson	fee0bde4c6	[WPD] Extend checking mode to support fallback to indirect call Extend -wholeprogramdevirt-check to support both the existing trapping mode on an incorrect devirtualization, as well as a new mode to fallback to an indirect call on a mismatch. The new mode is The new mode is useful in cases where we want to enable devirtualization but cannot fully guarantee whole program visibility (e.g in the case where LTO has been disabled for a small set of objects that could potentially override virtual methods without having a symbol reference to anything in the base class including the vtable). Remove !prof and !callees metadata (which are used by indirect call promotion) from both the new direct call and the fallback indirect call (so that we don't perform another round of promotion on the latter). Also remove it from the direct call in the non-fallback cases, which was an oversight, although it didn't seem to cause any issues. Add tests for the metadata removal covering the various cases. Differential Revision: https://reviews.llvm.org/D121419	2022-03-14 10:16:28 -07:00
Andrew Litteken	3c90812f3b	[IROutliner] Avoid reusing PHINodes that have already been matched when merging outlined functions' phi node blocks When there are two external phi nodes for two different outlined regions, when compressing the created phi nodes between the two regions, the matching for the second phi node in the second region matches the first phi node created for the first region rather than the second phi node created for the first region. This adds an extra output path where there should not be one. The fix is the ignore phi nodes that have already been matched for each region. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D121312	2022-03-14 12:00:01 -05:00
Nikita Popov	3ec44c22b1	[DeadArgElim] Guard against function type mismatch If the call function type and function type don't match, we should consider the function live (there is effectively a bitcast sitting in between).	2022-03-14 13:03:04 +01:00
Johannes Doerfert	85daf6973d	[Attributor] Remove capture tracker usage and follow uses explicitly Before we used the capture tracker to follow pointer uses, now we do it explicitly ourselves through the Attributor API. There are multiple benefits: For one, the boilerplate is cut down by a lot. The class, potential copies vector, etc. is all not needed anymore. We also do avoid explicitly looking through memory here, something that was duplicated and should only live in the `checkForAllUses~ helper. More importantly, as we do simplifications we need to make sure all parties are in sync when they reason about uses. The old way did not allow us to do this but the new one does as every use visiting AA goes through `checkForAllUses` now..	2022-03-11 22:56:16 -06:00
Johannes Doerfert	f44f60a297	[Attributor] Avoid replacing return operands twice As replacements will become more complex it is better to have a single AA responsible for replacing a use. Before this patch AAValueSimplify* and AAValueSimplifyReturned could both try to replace the returned value. The latter was marginally better for the old pass manager when a function was already carrying a `returned` attribute and when the context of the return instruction was important. The second shortcoming was resolved by looking for return attributes in the AAValueSimplifyCallSiteReturned initialization. The old PM impact is not concerning. This is yet another step towards the removal of AAReturnedValues, the very first AA we should now try to eliminate due to the overlapping logic with value simplification.	2022-03-11 21:55:19 -06:00
Johannes Doerfert	55a970fbd4	[Attributor][FIX] Make sure to not ignore non-load users of stores When we look through memory for a store we used to allow any other use of the memory that is reachable. This is generally OK but we need to make sure to actually let the user look at these properly. For now, we simply require loads (via exact reloads).	2022-03-11 18:41:13 -06:00
Johannes Doerfert	f3ad8cf00e	[Attributor] Cleanup manifest and liveness for CGSCC passes There was some ad-hoc handling of liveness and manifest to avoid breaking CGSCC guarantees. Things always slipped through though. This cleanup will: 1) Prevent us from manifesting any "information" outside the CGSCC. This might be too conservative but we need to opt-in to annotation not try to avoid some problematic ones. 2) Avoid running any liveness analysis outside the CGSCC. We did have some AAIsDeadFunction handling to this end but we need this for all AAIsDead classes. The reason is that AAIsDead information is only correct if we actually manifest it, since we don't (see point 1) we cannot actually derive/use it at all. We are currently trying to avoid running any AA updates outside the CGSCC but that seems to impact things quite a bit. 3) Assert, don't check, that our modifications (during cleanup) modifies only CGSCC functions.	2022-03-11 16:46:02 -06:00
Johannes Doerfert	9ddb1a49ac	[Attributor][FIX] Avoid double free (and useless state copy) In an attempt to remove the memory leak we introduced a double free. The problem was that we allowed a plain copy of the state and it was actually used. The use was useless, so it is gone now. The copy constructor is gone as well. The move constructor ensures the Accesses pointers are owned by a single state, I hope. Reported by: https://lab.llvm.org/buildbot/#/builders/16/builds/25820	2022-03-11 10:10:36 -06:00
Johannes Doerfert	3570b0c5c7	[Attributor][FIX] Remove memory leak The leak was introduced when we made things deterministic. It was reported by the sanitizer buildbot: https://lab.llvm.org/buildbot/#/builders/168	2022-03-11 09:52:44 -06:00
Florian Hahn	e07b899192	[FunctionAttrs] Rename addReadAttrs -> addMemoryAttrs. The addReadAttrs name is out of date, as the function also adds the writeonly attribute. addMemoryAttrs is more accurate.	2022-03-11 11:49:22 +00:00
Johannes Doerfert	e8fadafe77	[Attributor][NFCI] Make AAPointerInfo deterministic The order in which we kept accesses was non-deterministic and a debug output was a pointer value. Fixed both.	2022-03-10 23:27:47 -06:00
Johannes Doerfert	7211dbd01d	[Attributor][NFCI] Remove non-deterministic behavior and debug output	2022-03-10 23:27:47 -06:00
Alok Kumar Sharma	94823500a7	[DebugInfo][SROA] Correct debug info for global variables in case of SROA The existing handling produced crash for test case (attached with patch). Now the function transferSRADebugInfo is modified to - Ignore the current variable if it starts after the current Fragment. - Ignore the current variable if it ends before the current Fragment. - Generate (!DIExpression()) if current variable completely fits the current Fragment. - Otherwise (as earlier), generate the DW_OP_LLVM_fragment in IR if current Fragment partially defines current variable. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D121107	2022-03-10 00:41:30 +05:30
Andrew Litteken	0b3a6c8d20	[IROutliner] Handling outlined code with no exit paths As a result of adding multiblock outlining, it became possible to outline the entirety of basic block, and branches that only pointed to the basic blocks contained in the outlined section. This means that there are no exit paths, and no return statement. There was a previous assertion from the older version of the outliner that explicitly made sure there was a return statement. This removes that assertion. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D120868	2022-03-09 10:43:48 -08:00
Nikita Popov	f682a8386b	[Attributor] Use byval type instead of pointer element type For compatibility with opaque pointers, use the byval type rather than the pointer element type. Differential Review: https://reviews.llvm.org/D120983	2022-03-09 09:30:42 +01:00
Arthur Eubanks	53e5e58670	[NewPM][Inliner] Make inlined calls to functions in same SCC as callee exponentially expensive Introduce a new attribute "function-inline-cost-multiplier" which multiplies the inline cost of a call site (or all calls to a callee) by the multiplier. When processing the list of calls created by inlining, check each call to see if the new call's callee is in the same SCC as the original callee. If so, set the "function-inline-cost-multiplier" attribute of the new call site to double the original call site's attribute value. This does not happen when the original call site is intra-SCC. This is an alternative to D120584, which marks the call sites as noinline. Hopefully fixes PR45253. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D121084	2022-03-07 23:51:09 -08:00
Johannes Doerfert	5b4acb20ff	[OpenMP][FIX] Ensure flag to disable de-globalization works properly If the user disables de-globalization we did not seed the AAHeapToShared and AAHeapToStack but we still could end up with them through in-flight lookups. With this patch we disable AAHeapToShared completely if the user disabled de-globalization. Heap-2-stack is still run though. Differential Revision: https://reviews.llvm.org/D121059	2022-03-07 23:43:05 -06:00
Nikita Popov	0636c93d3e	[Attributor] Remove restriction on simplifying function pointers Dropping this restriction seems to work fine (there are no assertion failures), so it appears that either the updater got smarter or the problematic cases are restricted elsewhere. If doing this still causes issues, then the place to address it would probably be `8f5bdaf481/llvm/lib/Transforms/IPO/Attributor.cpp (L1856-L1859)`, which already prevents replacement outside the SCC, so I'm not quite sure what this check is intended to avoid. Differential Revision: https://reviews.llvm.org/D120987	2022-03-07 11:54:37 +01:00
Nikita Popov	a9b03d9e2e	[Attributor] Remove function pointer restriction for AAAlign This check is not compatible with opaque pointers. We can avoid it by adjusting the getPointerAlignment() implementation to avoid creating unnecessary ptrtoint expressions for bitcasted pointers. The code already uses OnlyIfReduced to not create an expression if it does not simplify, and this makes sure that folding a bitcast and ptrtoint into a ptrtoint doesn't count as a simplification. Differential Revision: https://reviews.llvm.org/D120904	2022-03-07 10:02:45 +01:00
Johannes Doerfert	5af11ec34b	[Attributor] Determine potentially loaded values through memory We already look through memory to determine where a value that is stored might pop up again (potential copies). This patch introduces the other direction with similar logic. If a value is loaded, we can follow all the accesses to the pointer (or better object) and try to determine what value might have been stored.	2022-03-06 23:26:37 -06:00
Johannes Doerfert	eb73af4af4	[Attributor] Handle undef and null in AAAlignFloating Both `undef` and `nullptr` are maximally aligned. This is especially important as we often see `undef` until a proper value has been identified during simplification.	2022-03-06 23:26:22 -06:00
Johannes Doerfert	ad26e199ff	[Attributor] Use CFG reasoning also for read accesses With D106397 we used CFG reasoning to filter out writes that will not interfere with a given load instruction. With this patch we use the same logic (modulo the reversal in reachability check order) for store instructions. As an example, we can now proof stores to shared memory are dead if all the loads of the shared memory are not reachable from them.	2022-03-06 23:26:22 -06:00
Johannes Doerfert	acb3773491	[Attributor] Improve isValidAtPosition (mostly for old PM) To minimize the test difference between old and new PM we perform some local dominance check if no dominator tree is available.	2022-03-06 23:26:21 -06:00
Johannes Doerfert	ff758372bd	[Attributor][NFCI] Introduce fine-grained anonymous namespaces	2022-03-06 21:28:38 -06:00
Johannes Doerfert	192a34ddb0	[Attributor][OpenMPOpt][FIX] Register simplification callbacks Heap-2-stack and heap-2-shared can replace an allocation call with something else. To avoid us deriving information from the allocator implementation we register a simplification callback now that will force us to stop at the call site. We probably should create the replacement memory eagerly and return that instead though.	2022-03-06 21:28:38 -06:00
Johannes Doerfert	5859ae6a5d	[Attributor][FIX] Use maximal access for dereferenceability deduction While we can use range information when we derive dereferenceability we must make sure to pick he right end of the range. Before we always went with the minimal offset, which is not correct if we want to combine the base dereferenceability with some offset. In that case it's the maximum that gives the correct result.	2022-03-06 21:28:38 -06:00
Johannes Doerfert	1fcd4d0e3b	[Attributor][FIX] Initialize stack variable	2022-03-06 21:28:38 -06:00
Johannes Doerfert	6158f4a466	[Attributor][NFCI] No repeated manifest of AAValueSimplifyReturned (CGSCC)	2022-03-06 19:59:23 -06:00
Johannes Doerfert	efedf70aa5	[Attributor][NFC] Expose helper with more generic interface This simply makes the function argument of the `Attributor::checkForAllInstructions` helper explicit so one can iterate over instructions in other functions.	2022-03-06 19:59:23 -06:00
Johannes Doerfert	8fa839aa58	[Attributor][NFC] Improve debug messages	2022-03-06 19:59:22 -06:00
William S. Moses	87ec6f41bb	[OpenMPIRBuilder] Allocate temporary at the correct block in a nested parallel The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp parallel regions (writing with MLIR for ease) ``` omp.parallel { %a = ... omp.parallel { use(%a) } } ``` As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this allocation to the inner parallel. For example, we would want to get something like the following: ``` omp.parallel { %a = ... %tmp = alloc store %tmp[] = %a kmpc_fork(outlined, %tmp) } ``` However, in practice, this is not what currently occurs in the context of nested parallel regions. Specifically to the OpenMPIRBuilder, the entirety of the function (at the LLVM level) is currently inlined with blocks marking the corresponding start and end of each region. ``` entry: ... parallel1: %a = ... ... parallel2: use(%a) ... endparallel2: ... endparallel1: ... ``` When the allocation is inserted, it presently inserted into the parent of the entire function (e.g. entry) rather than the parent allocation scope to the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1. This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example. This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121061	2022-03-06 18:34:25 -05:00
Johannes Doerfert	f9c2d6005e	[OpenMP][FIX] Ensure custom state machine works The custom state machine had a check for surplus threads that filtered the main thread if the kernel was executed by a single warp only. We now first check for the main thread, then for surplus threads, avoiding to filter the former out. Fixes #54214. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121011	2022-03-04 13:51:19 -05:00
Nikita Popov	6b5b367858	[Attributor] Remove function pointer type check (NFCI) This check is not relevant for correctness, it can only avoid walking some recursive uses if the cast is to a non-function pointer type. As this distinction will no longer be possible with opaque pointers and all users will have to be walked anyway, I'm dropping the check in advance.	2022-03-04 12:09:51 +01:00
Jez Ng	dd29597e10	[LTO] Initialize canAutoHide() using canBeOmittedFromSymbolTable() Per discussion on https://reviews.llvm.org/D59709#inline-1148734, this seems like the right course of action. `canBeOmittedFromSymbolTable()` subsumes and generalizes the previous logic. In addition to handling `linkonce_odr` `unnamed_addr` globals, we now also internalize `linkonce_odr` + `local_unnamed_addr` constants. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D120173	2022-03-03 19:04:11 -05:00
Arthur Eubanks	f0b61f7957	Revert "[GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible" This reverts commit `30e8f83c84`. Causes huge compile time regressions on certain large files. Will followup offline with author.	2022-03-03 11:04:14 -08:00
Nikita Popov	1b6663a104	[FuncSpec] Remove unnecessary function pointer type check We will check a bit later that the constant is in fact a function, so the separate check for a function pointer type is largely redunant. Also simplify the cast stripping with stripPointerCasts().	2022-03-03 15:20:11 +01:00
Alexandros Lamprineas	910eb988eb	[FuncSpec][NFC] Refactor internal structures. `ArgInfo` is reduced to only contain a pair of {formal,actual} values. The specialized function `Fn` and the `Partial` flag are redundant in this structure. The `Gain` is moved to a new struct `SpecializationInfo`. The value mappings created by cloneCandidateFunction() are being used by rewriteCallSites() for matching the formal arguments of recursive functions. The list of specializations is passed by reference to calculateGains() instead of being returned by value. The `IsPartial` flag is removed from isArgumentInteresting() and getPossibleConstants() as it's no longer used anywhere in the code. Differential Revision: https://reviews.llvm.org/D120753	2022-03-03 13:08:13 +00:00
serge-sans-paille	59630917d6	Cleanup includes: Transform/Scalar Estimated impact on preprocessor output line: before: 1062981579 after: 1062494547 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120817	2022-03-03 07:56:34 +01:00
Hongtao Yu	07846e3387	[CSSPGO][PriorityInliner] Do not use block weight to drive callsite inlining. The priority-based inliner currenlty uses block count combined with callee entry count to drive callsite inlining. This doesn't work well with LTO where postlink inlining is driven by prelink-annotated block count which could be based on the merge of all context profiles. I'm fixing it by using callee profile entry count only which should be context-sensitive. I'm seeing 0.2% perf improvment for one of our internal large benchmarks with probe-based non-CS profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D120784	2022-03-01 18:43:19 -08:00
Joseph Huber	6632180745	[OpenMP][NFC] Add an option to print the module before in OpenMPOpt Previously there was a debug flag to print the module after optimizations. Sometimes we wanted to print the module before optimizations so this is being split into two flags. `-openmp-opt-print-module` is now `-openmp-opt-print-module-after`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120768	2022-03-01 17:09:09 -05:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
serge-sans-paille	71c3a5519d	Cleanup includes: LLVMAnalysis Number of lines output by preprocessor: before: 1065940348 after: 1065307662 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120659	2022-03-01 18:01:54 +01:00
Alexandros Lamprineas	33830326aa	[FuncSpec] Remove definitions of fully specialized functions. A function is basically dead when: * it has no uses * it has only self-referencing uses (it's recursive) Differential Revision: https://reviews.llvm.org/D119878	2022-03-01 11:57:08 +00:00
Alexandros Lamprineas	b803aee67b	[FuncSpec][NFC] Improve debug messages. Adds diagnostic messages when debugging the pass. Differential Revision: https://reviews.llvm.org/D119875	2022-03-01 11:55:08 +00:00
Alexandros Lamprineas	7b74123a3d	[FuncSpec][NFC] Variable renaming. Just preparing the ground for follow up patches to make the reviews easier. Differential Revision: https://reviews.llvm.org/D119874	2022-03-01 11:38:57 +00:00
Simon Pilgrim	3b422455dd	[IPO] AAFunctionReachabilityFunction.updateImpl - reduce AAReachability scope. NFCI. We already have a check for !InstQueries.empty(), so move the for-range over InstQueries inside to avoid the AAReachability uninitialized variable static analysis warnings.	2022-02-25 14:42:31 +00:00
minglotus-6	142cedc283	[SampleProf][Inliner] Add an option to turn off inliner in sample-profile pass. Use case is offline evaluation (for inliner effectiveness) or debugging. Differential Revision: https://reviews.llvm.org/D120344	2022-02-23 14:21:33 -08:00
Augie Fackler	95f3cc222a	AttributorAttributes: avoid a crashing on bad alignments Prior to this change, LLVM would attempt to optimize an aligned_alloc(33, ...) call to the stack. This flunked an assertion when trying to emit the alloca, which crashed LLVM. Avoid that with extra checks. Differential Revision: https://reviews.llvm.org/D119604	2022-02-23 14:21:02 -05:00
Arthur Eubanks	1fd980de04	Revert "AttributorAttributes: avoid a crashing on bad alignments" This reverts commit `70ff6fbeb9`. Breaks bots, e.g. http://45.33.8.238/linux/69375/step_12.txt.	2022-02-23 09:08:03 -08:00
Augie Fackler	70ff6fbeb9	AttributorAttributes: avoid a crashing on bad alignments Prior to this change, LLVM would attempt to optimize an aligned_alloc(33, ...) call to the stack. This flunked an assertion when trying to emit the alloca, which crashed LLVM. Avoid that with extra checks. Differential Revision: https://reviews.llvm.org/D119604	2022-02-23 11:46:15 -05:00
Bill Wendling	a5bbc6ef99	[NFC] Remove unnecessary "#include"s from header files	2022-02-23 01:20:48 -08:00
minglotus-6	f415d74d1d	[SampleProfile] Handle the case when the option `MaxNumPromotions` is zero. In places where `MaxNumPromotions` is used to allocated an array, bail out early to prevent allocating an array of length 0. Differential Revision: https://reviews.llvm.org/D120295	2022-02-22 21:44:32 -08:00
Joseph Huber	0136a4401f	[OpenMP] Add an option to limit shared memory usage in OpenMPOpt One of the optimizations performed in OpenMPOpt pushes globalized variables to static shared memory. This is preferable to keeping the runtime call in all cases, however if too many variables are pushed to hared memory the kernel will crash. Since this is an optimization and not something the user specified explicitly, there should be an option to limit this optimization in those cases. This path introduces the `-openmp-opt-shared-limit=` option to limit the amount of bytes that will be placed in shared memory from HeapToShared. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120079	2022-02-18 08:35:26 -05:00
Kuba Mracek	6b53ad298e	[GlobalDCE] [VFE] Avoid dropping vfunc dependencies when an invalid vtable entry is present When we scan vtables for a particular vload in ScanVTableLoad and an entry in one possible vtable is invalid (null or non-fptr), we bail in a wrong way -- we completely stop the scanning of vtables and this results in dropped dependencies and incorrectly removed vfuncs from vtables. Let's fix that by correcting the bailing logic to keep iterating and only skip the invalid entries. Differential Revision: https://reviews.llvm.org/D120006	2022-02-17 19:41:46 -08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information. This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Joseph Huber	74cacf212b	[OpenMP] Add RTL function to externalization RAII This patch adds the '_kmpc_get_hardware_num_threads_in_block' OpenMP RTL function to the externalization RAII struct. This was getting optimized out and then being replaced with an undefined value once added back in, causing bugs for complex reductions. Fixes #53909. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120076	2022-02-17 14:30:58 -05:00
Johannes Doerfert	254d6da020	[Attributor][FIX] Ensure stable iteration order With `668c5c688b` we introduced an ordering issue revealed by the reverse iteration buildbot. Depending on the order of the map that tracks the AAIsDead AAs we ended up with slightly different attributes. This is not totally unexpected and can happen. We should however be deterministic in our orderings to avoid such issues.	2022-02-17 12:53:10 -06:00
Jay Foad	9071393c18	[GlobalDCE] Simplify and return Changed = true less often Removing dead constants should not count as making a change to the module. This means that RemoveUnusedGlobalValue simplifies to just calling removeDeadConstantUsers, so inline it. Differential Revision: https://reviews.llvm.org/D120052	2022-02-17 16:03:13 +00:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Johannes Doerfert	8ad39fbaf2	[Attributor][FIX] Heap2Stack needs to use the alloca AS When we move an allocation from the heap to the stack we need to allocate it in the alloca AS and then cast the result. This also prevents us from inserting the alloca after the allocation call but rather right before. Fixes https://github.com/llvm/llvm-project/issues/53858	2022-02-16 15:58:32 -06:00
Johannes Doerfert	668c5c688b	[Attributor][FIX] Use liveness information of the right function When we use liveness for edges during the `genericValueTraversal` we need to make sure to use the AAIsDead of the correct function. This patch adds the proper logic and some simple caching scheme. We also add an assertion to the `isEdgeDead` call to make sure future misuse is detected earlier. Fixes https://github.com/llvm/llvm-project/issues/53872	2022-02-16 15:58:32 -06:00
Johannes Doerfert	6ed1ef0643	[Attributor][FIX] Pipe UsedAssumedInformation through more interfaces `UsedAssumedInformation` is a return argument utilized to determine what information is known. Most APIs used it already but `genericValueTraversal` did not. This adds it to `genericValueTraversal` and replaces `AllCallSitesKnown` of `checkForAllCallSites` with the commonly used `UsedAssumedInformation`. This was supposed to be a NFC commit, then the test change appeared. Turns out, we had one user of `AllCallSitesKnown` (AANoReturn) and the way we set `AllCallSitesKnown` was wrong as we ignored the fact some call sites were optimistically assumed dead. Included a dedicated test for this as well now. Fixes https://github.com/llvm/llvm-project/issues/53884	2022-02-16 14:44:20 -06:00
Bin Cheng	dfec0b3053	[FuncSpec] Save compilation time by caching uses for propagation We only need to do propagation on use instructions of the original value, rather than the replacing const value which might have lots of irrelavant uses. This is done by caching uses before replacing. Differential Revision: https://reviews.llvm.org/D119815	2022-02-16 10:46:26 +08:00
Hongtao Yu	62ef77ca63	[CSSPGO] Do not merge a context that is already duplicated into the base profile. Do not merge a context that is already duplicated into the base profile. Also fixing a typo caused by previous refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119735	2022-02-14 18:07:11 -08:00
Nick Desaulniers	9dcb006165	[funcattrs] check reachability to improve noreturn There was a fixme in the code pertaining to attributing functions as noreturn. By using reachability, if none of the blocks that are reachable from the entry return, then the function is noreturn. Previously, the code only checked if any blocks returned. If they're unreachable, then they don't matter. This improves codegen for the Linux kernel. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1563 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119571	2022-02-14 14:01:59 -08:00
Nikita Popov	41c5a762e5	[DeadArgElim] Check that function type is the same If the function types differ, the call arguments don't necessarily correspon to the function arguments. It's likely not worthwhile to handle this more precisely, but at least we shouldn't crash.	2022-02-14 14:08:42 +01:00
Dávid Bolvanský	1be1fd735d	[AlwaysInliner] Check for callsite noinline attribute simplified	2022-02-14 09:33:30 +01:00
Johannes Doerfert	ede248e614	[OpenMP][FIX] The `llvm.amdgcn.s.barrier` is actually not aligned If we assume `llvm.amdgcn.s.barrier` is aligned we may remove it and cause OpenMP GPU applications on the AMD GPU to be stuck or wrongly synchronized. Reported by Carlo Bertolli.	2022-02-11 12:42:50 -06:00
Dávid Bolvanský	d828281e78	[AlwaysInliner] Respect noinline call site attribute ``` always_inline foo() { } bar () { noinline foo(); } ``` We should prefer call site attribute over attribute on decl. This is fix for AlwaysInliner, similar fix is needed for normal Inliner (follow up). Related to https://reviews.llvm.org/D119061 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D119553	2022-02-11 19:23:11 +01:00
Nikita Popov	e24067819f	[ArgPromotion] Protect harder against recursive promotion (PR42028) In addition to the self-recursion check, also check whether there is more than one node in the SCC, which implies that there is a larger cycle. I believe checking SCC structure (rather than something like norecurse) is the right thing to do here, because this is specifically about preventing infinite loops over the SCC. Fixes https://github.com/llvm/llvm-project/issues/42028. Differential Revision: https://reviews.llvm.org/D119418	2022-02-11 09:30:39 +01:00
Teresa Johnson	dd3f483335	[ThinLTO][WPD] LICM set lookup (NFC) Minor efficiency fix. There is no reason to perform the same set lookup repeatedly in the inner loop as it is invariant there. Differential Revision: https://reviews.llvm.org/D119474	2022-02-10 13:16:31 -08:00
Johannes Doerfert	dd75c0ea64	[Attributor][NFC] Expose new API in AAPointerInfo New users might want to check bins without a load or store instruction at hand. Since we use those instructions only to find the offset and size of the access anyway, we can expose an offset and size interface to the outside world as well. This commit mainly moves code around and exposes a class (OffsetAndSize) as well as a method forallInterferingAccesses in AAPointerInfo. Differential Revision: https://reviews.llvm.org/D119249	2022-02-10 13:52:24 -06:00
Johannes Doerfert	d1387a26a5	[Attributor][FIX] Reachability needs to account for readonly callees The oversight caused us to ignore call sites that are effectively dead when we computed reachability (or more precise the call edges of a function). The problem is that loads in the readonly callee might depend on stores prior to the callee. If we do not track the call edge we mistakenly assumed the store before the call cannot reach the load. The problem is nicely visible in: `llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll` Caused by D118673. Fixes https://github.com/llvm/llvm-project/issues/53726	2022-02-10 13:52:24 -06:00
Johannes Doerfert	e39b419312	[Attributor][FIX] Honor alloca address space in AAPrivatizablePtr When we privatize a pointer (~argument promotion) we introduce new private allocas as replacement. These need to be placed in the alloca address space as later passes cannot properly deal with them otherwise. Fixes https://github.com/llvm/llvm-project/issues/53725	2022-02-10 13:52:24 -06:00
Nikita Popov	8018d6be34	[ArgPromotion] Transfer metadata to promoted loads Also transfer selected non-AA metadata to the promoted load. Only metadata from guaranteed to execute loads is transferred.	2022-02-10 11:28:07 +01:00
Nikita Popov	68c1eeb4ba	[ArgPromotion] Make implementation offset based This rewrites ArgPromotion to be based on offsets rather than GEP structure. We inspect all loads at constant offsets and remember which types are loaded at which offsets. Then we promote based on those types. This generalizes ArgPromotion to work with bitcasted loads, and is compatible with opaque pointers. This patch also fixes incorrect handling of alignment during argument promotion. Previously, the implementation only checked that the pointer is dereferenceable, but was happy to speculate overaligned loads. (I would have fixed this separately in advance, but I found this hard to do with the previous implementation approach). Differential Revision: https://reviews.llvm.org/D118685	2022-02-09 09:35:01 +01:00
Sylvestre Ledru	f2c2e924e7	Fix a typo (occured => occurred) Reported: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005195	2022-02-08 21:35:26 +01:00
Joseph Huber	caf7f05c1c	[Attributor] Emit fixed-point remark on function list This patch replaces the function we emit the remark on when we run into the fix-point limit. Previously we got a function to emit a remark on from the worklist's associated function. However, the worklist may not always have an associated function in the case of global variables. Replace this with the function set, and if there are no functions don't emit the remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119248	2022-02-08 12:10:21 -05:00
Nikita Popov	b896334834	[ArgPromotion] Check dereferenceability on argument as well Before walking all the callers, check whether we have a dereferenceable attribute directly on the argument. Also make it clearer that the code currently does not treat alignment correctly.	2022-02-08 10:29:51 +01:00
Johannes Doerfert	dd101c808b	[Attributor][FIX] Do not use assumed information for UB detection The helper `Attributor::checkForAllReturnedValuesAndReturnInsts` simplifies the returned value optimistically. In `AAUndefinedBehavior` we cannot use such optimistic values when deducing UB. As a result, we assumed UB for the return value of a function because we initially (=optimistically) thought the function return is `undef`. While we later adjusted this properly, the `AAUndefinedBehavior` was under the impression the return value is "known" (=fix) and could never change. To correct this we use `Attributor::checkForAllInstructions` and then manually to perform simplification of the return value, only allowing known values to be used. This actually matches the other UB deductions. Fixes #53647	2022-02-07 20:19:19 -06:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Kazu Hirata	cb13ebbf46	[Transforms] Use default member initialization in AAIsDeadCallSiteReturned (NFC)	2022-02-05 21:39:25 -08:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Joseph Huber	6b78526b1b	[OpenMP] Emit remark on the captured call instead of the variable Changes the remark to emit on the function call that captures the globalized variable instead of the globalized variable itself. The user should be able to see which variable it was in the argument list of the function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106980	2022-02-04 17:50:53 -05:00
Benjamin Kramer	85243124cf	Tweak some uses of std::iota to skip initializing the underlying storage. NFCI.	2022-02-04 17:00:50 +01:00
Kazu Hirata	3710078ceb	[SampleProfile] Reduce indentation with an early return (NFC)	2022-02-03 12:22:23 -08:00
Alexandros Lamprineas	438a81a284	[Function Specialisation] Fix use after free This is a fix for a use-after-free found by the address sanitizer when compiling GCC: https://github.com/llvm/llvm-project/issues/52821 The Function Specialization pass may remove instructions, cached inside the PredicateBase class, which are later being dereferenced from the SCCPInstVisitor class. To prevent the dangling references I am lazily deleting the dead instructions after the Solver has run. Differential Revision: https://reviews.llvm.org/D118591	2022-02-02 16:32:10 +00:00
serge-sans-paille	e188aae406	Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden ehader dependencies, something the LLVM codebase doesn't do that well :-/ I've tried to summarize the biggest change below: - llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l before: 6400831 after: 6189948 200k lines less to process is no that bad ;-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D118652	2022-02-02 06:54:20 +01:00
Fangrui Song	30e8f83c84	[GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible Generalize D99629 for ELF. A default visibility non-local symbol is preemptible in a -shared link. `isInterposable` is an insufficient condition. Moreover, a non-preemptible alias may be referenced in a sub constant expression which intends to lower to a PC-relative relocation. Replacing the alias with a preemptible aliasee may introduce a linker error. Respect dso_preemptable and suppress optimization to fix the abose issues. With the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic` compile. ``` int aliasee; extern int alias __attribute__((alias("aliasee"), visibility("hidden"))); void foo() { alias = 345; } // intended to access the local copy ``` While here, refine the condition for the alias as well. For some binary formats like COFF, `isInterposable` is a sufficient condition. But I think canonicalization for the changed case has little advantage, so I don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or `getPICLevel/getPIELevel` complexity. For instrumentations, it's recommended not to create aliases that refer to globals that have a weak linkage or is preemptible. However, the following is supported and the IR needs to handle such cases. ``` int aliasee __attribute__((weak)); extern int alias __attribute__((alias("aliasee"))); ``` There are other places where GlobalAlias isInterposable usage may need to be fixed. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107249	2022-02-01 10:41:16 -08:00
Nikita Popov	1652c3b80c	[GlobalOpt] Avoid early exit before dead constant check In a similar vein to `236fbf571d`, make sure we don't early-exit before the dead constant check.	2022-02-01 15:57:19 +01:00
Nikita Popov	79179a378b	[ArgPromotion] Use range-based for loop (NFC)	2022-02-01 10:34:14 +01:00
Johannes Doerfert	3b8ffe668d	[Attributor][FIX] Relax assertion in IRPosition::verify A call base can be a floating value if we talk about the instruction and not the return value. This distinction was not made before but is important for liveness, e.g., a call site return value might be unused (=dead) but the call site is not.	2022-02-01 02:25:44 -06:00
Johannes Doerfert	a265cf22af	[Attributor] Introduce the `AA::isPotentiallyReachable` helper APIs To make usage easier (compared to the many reachability related AAs), this patch introduces a helper API, `AA::isPotentiallyReachable`, which performs all the necessary steps. It also does the "backwards" reachability (see D106720) as that simplifies the AA a lot (backwards queries were somewhat different from the other query resolvers), and ensures we use cached values in every stage. To test inter-procedural reachability in a reasonable way this patch includes an extension to `AAPointerInfo::forallInterferingWrites`. Basically, we can exclude writes if they cannot reach a load "during the lifetime" of the allocation. That is, we need to go up the call graph to determine reachability until we can determine the allocation would be dead in the caller. This leads to new constant propagations (through memory) in `value-simplify-pointer-info-gpu.ll`. Note: The new code contains plenty debug output to determine how reachability queries are resolved. Parts extracted from D110078. Differential Revision: https://reviews.llvm.org/D118673	2022-02-01 01:40:45 -06:00
Johannes Doerfert	b51b83f68e	[Attributor] Introduce the concept of query AAs D106720 introduced features that did not work properly as we could add new queries after a fixpoint was reached and which could not be answered by the information gathered up to the fixpoint alone. As an alternative to D110078, which forced eager computation where we want to continue to be lazy, this patch fixes the problem. QueryAAs are AAs that allow lazy queries during their lifetime. They are never fixed if they have no outstanding dependences and always run as part of the updates in an iteration. To determine if we are done, all query AAs are asked if they received new queries, if not, we only need to consider updated AAs, as before. If new queries are present we go for another iteration. Differential Revision: https://reviews.llvm.org/D118669	2022-02-01 01:40:44 -06:00
Kuter Dinel	b2d1ae0611	[Attributor] AAFunctionReachability, Instruction reachability. This patch implement instruction reachability for AAFunctionReachability attribute. It is used to tell if a certain instruction can reach a function transitively. NOTE: I created a new commit based of D106720 and set the author back to Kuter. Other metadata, etc. is wrong. I also addressed the remaining review comments and fixed the unit test. Differential Revision: https://reviews.llvm.org/D106720	2022-02-01 01:40:44 -06:00
Johannes Doerfert	ac3ec22df9	[Attributor] Use AAFunctionReachability to determine AANoRecurse We missed out on AANoRecurse in the module pass because we had no call graph. With AAFunctionReachability we can simply ask if the function may reach itself. Differential Revision: https://reviews.llvm.org/D110099	2022-02-01 01:40:44 -06:00
Johannes Doerfert	d1186ce7a9	[Attributor] Make interprocedural value explicit in genericValueTraversal genericValueTraversal can look through arguments and allow value simplification across function boundaries. In fact, the latter already happened unchecked. With this change we allow the user of genericValueTraversal to opt-out of interprocedural traversal if required. We explicitly look through arguments now which helps to do various things, incl. the propagation of constants into OpenMP parallel regions (on the host).	2022-02-01 01:40:44 -06:00
Johannes Doerfert	a1db0e523d	[Attributor][FIX] Liveness handling in the isAssumedDead helpers This fixes a conceptual problem with our AAIsDead usage which conflated call site liveness with call site return value liveness. Without the fix tests would obviously miscompile as we make genericValueTraversal more powerful (in a follow up). The effects on the tests are mixed but mostly marginal. The most prominent one is the lack of `noreturn` for functions. The reason is that we make entire blocks live at the same time (for time reasons). Now that we actually look at the block liveness, which we need to do, the return instructions are live and will survive. As an example, `noreturn_async.ll` has been modified to retain the `noreturn` even with block granularity. We could address this easily but there is little need in practice.	2022-02-01 01:18:52 -06:00
Johannes Doerfert	0f471710f8	[Attributor] Use edge liveness rather than block liveness We moved to the edge API a while back, not all uses were adjusted. Edge liveness is more precise.	2022-02-01 01:18:51 -06:00
Johannes Doerfert	53b6753bdd	[Attributor][FIX] Address two oversights in AAIsDead No tests as these were found browsing the code and I'm not sure how to test them properly.	2022-02-01 01:18:51 -06:00
Johannes Doerfert	cfabffb034	[Attributor][NFCI] Improve debug diagnostic	2022-02-01 01:18:51 -06:00
Johannes Doerfert	adf0d57f15	[Attributor] Provide convenient helpers for isAssumedRead{None,Only} We have two attributes that can answer readnone queries. While there is a dependence between them, it seems best to not force the users to know what AA to ask. The helpers also allow to check for readonly nicely. Test changes show where we now deduce readnone but haven't before, mostly because we only asked AAMemoryBehavior and not AAMemoryLocation. AANoAlias has not been ported to the new API yet.	2022-02-01 01:18:51 -06:00
Johannes Doerfert	e140d51319	[Attributor] Use CFG reasoning to filter potentially interfering writes Since D104432 we can look through memory by analyzing all writes that might interfere with a load. This patch provides some logic to exclude writes that cannot interfere with a location, due to CFG reasoning. We make sure to avoid multi-thread write-read situations properly while we ignore writes that cannot reach a load or writes that will be overwritten before the load is reached. Differential Revision: https://reviews.llvm.org/D106397	2022-02-01 01:18:51 -06:00
Johannes Doerfert	191fa419a6	[Attributor][NFC] Make debug output more useful and concise	2022-02-01 01:18:51 -06:00
Johannes Doerfert	3f0e670498	[Attributor][NFCI] Expose some nosync reasoning to outside users. No-sync is a property that we need in more places as complex transformations emerge. To simplify the query we provide an `AA::isNoSyncInst` helper now and expose two existing helpers through the `AANoSync` class.	2022-02-01 01:07:50 -06:00
Johannes Doerfert	a5b6aef24e	[Attributor][NFCI] Remove anonymous namespaces The namespaces made it more complicate to implement static helpers, among other things. We should not need them at all.	2022-02-01 01:07:50 -06:00
Johannes Doerfert	3c8a4c6f47	[OpenMP] Eliminate redundant barriers in the same block Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and bugs introduced later by me. This patch allows us to remove redundant barriers if they are part of a "consecutive" pair of barriers in a basic block with no impacted memory effect (read or write) in-between them. Memory accesses to local (=thread private) or constant memory are allowed to appear. Technically we could also allow any other memory that is not used to share information between threads, e.g., the result of a malloc that is also not captured. However, it will be easier to do more reasoning once the code is put into an AA. That will also allow us to look through phis/selects reasonably. At that point we should also deal with calls, barriers in different blocks, and other complexities. Differential Revision: https://reviews.llvm.org/D118002	2022-02-01 01:07:50 -06:00
Johannes Doerfert	989674f110	[OpenMP] Ensure to remove noinline from all runtime functions eventually We used to remove noinline from known OpenMP runtime functions (which are declared in OMPKinds.td). Now we remove noinline from all functions with the proper prefixes: __kmpc, _ZN4_OMP (= namespace omp), omp_	2022-02-01 01:07:50 -06:00
Fangrui Song	7aaf024dac	[BitcodeWriter] Fix cases of some functions `WriteIndexToFile` is used by external projects so I do not touch it.	2022-01-31 16:46:11 -08:00
Andrew Litteken	3785c1d055	[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match. This also adds an additional command line flag debug option to disable outlining intrinsics. Recommit of: `8de76bd569` Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109450	2022-01-28 13:52:21 -06:00
Nikita Popov	cf0357a545	[BasicBlockUtils] Fix typo in API name (NFC) detatch -> detach. As this requires touching all uses, also lower-case it in accordance with the style guide.	2022-01-28 16:32:13 +01:00
Nikita Popov	0ebbf3435f	[ArgPromotion] Don't assume all entry block instrs are executed We should abort this walk if we hit any instruction that is not guaranteed to transfer.	2022-01-28 16:08:42 +01:00
Nikita Popov	8b36c437df	[ArgPromotion] Make areFunctionArgsABICompatible() static (NFC) This function used to be shared with the Attributor, but can now be made private.	2022-01-28 15:26:36 +01:00
Nikita Popov	9e7a2bfcf7	[OpenMPOpt] Add const qualifier (NFC) Make it clear that this large lambda does not modify the vector.	2022-01-26 10:35:57 +01:00
Giorgis Georgakoudis	7cb4c26173	[OMPIRBuilder] Generate aggregate argument for parallel region outlined functions Summary: This patch modifies code generation in OpenMPIRBuilder to pass arguments to the parallel region outlined function in an aggregate (struct), besides the global_tid and bound_tid arguments. It depends on the updated CodeExtractor (see D96854) for support. It mirrors functionality of Clang codegen (see D102107). Differential Revision: https://reviews.llvm.org/D110114	2022-01-25 20:53:45 -05:00
Andrew Litteken	ba79295c48	[NFC][IROutliner] fix namespace and unused variable	2022-01-25 18:41:30 -06:00
Andrew Litteken	e8f4e41b6b	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region. We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are not conflicts to handle with respect to predecessors no longer contained in the function. Recommit of `515eec3553` Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106997	2022-01-25 18:25:50 -06:00
Andrew Litteken	e50b217b4e	Revert "[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region." This reverts commit `515eec3553`. By mistake, commit message was not complete.	2022-01-25 18:24:19 -06:00
Andrew Litteken	515eec3553	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region.	2022-01-25 18:20:10 -06:00
Andrew Litteken	9c2daf648c	Revert "[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions" This reverts commit `8de76bd569`. Reverting due to failure of different-intrinsics.ll on lld-x86_64-win buildbot.	2022-01-25 18:19:33 -06:00
Dávid Bolvanský	fe30370b00	Reland "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 )"	2022-01-26 01:11:06 +01:00
Andrew Litteken	8de76bd569	[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match. This also adds an additional command line flag debug option to disable outlining intrinsics. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109450	2022-01-25 17:06:09 -06:00
Dávid Bolvanský	90f185c964	Revert "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 )" This reverts commit `ceec438368`. Clang tests fail.	2022-01-25 23:13:46 +01:00
Dávid Bolvanský	ceec438368	[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 ) Problem: Migration to new PM broke flatten attribute. This is one use case why LLVM should support inlining call-site with alwaysinline. The flatten attribute is nowdays broken, so we should either land patch like this one or remove everything related to flatten attribute from Clang. Second use case is something like "per call site inlining intrinsics" to control inlining even more; mentioned in https://lists.llvm.org/pipermail/cfe-dev/2018-September/059232.html Fixes https://github.com/llvm/llvm-project/issues/53360 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D117965	2022-01-25 22:55:30 +01:00
Andrew Litteken	f5f377d1fc	[IRSim][IROutliner] Adding support for recognizing and outlining indirect function calls, and function calls with different names, but the same type The outliner currently requires that function calls not be indirect calls, and have that the function name, and function type must match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions. There are also debugging flags added to enforce the old behaviors where indirect calls not be allowed, and to enforce the old rule that function calls names must also match. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109448	2022-01-25 15:19:28 -06:00
Andrew Litteken	dcc3e728ca	[IROutliner] Allowing Phi Nodes in exit blocks In addition to having multiple exit locations, there can be multiple blocks leading to the same exit location, which results in a potential phi node. If we find that multiple blocks within the region branch to the same block outside the region, resulting in a phi node, the code extractor pulls this phi node into the function and uses it as an output. We make sure that this phi node is given an output slot, and that the two values are removed from the outputs if they are not used anywhere else outside of the region. Across the extracted regions, the phi nodes are combined into a single block for each potential output block, similar to the previous patch. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106995	2022-01-25 11:33:53 -06:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Joseph Huber	5eb49009eb	[OpenMP] Add more identifier to created shared globals Currenly we push some variables to a global constant containing shared memory as an optimization. This generated constant had internal linkage and should not have collided with any known identifiers in the translation unit. However, there have been observed cases of this optimiztaion unintentionally colliding with undocumented PTX identifiers. This patch adds a suffix to the created globals to hopefully bypass this. Depends on D118059 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D118068	2022-01-24 20:37:54 -05:00
Joseph Huber	06cfdd5224	[OpenMP][Fix] Properly inherit calling convention Previously in OpenMPOpt we did not correctly inherit the calling convention of the callee when creating new OpenMP runtime calls. This created issues when the calling convention was changed during `GlobalOpt` but a new call was creating without the correct calling convention. This lead to the call being replaced with a poison value in `InstCombine` due to undefined behaviour and causing large portions of the program to be incorrectly eliminated. This patch correctly inherits the existing calling convention from the callee. Reviewed By: tianshilei1992, jdoerfert Differential Revision: https://reviews.llvm.org/D118059	2022-01-24 20:37:52 -05:00
Nikita Popov	67346b43e0	[Attributor] Use MemoryLocation to get pointer operand and accessed type (NFCI) This relies on existing APIs and avoids accessing the pointer element type. The alternative would be to extend getPointerOperand() to also return the accessed type, but I figured going through MemoryLocation would be cleaner. Differential Revision: https://reviews.llvm.org/D117868	2022-01-24 10:10:13 +01:00
Nikita Popov	e7762653d3	[Attributor] Avoid some pointer element type accesses	2022-01-21 11:20:10 +01:00
Johannes Doerfert	37e0c58559	[Attributor][FIX] AAValueConstantRange should not loop unconstrained The old method to avoid unconstrained expansion of the constant range in a loop did not work as soon as there were multiple instructions in between the phi and its input. We now take a generic approach and limit the number of updates as a fallback. The old method is kept as it catches "the common case" early.	2022-01-20 18:07:04 -06:00
Johannes Doerfert	7bf9065ad7	[Attributor][NFC] Clang format	2022-01-20 18:06:53 -06:00
Sjoerd Meijer	fabf1de132	[FuncSpec] Add a reference, and some other clarifying comments. NFC.	2022-01-20 17:01:08 +00:00
Johannes Doerfert	b4a7559844	[OpenMP][FIX] Replace ICVs only with values valid at the getter position While we might know the value if an ICV at a getter position it is not always clear that we can simply use it. Verify the value is valid first to avoid invalid IR. Fixes #53300.	2022-01-19 18:40:13 -06:00
Eli Friedman	86cdff0e21	[OpenMPOpt] Use SetVector to store list of kernels. Fixes test failures on reverse-iteration buildbot.	2022-01-19 13:55:32 -08:00
Wenlei He	7cca13bc3a	[PartialInline] Bail out on asm-goto/callbr Fixing ICE when partial inline tries to deal with blockaddress uses of function which is typical for asm-goto/callbr. We ran into this with PGO multi-region partial inline. Differential Revision: https://reviews.llvm.org/D117509	2022-01-19 10:57:57 -08:00
Nikita Popov	a115bbea9b	[Attributor] Remove notional overindexing check AAPointerInfo currently bails on constant expression GEPs with notional overindexing. I don't think this is necessary, as the following code handling GEPOperator will deal with arbitrary indices appropriately. Differential Revision: https://reviews.llvm.org/D117203	2022-01-19 11:30:04 +01:00
Mircea Trofin	3e8553aab4	[mlgo][inline] Improve global state tracking The global state refers to the number of the nodes currently in the module, and the number of direct calls between nodes, across the module. Node counts are not a problem; edge counts are because we want strictly the kind of edges that affect inlining (direct calls), and that is not easily obtainable without iteration over the whole module. This patch avoids relying on analysis invalidation because it turned out to be too aggressive in some cases. It leverages the fact that Node objects are stable - they do not get deleted while cgscc passes are run over the module; and cgscc pass manager invariants. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115847	2022-01-18 17:45:34 +00:00
Philip Reames	26049b8ce3	[GlobalOpt] Generalize malloc-to-global for any allocation function We can generalize the malloc-to-global transform for other allocation functions which are both a) removable, and b) have a known initialization value. One subtlety that I want to point out - mostly because I hadn't realized it was true until I took a closer look - is that the existing code doesn't prove that initialization/malloc happens only once. The initialization function can be called multiple times. This is correct without special handling for malloc as undef can map to any value previously written, but a non-undef initializing allocation it means we may end up memseting the new global repeatedly. In particular, this means it's not legal to fold the memset into the initializer of the global. Differential Revision: https://reviews.llvm.org/D117503	2022-01-17 15:06:23 -08:00
Nikita Popov	12bee2c054	[GlobalOpt] Drop an incorrect check This was a last-minute addition to D117249, and of course I ended up inverting the condition in a way that caused an uninitialized memory read. I've dropped it entirely, as I don't think we actually care whether the size is zero or not here. The previous code wasn't checking this either.	2022-01-17 10:10:56 +01:00
Nikita Popov	499f1ca79f	[GlobalOpt] Use generic type when converting malloc to global The malloc to global transform currently determines the type of the global by looking at bitcasts of the malloc. This is limited (the transform fails if there are multiple different types) and incompatible with opaque pointers. My initial approach was to construct an appropriate struct type based on usage in loads/stores. What this patch does instead is to always create an [i8 x AllocSize] global, without trying to guess types at all. This does mean that other transforms that require a certain global type may break. I fixed two of these in D117034 and D117223, which I believe should be sufficient to avoid regressions. In particular, the global SRA change should end up splitting the global into naturally-typed sub-globals, at which point all other optimizations should work. Differential Revision: https://reviews.llvm.org/D117092	2022-01-17 09:55:33 +01:00
Nikita Popov	4796b4ae7b	[GlobalOpt] Make global SRA offset based Currently global SRA uses the GEP structure to determine how to split the global. This patch instead analyses the loads and stores that are performed on the global, and collects which types are used at which offset, and then splits the global according to those. This is both more general, and works fine with opaque pointers. This is also closer to how ordinary SROA is performed. Differential Revision: https://reviews.llvm.org/D117223	2022-01-17 09:28:36 +01:00
Bryce Wilson	28b6e2cb3d	[Attributor] [NFC] Use canonical variable name Differential Revision: https://reviews.llvm.org/D117241	2022-01-13 23:06:00 -08:00
Philip Reames	5d5d4d94f0	[Attributor] Generalize heap to stack to any allocator with relevant properties This completes removal of the isXLike queries, and depends on a whole series of earlier patches which have already landed. Differential Revision: https://reviews.llvm.org/D117242	2022-01-13 15:33:24 -08:00
Philip Reames	cf66f01ec1	[Attributor] Share code for abstract interpretation of allocation sizes with getObjectSize [NFC-ish] The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during it's abstract interpretation, and allows us to have one copy of the code. Note this is not NFC for two reasons: * The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space. * I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that. Differential Revision: https://reviews.llvm.org/D117241	2022-01-13 15:33:24 -08:00
Arthur Eubanks	9a0fe1b0fc	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Relanding with fix for compile times D117236. Reviewed By: nikic, mtrofin Differential Revision: https://reviews.llvm.org/D115545	2022-01-13 14:48:38 -08:00
Arthur Eubanks	757e044dce	[Inliner] Don't removeDeadConstantUsers() when checking if a function is dead If a function has many uses, this can take a good chunk of compile times. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117236	2022-01-13 14:29:45 -08:00
Nikita Popov	1cbb456123	[GlobalOpt] Fix global to select transform under opaque pointers We need to check that the load/store type is also the same, as this is no longer implicitly checked through the pointer type.	2022-01-13 11:13:06 +01:00
James Y Knight	55fcbf0a84	Revert "[Inline] Attempt to delete any discardable if unused functions" Somehow this ends up causing an infinite loop in the inliner. This reverts commit `d5be48c66d`.	2022-01-13 03:06:47 +00:00
Philip Reames	9979299705	[Attributor] Simplify how we handle required alignment during heap-to-stack [NFC] The existing code duplicated the same concern in two places, and (weirdly) changed the inference of the allocation size based on whether we could meet the alignment requirement. Instead, just directly check the allocation requirement.	2022-01-12 17:34:17 -08:00
Philip Reames	d1f4c6a611	[Attributor] Generalize calloc handling in heap-to-stack for any init value [NFC] Rewrite the calloc specific handling in heap-to-stack to allow arbitrary init values. The basic problem being solved is that if an allocation is initilized to anything other than zero, this must be explicitly done for the formed alloca as well. This covers the calloc case today, but once a couple of earlier guards are removed in this code, downstream allocators with other init values could also be handled. Inspired by discussion on D116971	2022-01-12 16:58:39 -08:00
Philip Reames	8e76720cf2	[Attributor] Reuse object size evaluation code [NFC]	2022-01-12 16:58:39 -08:00
Philip Reames	db57065b36	[Attributor] Use getAllocAlignment where possible [NFC] Inspired by D116971.	2022-01-12 16:58:39 -08:00
Arthur Eubanks	fe827a93f6	[ModuleInliner] Properly delete dead functions Followup to D116964 where we only did this in the CGSCC inliner. Fixes leaks reported in D116964.	2022-01-12 09:57:43 -08:00
Arthur Eubanks	d5be48c66d	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Reviewed By: nikic, mtrofin Differential Revision: https://reviews.llvm.org/D115545	2022-01-12 08:36:04 -08:00
Nikita Popov	5642ce5ac2	[GlobalOpt] Drop redundant setExternallyInitialized() call (NFC) This is part of copyAttributesFrom().	2022-01-12 09:42:58 +01:00
Nikita Popov	f3e87176e1	[GlobalOpt] Support "stored once" optimization for different types GlobalOpt can optimize a global with undef initializer and a single store to put the stored value into the initializer instead. Currently, this requires the type of the global and the store to match. This patch extends support to cases with different types (but same size), in which case we create a new global to replace the old one. Differential Revision: https://reviews.llvm.org/D117034	2022-01-12 09:39:31 +01:00
Mircea Trofin	248d55af3e	[NFC][MLGO] Use LazyCallGraph::Node to track functions. This avoids the InlineAdvisor carrying the responsibility of deleting Function objects. We use LazyCallGraph::Node objects instead, which are stable in memory for the duration of the Module-wide performance of CGSCC passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which is the case here) Differential Revision: https://reviews.llvm.org/D116964	2022-01-11 19:23:47 -08:00
Johannes Doerfert	7b39dccbe4	[Attributor][FIX] Ensure "IsExact" is false for non-exact accesses If we look at potentially interfering accesses we need to ensure the "IsExact" flag is set appropriately. Accesses that have an "unknown" size or offset cannot be exact matches and we missed to flag that. Error and test reported by Serguei N. Dmitriev.	2022-01-10 10:09:36 -06:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using and std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Nikita Popov	92d55e7336	[MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall() We currently have two similar implementations of this concept: isNoAliasCall() only checks for the noalias return attribute. isNoAliasFn() also checks for allocation functions. We should switch to only checking the attribute. SLC is responsible for inferring the noalias return attribute for non-new allocation functions (with a missing case fixed in `348bc76e35`). For new, clang is responsible for setting the attribute, if -fno-assume-sane-operator-new is not passed. Differential Revision: https://reviews.llvm.org/D116800	2022-01-10 09:18:15 +01:00
Johannes Doerfert	4e8a02e7f4	[Attributor][FIX] Remove assumption that doesn't have to hold There is no guarantee we strip all GEPOperators and the conservative handling doesn't even require us to.	2022-01-09 13:15:53 -06:00
Johannes Doerfert	6c745e04fa	[Attributor][FIX] Ensure order for multiple references into map If we have multiple references into a map we need to ensure the ones created late do not invalidate the ones created early. To do that we need to make sure all but the first are not modifying the map, hence for them the keys have to be present already. Fixes #52875.	2022-01-08 16:59:21 -06:00
Simon Pilgrim	274359cf09	[OpenMPOpt] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr. NFC	2022-01-08 13:47:35 +00:00
Kazu Hirata	b932bdf59f	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-07 17:45:09 -08:00
Arthur Eubanks	f96ab6cc1b	Revert "[Inline] Attempt to delete any discardable if unused functions" This reverts commit `335a3163aa`. Causes crashes when building llvm-test-suite's kc under ReleaseLTO-g.	2022-01-07 13:12:40 -08:00
Arthur Eubanks	335a3163aa	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D115545	2022-01-07 11:05:26 -08:00

... 3 4 5 6 7 ...

5435 Commits