llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	9fb6782c69	[rs4gc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2021-03-06 11:42:27 -08:00
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select, are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00
Ta-Wei Tu	8a003861a3	[NPM] Add -enable-loopinterchange option to NPM We have the `enable-loopinterchange` option in legacy pass manager but not in NPM. Add `LoopInterchange` pass to the optimization pipeline (at the same position as before) when `enable-loopinterchange` is turned on. Reviewed By: aeubanks, fhahn Differential Revision: https://reviews.llvm.org/D98116	2021-03-07 02:39:28 +08:00
William S. Moses	d163e75c81	[Attributor] Enable heap-to-stack of any size Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1 Differential Revision: https://reviews.llvm.org/D97873	2021-03-06 12:57:32 -05:00
Philip Reames	5db2735af9	[gvn] Handle simply phi equivalence cases GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet. However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg end up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress. This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one of the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.) Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect. Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi. Differential Revision: https://reviews.llvm.org/D98080	2021-03-06 09:31:12 -08:00
Philip Reames	8fe59ba51e	[rs4gc] track the original value in the state use for base pointer rewriting I'd originally intended to build on this for another purpose and have decided not to, but at a minimum, the stronger asserts are useful.	2021-03-06 08:46:15 -08:00
Philip Reames	6334952ff0	[rs4gc] minor code style improvement	2021-03-06 08:46:15 -08:00
Philip Reames	51b13a7ea0	[gvn] CSE gc.relocates based on meaning, not spelling The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974 (Note: Part of the reviewed change was split and landed as `f352463a`)	2021-03-05 10:16:12 -08:00
Philip Reames	f352463ade	Mark gc.relocate and gc.result as readnone For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism. gc.relocate and gc.result are simply projections of the return values from the associated statepoint. Note that the LangRef has always declared them readnone. The EarlyCSE change is simply moving the special casing from readonly to readnone handling. As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice. This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing. The motivation is to enable the GVN changes in that patch.	2021-03-05 10:07:17 -08:00
Philip Reames	99f93dd3a5	[rs4gc] avoid insert base computation instructions for deopt uses If we have a value live over a call which is used for deopt at the call, we know that the value must be a base pointer. We can avoid potentially inserting IR to materialize a base for this value. In its current form, this is mostly a compile time optimization. Building the base pointer graph (and then optimizing it away again) is a relatively expensive operation. We also sometimes end up with better codegen in practice - due to failures in optimizing away the inserted base pointer propagation - but those are optimization bugs we're fixing concurrently. The alternative to this would be to extend the base pointer inference with the ability to generally reuse multiple-base input instructions (phis and selects). That's somewhat invasive and complicated, so we're deferring it a bit longer. Differential Revision: https://reviews.llvm.org/D97885	2021-03-05 09:55:36 -08:00
gbtozers	65600cb2a7	[DebugInfo] Add DIArgList MD to store multple values in DbgVariableIntrinsics This patch adds a new metadata node, DIArgList, which contains a list of SSA values. This node is in many ways similar in function to the existing ValueAsMetadata node, with the difference being that it tracks a list instead of a single value. Internally, it uses ValueAsMetadata to track the individual values, but there is also a reasonable amount of DIArgList-specific value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special case in parsing and printing due to the fact that it requires a function state (as it may reference function-local values). This patch should not result in any immediate functional change; it allows for DIArgLists to be parsed and printed, but debug variable intrinsics do not yet recognize them as a valid argument (outside of parsing). Differential Revision: https://reviews.llvm.org/D88175	2021-03-05 17:02:24 +00:00
David Sherwood	fec0a0adac	[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector There are certain loops like the one below: `for (int i = 0; i < n; i++) { a[i] = b[i] + 1; *inv = a[i]; }` that can only be vectorised if we are able to extract the last lane of the vectorised form of 'a[i]'. For fixed width vectors this already works since we know at compile time what the final lane is; however, for scalable vectors this is a different story. This patch adds support for extracting the last lane from a scalable vector using a runtime determined lane value. I have added support to VPIteration for runtime-determined lanes that still permit the caching of values. I did this by introducing a new class called VPLane, which describes the lane we're dealing with and provides interfaces to get both the compile-time known lane and the runtime determined value. Whilst doing this work I couldn't find any explicit tests for extracting the last lane values of fixed width vectors, so I added tests for both scalable and fixed width vectors. Differential Revision: https://reviews.llvm.org/D95139	2021-03-05 09:57:56 +00:00
Michael Kruse	b119120673	[clang][OpenMP] Use OpenMPIRBuilder for workshare loops. Initial support for clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to: * Only the worksharing-loop directive. * Recognizes only the nowait clause. * No loop nests with more than one loop. * Untested with templates, exceptions. * Semantic checking left to the existing infrastructure. This patch introduces a new AST node, OMPCanonicalLoop, which becomes the parent of any loop that has to adhere to the restrictions specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics: * The distance function: how many loop iterations there will be, computed before entering the loop nest. * The loop variable function: conversion from a logical iteration number to the loop variable. These allow the OpenMPIRBuilder to work solely with logical iteration numbers, without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logic should be done by the OpenMPIRBuilder so that it can be reused by the MLIR OpenMP dialect and thus by flang. The distance and loop variable functions are implemented using lambdas (or more exactly: CapturedStmt, because the lambda implementation is more intertwined with the parser). It is up to the OpenMPIRBuilder how they are called, which depends on what is done with the loop. By default, these are emitted as outlined functions, but we might think about emitting them inline as the OpenMPRuntime does. For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt.
Although OMPCanonicalLoops are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting. Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset. Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D94973	2021-03-04 22:52:59 -06:00
Wei Mi	2357d29335	[SampleFDO] Another fix to prevent repeated indirect call promotion in sample loader pass. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, to prevent repeated indirect call promotion for the same indirect call and the same target, we used a zero-count value profile entry to indicate that an indirect call has been promoted for a certain target. We removed the PromotedInsns cache in the same patch. However, there was a problem in that patch, described below, and that problem led me to add PromotedInsns back as a mitigation in https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf. When we get the value profile from metadata by calling getValueProfDataFromInst, we need to specify the maximum possible number of values we expect to read. We used MaxNumPromotions in the last patch, so the maximum number of values extracted from metadata is MaxNumPromotions. If we have many values, including zero-count values, when we write the metadata, some of them will be dropped when we read them back because we only read MaxNumPromotions values. That allows repeated indirect call promotion again. We need to make sure that values indicating promoted targets are saved in metadata with higher priority than other values. This patch fixes that problem. We now use -1 instead of 0 to represent the count of a promoted target, so it is easier to sort the values. When we prepare to update the metadata in updateIDTMetaData, we sort the values in descending count order and extract only MaxNumPromotions values to write into metadata. Since -1, as a uint64_t, is the maximum uint64_t value, if there are at most MaxNumPromotions values with count -1, they will all be kept in metadata. If there are more than MaxNumPromotions such values, we save at most MaxNumPromotions of them.
In such a case, logic in doesHistoryAllowICP guarantees that no further promotion of the indirect call will happen in the sample loader pass, because it has already been promoted enough. With this change, we can now remove PromotedInsns without problems. Differential Revision: https://reviews.llvm.org/D97350	2021-03-04 18:44:12 -08:00
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Jianzhou Zhao	db7fe6cd4b	[dfsan] Propagate origin tracking at store This is a part of https://reviews.llvm.org/D95835. Reviewed By: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D97789	2021-03-04 23:34:44 +00:00
William S. Moses	2b896e39bf	Revert "[Attributor] Enable heap-to-stack of any size" This reverts commit `51bd42ef9b`.	2021-03-04 17:24:56 -05:00
Sanjay Patel	1bee549737	[LoopVectorize] propagate fast-math-flags from induction instructions This code assumed that FP math was only permissible if it was fully "fast", so it hard-coded "fast" when creating new instructions. The underlying code already allows matching recurrences/reductions that are only "reassoc", so this change should prevent the potential miscompile seen in the test diffs (we created "fast" ops even though none existed in the original code). I don't know if we need to create the temporary IRBuilder objects used here, so that could be follow-up clean-up. There's an open question about whether we should require "nsz" in addition to "reassoc" here. InstCombine uses that combo for its reassociative folds, but I think codegen is not as strict.	2021-03-04 17:21:32 -05:00
William S. Moses	51bd42ef9b	[Attributor] Enable heap-to-stack of any size Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1 Differential Revision: https://reviews.llvm.org/D97873	2021-03-04 17:17:23 -05:00
Francis Visoiu Mistrih	365b78396a	[Remarks] Emit variable info in auto-init remarks This enhances the auto-init remark with information about the variable that is auto-initialized. This is based on debug info if available, or on alloca names (mostly for development purposes). ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` This makes it possible to see things like partial initialization of a variable that the optimizer won't be able to completely remove. Differential Revision: https://reviews.llvm.org/D97734	2021-03-04 12:51:22 -08:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Adrian Prantl	d268febc56	Improve the debug info for coro-split .resume functions This patch updates the scope line to point to the suspend point. This makes the first address in the function point to the first source line in the resume function rather than the function declaration. Without this the line table "jumps" from the beginning of the function to the suspend point at the beginning. rdar://73386346 Differential Revision: https://reviews.llvm.org/D97345	2021-03-04 11:05:35 -08:00
Jianzhou Zhao	72abc9bf07	[dfsan] add a missing zero origin at atomic commands	2021-03-04 16:50:05 +00:00
Alexey Bataev	04ba80ca4d	[Instcombiner]Improve emission of logical or/and reductions. For logical or/and reductions we emit regular @llvm.vector.reduce.or/and.vxi1 intrinsic calls. These intrinsics are not effective for the logical or/and reductions, especially if the optimizer is able to emit short circuit versions of the scalar or/and instructions and the vector code becomes less effective than the scalar version. Instead, an or reduction for i1 can be represented as: ``` %val = bitcast <ReduxWidth x i1> %vec to iReduxWidth %res = icmp ne iReduxWidth %val, 0 ``` and an and reduction for i1 can be represented as: ``` %val = bitcast <ReduxWidth x i1> %vec to iReduxWidth %res = icmp eq iReduxWidth %val, -1 ``` (that is, a comparison against the all-ones value). This improves performance of the vector code significantly and makes it outperform short circuit scalar code. Part of D57059. Differential Revision: https://reviews.llvm.org/D97406	2021-03-04 08:01:02 -08:00
Sanjay Patel	36a489d194	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC Similar to `b3a33553ae`, but this shows a TODO and a potential miscompile is already present. We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification may be possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers. The new test shows that there is an existing bug somewhere in the callers. We assumed that the original code was fully 'fast' and so we produced IR with 'fast' even though it was just 'reassoc'.	2021-03-04 10:40:26 -05:00
Sanjay Patel	b3a33553ae	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification seems possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers.	2021-03-04 08:53:04 -05:00
Hongtao Yu	c75da238b4	[CSSPGO] Deduplicating dangling pseudo probes. Identical dangling probes are redundant since they all have the same semantics: they rely on the counts inference tool to get a reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading create tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes, though without an observed impact so far. This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added. An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed. Differential Revision: https://reviews.llvm.org/D97482	2021-03-03 22:44:42 -08:00
Hongtao Yu	8985515822	[CSSPGO] Unblocking optimizations by dangling pseudo probes. This change fixes a couple of places where the pseudo probe intrinsic blocks optimizations because it is not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should give the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments. The optimizations being unblocked are: 1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted. 2. Unblocking jump threading from removing empty blocks. Pseudo probes prevent jump threading from removing logically empty blocks that only have one unconditional jump instruction. 3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks. Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D97481	2021-03-03 22:44:42 -08:00
Johannes Doerfert	5b70c12f3e	[Attributor] Make DepClass a required argument We often used a sub-optimal dependence class in the past because we didn't see the argument. Let's make it explicit so we remember to think about it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	e592dad82e	[Attributor] Fold "TrackDependence" into the DepClassTy enum We don't need a bool and an enum to express the three options we currently have. This makes the interface nicer and much easier to use optional dependencies. Also avoids mistakes where the bool is false and enum ignored.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c8c93fdf0a	[Attributor] Avoid work for GEPs and wait till the users are visited	2021-03-04 00:35:52 -06:00
Johannes Doerfert	f3f88287c5	[Attributor] Use known alignment as lower bound to avoid work If we already know more than is available from a use, we don't need to invest time in it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c14213e030	[Attributor][NFC] Move some trivial checks up	2021-03-04 00:35:52 -06:00
Johannes Doerfert	09c3eebf5f	[Attributor] Use sensible initialization in AANoCaptureCallSiteReturned	2021-03-04 00:35:51 -06:00
Evgeniy Brevnov	e94125f054	[DSE] Add support for not aligned begin/end This is an attempt to improve handling of partial overlaps in case of unaligned begin/end. The existing implementation just bails out if it encounters such cases. Even when it doesn't, I believe the existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset, while it should be preserving relative alignment between earlier and later start/end. The idea behind the change is simple. When start/end is not aligned as we wish, instead of bailing out, let's adjust it as necessary to get the desired alignment. I'll update with performance results as measured by the test-suite...it's still running... Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93530	2021-03-04 12:24:23 +07:00
Serguei Katkov	a0ff0f30df	[InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase The statepoint intrinsic can be used in an invoke context, so it should be handled in visitCallBase to cover both call and invoke. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97833	2021-03-04 11:00:22 +07:00
Xun Li	03f668613c	[LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions See PR46990 (https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks the semantics of the coro.suspend intrinsic, which returns to the caller directly. It also leads to a use-after-free if the coroutine is freed before control returns to the caller in a multithreaded environment. This patch disables promotion by checking whether the loop contains coro.suspend intrinsics. This is a resubmit of D86190. Disabling LICM for loops with coroutine suspension is a better option not only for correctness but also for performance. In most cases LICM sinks memory operations. In the case of coroutines, sinking memory operations out of the loop does not improve performance since the coroutine needs to get data from the frame anyway. In fact, LICM would hurt coroutine performance since it adds more entries to the frame. Differential Revision: https://reviews.llvm.org/D96928	2021-03-03 15:21:57 -08:00
Whitney Tsang	58d531fd6f	[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable. Reviewed By: Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D97747	2021-03-03 20:43:31 +00:00
Philip Reames	89d331a31e	Address review comment from D97219 (follow up to `8051156`) Probably should have done this before landing, but I forgot. Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything. Also happens to set us up for handling non-add recurrences in the future if desired.	2021-03-03 12:20:27 -08:00
Philip Reames	99f5417346	Sink routine for replacing a operand bundle to CallBase [NFC] We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.	2021-03-03 12:07:55 -08:00
Philip Reames	805115655e	[LSR] Unify scheduling of existing and inserted addrecs LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSR's cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold. The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but its cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment. As you can see from the relative lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow-up patch series, which improves post-increment use and scheduling, easier to follow. Differential Revision: https://reviews.llvm.org/D97219	2021-03-03 12:07:55 -08:00
Fangrui Song	a84f4fc0df	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by the runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker does not let `__start_/__stop_` references retain their sections. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on COFF/Mach-O are not affected. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D97649	2021-03-03 11:32:24 -08:00
Arnold Schwaighofer	a42bea211a	[coro async] Allow a coro.suspend.async to specify which argument is the context argument Previously, we used the same argument as the entry point. The resume partial function might want to use a different ABI for its context argument. Differential Revision: https://reviews.llvm.org/D97333	2021-03-03 08:27:37 -08:00
Nico Weber	64f5d7e972	Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF" This reverts commit `04c3040f41`. Breaks instrprof-value-merge.c in bootstrap builds.	2021-03-03 10:21:17 -05:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due to clobbering of live registers. See the discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925).
> > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
Jianzhou Zhao	ac4c1760b2	Fix the build error caused by D97570	2021-03-03 04:47:00 +00:00
Jianzhou Zhao	d866b9c99d	[dfsan] Propagate origin tracking at load This is a part of https://reviews.llvm.org/D95835. One issue is about origin load optimization: see the comments of useCallbackLoadLabelAndOrigin @gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad. Reviewed By: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D97570	2021-03-03 04:32:30 +00:00
George Balatsouras	6ff18b08e6	[dfsan] Fix clang-tidy warnings This addresses ~50 clang-tidy warnings on dfsan instrumentation pass. It also contains some refactoring (all non-functional changes) to eliminate some variables and simplify code. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D97714	2021-03-02 17:37:45 -08:00
Andrei Elovikov	b24afec8ae	[NFCI][VPlan] Modify Recipes' print methods to honor Indent parameter Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97787	2021-03-02 15:32:10 -08:00
Nikita Popov	3d8f842712	[LICM] Make promotion faster Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-02 22:10:48 +01:00
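To make the complexity argument in this message concrete, here is a rough count of pairwise alias queries under both schemes. This is a simplified model, not LICM's code: it assumes one query per pair of accesses (or per access/set pair) and ignores alias-set merging; the function names and the access counts (5 promotable out of 400) are made up for illustration.

```python
def queries_all_pairs(num_total: int) -> int:
    # Old scheme (simplified): every access may be compared with every other access.
    return num_total * (num_total - 1) // 2


def queries_promotable_first(num_promotable: int, num_non_promotable: int) -> int:
    # New scheme (simplified): only loop-invariant candidates are compared with each
    # other, then each remaining access is checked against at most num_promotable sets.
    among_candidates = num_promotable * (num_promotable - 1) // 2
    against_rest = num_promotable * num_non_promotable
    return among_candidates + against_rest


# Hypothetical loop body: 400 memory accesses, 5 of them with loop-invariant pointers.
print(queries_all_pairs(400))            # 79800
print(queries_promotable_first(5, 395))  # 1985
```

Because the second term grows only linearly in NumNonPromotable, the saving is large whenever NumPromotable is small, which the message reports is the common case.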
Simon Pilgrim	232f32f0da	[DSE] eliminateDeadStoresMemorySSA - fix "initialization is never read" clang-tidy warning. NFCI.	2021-03-02 15:01:33 +00:00
Alexey Bataev	a054e94e9e	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D94992	2021-03-02 06:39:47 -08:00
Juneyoung Lee	365f5e2475	[JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore select form of and/ors because the function does not look into binops as well	2021-03-02 18:35:18 +09:00
Ta-Wei Tu	ea1a1ebbc6	[NFC] Use std::swap in LoopInterchange	2021-03-02 11:42:48 +08:00
Fangrui Song	04c3040f41	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker no longer lets `__start_/__stop_` references retain them. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on other COFF/Mach-O are not affected. Differential Revision: https://reviews.llvm.org/D97649	2021-03-01 13:43:23 -08:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Florian Hahn	a6c81d3366	[VPlan] Remove recipes from back to front. Update the deletion order when destroying VPBasicBlocks. This ensures recipes that depend on earlier ones in the block are removed first. Otherwise this may cause issues when recipes have remaining users later in the block.	2021-03-01 16:06:30 +00:00
Florian Hahn	53dacb7b67	[LV] Generate RT checks up-front and remove them if required. This patch updates LV to generate the runtime checks just after cost modeling, to allow a more precise estimate of the actual cost of the checks. This information will be used in future patches to generate larger runtime checks in cases where the checks only make up a small fraction of the expected scalar loop execution time. The runtime checks are created up-front in a temporary block, un-linked from the existing IR, to allow better estimating the cost. After deciding to vectorize, the checks are moved back. If deciding not to vectorize, the temporary block is completely removed. This patch is similar in spirit to D71053, but explores a different direction: instead of delaying the decision on whether to vectorize in the presence of runtime checks it instead optimistically creates the runtime checks early and discards them later if it decides not to vectorize. This has the advantage that the cost-modeling decisions can be kept together and done up-front, thus preserving the general code structure. I think delaying (part) of the decision to vectorize would also make the VPlan migration a bit harder. One potential drawback of this patch is that we speculatively generate IR which we might have to clean up later. However it seems like the code required to do so is quite manageable. Reviewed By: lebedev.ri, ebrevnov Differential Revision: https://reviews.llvm.org/D75980	2021-03-01 10:48:04 +00:00
Juneyoung Lee	5419b67137	[SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally This is a minor change that fixes FoldTwoEntryPHINode to handle phis with and/ors of select form and binop form equally.	2021-03-01 13:34:51 +09:00
Kazu Hirata	d639120983	[llvm] Use set_is_subset (NFC)	2021-02-28 10:59:20 -08:00
Sanjay Patel	9502061bcc	[InstCombine] avoid infinite loop in demanded bits for select https://llvm.org/PR49205	2021-02-28 10:17:53 -05:00
William S. Moses	b077d82b00	[Attributor] Conditionally delete fns Allow the attributor to delete functions only if requested Differential Revision: https://reviews.llvm.org/D97238	2021-02-27 20:37:42 -05:00
Sanjay Patel	356cdabd3a	[SimplifyCFG] avoid illegal phi with both poison and undef In the example based on: https://llvm.org/PR49218 ...we are crashing because poison is a subclass of undef, so we merge blocks and create: PHI node has multiple entries for the same basic block with different incoming values! %k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ] If both poison and undef values are incoming, we soften the poison values to undef. Differential Revision: https://reviews.llvm.org/D97495	2021-02-27 09:10:32 -05:00
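A minimal model of the crash and the fix, assuming a phi is represented as a list of (incoming value, predecessor) pairs with `"poison"` and `"undef"` as string markers. This is a hypothetical representation, not LLVM's `PHINode` API. The rule being modeled is the verifier error quoted above: all entries for the same predecessor must carry the same value.

```python
def is_well_formed(incoming):
    # Mirrors the verifier rule: one predecessor may appear more than once,
    # but every entry for it must have the same incoming value.
    seen = {}
    for value, block in incoming:
        if seen.setdefault(block, value) != value:
            return False
    return True


def soften_poison(incoming):
    # If both poison and undef flow in, rewrite poison to undef. Poison may be
    # refined to any value, so replacing it with undef is a legal refinement.
    values = {value for value, _ in incoming}
    if "poison" in values and "undef" in values:
        return [("undef" if v == "poison" else v, b) for v, b in incoming]
    return list(incoming)


# Shape of the phi from PR49218 after the block merge.
phi = [("poison", "entry"), ("%k3", "g"), ("undef", "entry")]
print(is_well_formed(phi))                 # False
print(is_well_formed(soften_poison(phi)))  # True
```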
Kazu Hirata	1d4a2f3778	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-26 22:36:40 -08:00
Fangrui Song	bf176c49e8	[InstrProfiling] Use llvm.compiler.used instead of llvm.used for ELF Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics for comdat and may not discard the sections as a unit. The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF) are similar to D97432: `__profd_` is not directly referenced, so `__profd_` may be discarded while `__profc_` is retained, breaking the interconnection. We currently conservatively add all such sections to `llvm.used` and let the linker do GC for ELF. In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN, causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC). Use `llvm.compiler.used` to retain the current GC behavior. Differential Revision: https://reviews.llvm.org/D97585	2021-02-26 16:14:03 -08:00
George Balatsouras	c9075a1c8e	[dfsan] Record dfsan metadata in globals This will allow identifying exactly how many shadow bytes were used during compilation, for when fast8 mode is introduced. Also, it will provide a consistent matching point for instrumentation tests so that the exact llvm type used (i8 or i16) for the shadow can be replaced by a pattern substitution. This is handy for tests with multiple prefixes. Reviewed by: stephan.yichao.zhao, morehouse Differential Revision: https://reviews.llvm.org/D97409	2021-02-26 14:42:46 -08:00
Jianzhou Zhao	a47d435bc4	[dfsan] Propagate origins for callsites This is a part of https://reviews.llvm.org/D95835. Each customized function has two wrappers. The first one dfsw is for the normal shadow propagation. The second one dfso is used when origin tracking is on. It calls the first one, and does additional origin propagation. Which one to use can be decided at instrumentation time. This is to ensure minimal additional overhead when origin tracking is off. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97483	2021-02-26 19:12:03 +00:00
Fangrui Song	b55f29c194	[SanitizerCoverage] Clarify llvm.used/llvm.compiler.used and partially fix unmatched metadata sections on Windows `__sancov_pcs` parallels the other metadata section(s). While some optimizers (e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the sections as a unit, some (e.g. GlobalOpt/ConstantMerge) do not. So we have to conservatively retain all unconditionally in the compiler. When a comdat is used, the COFF/ELF linkers' GC semantics ensure the associated parallel array elements are retained or discarded together, so `llvm.compiler.used` is sufficient. Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not used), we have to use `llvm.used` to conservatively make the linker retain all sections. This will fix the Windows problem once internal linkage GlobalObject's in `llvm.used` are retained via `/INCLUDE:`. Reviewed By: morehouse, vitalybuka Differential Revision: https://reviews.llvm.org/D97432	2021-02-26 11:10:03 -08:00
Simon Pilgrim	455d43b951	[Utils] collectBitParts - bail for integers > 128-bits collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit. We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers. Thanks to @manojgupta for the repro.	2021-02-26 14:58:01 +00:00
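Why the limit is 128 bits: a bit index for an N-bit integer ranges over 0..N-1, and the largest value an `int8_t` can hold is 127. The sketch below checks that bound and uses `ctypes.c_int8` to mimic how an out-of-range index would silently wrap in C; the helper name is illustrative, not the one used in `collectBitParts`.

```python
import ctypes

INT8_MAX = 127


def bit_indices_fit_in_int8(bit_width: int) -> bool:
    # The largest index is bit_width - 1; it must be representable in int8_t.
    return bit_width - 1 <= INT8_MAX


print(bit_indices_fit_in_int8(128))  # True  (indices 0..127)
print(bit_indices_fit_in_int8(129))  # False (index 128 does not fit)
print(ctypes.c_int8(128).value)      # -128: the index would wrap
```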
Stephen Tozer	ec7b9b0c18	[InstCombine] Avoid redundant or out-of-order debug value sinking This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent redundant debug intrinsics from being produced, and also prevent the intrinsics from being emitted in an incorrect order. It does this by ensuring that when this pass sinks an instruction and creates clones of the debug intrinsics that use that instruction, it inserts those debug intrinsics in their original order, and only inserts the last debug intrinsic for each variable in the Instruction's block. Differential revision: https://reviews.llvm.org/D95463	2021-02-26 13:04:33 +00:00
Evgeniy Brevnov	13a5cac2ba	Revert "[NARY-REASSOCIATE] Support reassociation of min/max" This reverts commit `83d134c3c4`.	2021-02-26 19:47:54 +07:00
Kazu Hirata	5fc9e30985	[Scalar] Use range-based for loops (NFC)	2021-02-25 19:54:38 -08:00
Jianzhou Zhao	c88fedef2a	[dfsan] Conservative solution to atomic load/store At a store, DFSan stores the shadow data and then the app data; at a load, it loads the shadow data and then the app data. When application data is atomic, one overtainting case is: thread A: load shadow; thread B: store shadow; thread B: store app; thread A: load app. If the application address had been used by other flows, thread A reads the previous shadow, causing overtainting. The change is similar to MSan's solution: 1) enforce ordering of app load/store; 2) load shadow after load app, and store shadow before store app; 3) do not track atomic stores, by resetting their shadow to 0. The last one is to address a case like this: Thread A: load app; Thread B: store shadow; Thread A: load shadow; Thread B: store app. This approach eliminates overtainting, at the cost of possibly undertainting flows through a shadow data race. Note that this change addresses only native atomic instructions and does not support builtin libcalls yet. https://llvm.org/docs/Atomics.html#libcalls-atomic Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97310	2021-02-25 23:34:58 +00:00
James Y Knight	24539f1ef2	Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg. And then push those change throughout LLVM. Keep the old signature in Clang's CGBuilder for now -- that will be updated in a follow-on patch (D97224). The MLIR LLVM-IR dialect is not updated to support the new alignment attribute, but preserves its existing behavior. Differential Revision: https://reviews.llvm.org/D97223	2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih	fee9abe69c	[Remarks] Provide more information about auto-init calls This now analyzes calls to both intrinsics and functions. For intrinsics, grab the ones we know and care about (mem* family) and analyze the arguments. For calls, use TLI to get more information about the libcalls, then analyze the arguments if known. ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` Differential Revision: https://reviews.llvm.org/D97489	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	4753a69a31	[Remarks] Provide more information about auto-init stores This adds support for analyzing the instruction with the !annotation "auto-init" in order to generate a more user-friendly remark. For now, support the store size, and whether it's atomic/volatile. Example: ``` auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks] int var; ^ ``` Differential Revision: https://reviews.llvm.org/D97412	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	c49b600b2f	[Remarks] Emit remarks for "auto-init" !annotations Using the !annotation metadata, emit remarks pointing to code added by `-ftrivial-auto-var-init` that survived the optimizer. Example: ``` auto-init.c:4:7: remark: Initialization inserted by -ftrivial-auto-var-init. [-Rpass-missed=annotation-remarks] int buf[1024]; ^ ``` The tests are testing various situations like calls/stores/other instructions, with debug locations, and extra debug information on purpose: more patches will come to improve the reporting to make it more user-friendly, and these tests will show how the reporting evolves. Differential Revision: https://reviews.llvm.org/D97405	2021-02-25 15:14:09 -08:00
Adrian Prantl	1693180884	Add a nullptr check. This doesn't actually reproduce with a dbg.declare(i8* null, ...) which produces a non-null null Value, but I have seen this show up in crash logs. I'm suspecting that there may be another pass forcibly setting the operand to a nullptr.	2021-02-25 12:01:11 -08:00
Fangrui Song	4d63892acb	[SanitizerCoverage] Drop !associated on metadata sections In SanitizerCoverage, the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After inlining, such a `__sancov_*` section can be referenced by more than one function, but its sh_link still refers to the original function's section. (Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to section violates the invariant.) If the original function's section is discarded (e.g. LTO internalization + `ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error. The above reasoning means that `!associated` is not appropriate for a metadata section referenced by an inlinable function. Non-interposable functions are inline candidates, so we have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections but is expected to parallel a metadata section, so we have to make sure the two sections are retained or discarded at the same time. A section group does the trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return non-null.) For interposable functions, we could keep using `!associated`, but LTO can change the linkage to `internal` and allow such functions to be inlinable, so we have to drop `!associated`, too. To not interfere with section group resolution, we need to use the `noduplicates` variant (section group flag 0). (This allows us to get rid of the ModuleID parameter.) In -fno-pie and -fpie code (mostly dso_local), instrumented interposable functions have WeakAny/LinkOnceAny linkages, which are rare. So the section group header overhead should be low. This patch does not change the object file output for COFF (where `!associated` is ignored). Reviewed By: morehouse, rnk, vitalybuka Differential Revision: https://reviews.llvm.org/D97430	2021-02-25 11:59:23 -08:00
Jon Roelofs	7f6e331645	Support `#pragma clang section` directives on MachO targets rdar://59560986 Differential Revision: https://reviews.llvm.org/D97233	2021-02-25 09:30:10 -08:00
Rong Xu	6103b6ad69	[SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class This patch makes SampleProfileLoaderBaseImpl a template class so it can be used in CodeGen transformations. Noticeable changes: * use one template parameter and use IRTraits to get other used types and type-specific functions. * remove the temporary "inline" keywords in the previous refactor patch. * change the template function findEquivalencesFor to a regular function. This function has a single caller with type of PostDominatorTree. It's simpler to use the type directly because MachinePostDominatorTree is not a derived type of template DominatorTreeBase. Differential Revision: https://reviews.llvm.org/D96981	2021-02-25 08:26:17 -08:00
Evgeniy Brevnov	d0a6f8bb65	[NFC] Fix build failure after `83d134c3c4`	2021-02-25 18:43:00 +07:00
Evgeniy Brevnov	83d134c3c4	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88287	2021-02-25 18:22:39 +07:00
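The rewrite relies on min being commutative and associative, so min(min(a, b), c) and min(min(a, c), b) always agree; the pass only benefits when min(a, c) already exists and can be reused. A small sketch under those assumptions: an exhaustive check of the identity on a few integers, plus a toy rewrite over tuple-encoded expressions with a hypothetical `available` set standing in for already-computed values. This is not the NaryReassociate implementation, and the patch was later reverted in 13a5cac2ba.

```python
from itertools import product


def identity_holds(values) -> bool:
    # min(min(a, b), c) == min(min(a, c), b) for every triple drawn from values.
    return all(min(min(a, b), c) == min(min(a, c), b)
               for a, b, c in product(values, repeat=3))


def reassociate_min(a, b, c, available):
    # Rewrite min(min(a, b), c) so that an existing min(a, c) can be reused.
    if ("min", a, c) in available:
        return ("min", ("min", a, c), b)
    return ("min", ("min", a, b), c)


print(identity_holds(range(-3, 4)))  # True
print(reassociate_min("a", "b", "c", {("min", "a", "c")}))
# ('min', ('min', 'a', 'c'), 'b')
```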
Xun Li	c38000a9fb	[Coroutine] Check indirect uses of alloca when checking lifetime info In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and a use of the alloca crosses a suspension point. This approach is unfortunately incorrect. A use of an alloca does not need to be a direct use, but can be an indirect use through an alias. Only checking direct uses can miss cases where indirect uses cross a suspension point. This can be demonstrated in the newly added test case 007. In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend. If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas on the stack, and then capture their addresses in the frame. Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better. We still check the lifetime info in the same way as before, but with two differences: 1. The collection of lifetime.start is moved into AllocaUseVisitor to make the logic more concentrated. 2. When looking at lifetime.start and use pairs, we not only check the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through an alias. This will make the analysis more accurate without throwing away the lifetime optimization. Differential Revision: https://reviews.llvm.org/D96922	2021-02-24 18:29:23 -08:00
Sanjay Patel	a7cee55762	[InstCombine] fold fdiv with powi divisor (PR49147) This extends `b40fde062c` for the especially non-standard powi pattern. We want to avoid being completely wrong on the negation-of-int-min corner case, so I'm adding an extra FMF check for 'ninf' assuming that gives us the flexibility to handle that possibility. https://llvm.org/PR49147	2021-02-24 16:44:36 -05:00
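The int-min corner case comes from rewriting x / powi(y, n) as x * powi(y, -n): in 32-bit two's complement, negating INT_MIN wraps back to INT_MIN, so the exponent's sign would not flip. A small sketch, assuming llvm.powi is modeled as `y ** n` on Python floats and 32-bit negation is modeled by the hypothetical helper `neg_i32`; fast-math flags are not modeled.

```python
import math

INT32_MIN = -(2 ** 31)


def neg_i32(n: int) -> int:
    # Two's-complement negation in 32 bits: wraps instead of growing.
    return ((-n + 2 ** 31) % 2 ** 32) - 2 ** 31


def powi(y: float, n: int) -> float:
    # Stand-in for llvm.powi on doubles.
    return y ** n


x, y = 3.0, 2.0
for n in (3, -4, 10):
    # For ordinary exponents the two forms agree.
    assert math.isclose(x / powi(y, n), x * powi(y, neg_i32(n)))

print(neg_i32(INT32_MIN) == INT32_MIN)  # True: -INT_MIN wraps to INT_MIN
```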
Sanjay Patel	868d43fbd6	[InstCombine] add helper for x/pow(); NFC We at least want to add powi to this list, so split it off into a switch to reduce code duplication.	2021-02-24 16:44:36 -05:00
Duncan P. N. Exon Smith	01701646d5	Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs This is a follow up to `22a52dfddc` and a revert of `df763188c9`. With this change, we only skip cloning distinct nodes in MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the no-longer-needed local helper `cloneOrBuildODR()`. Skipping cloning in other cases is unsound and breaks CloneModule, which is why the textual IR for PR48841 didn't pass previously. This commit adds the test as: Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll Cloning less often exposed a hole in subprogram cloning in CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function has a subprogram attachment whose scope is a DICompositeType that shouldn't be cloned, but it has no internal debug info pointing at that type, that composite type was being cloned. This commit plugs that hole, calling DebugInfoFinder::processSubprogram from CloneFunctionInto. As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit message, I think we need to formalize ownership of metadata a bit more so that ValueMapper/CloneFunctionInto (and similar functions) can deal with cloning (or not) metadata in a more generic, less fragile way. This fixes PR48841. Differential Revision: https://reviews.llvm.org/D96734	2021-02-24 12:57:52 -08:00
Sander de Smalen	5e19208d96	[InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep. This fixes the types of a few more cost variables to be of type InstructionCost.	2021-02-24 14:30:03 +00:00
Pierre Gousseau	27830bc2b1	[asan] Avoid putting globals in a comdat section when targeting ELF. Putting globals in a comdat for dead-stripping changes the semantics and can potentially cause false negative odr violations at link time. If odr indicators are used, we keep the comdat sections, as link time odr violations will be detected for the odr indicator symbols. This fixes PR 47925	2021-02-24 12:01:56 +00:00
Simon Pilgrim	b94c215592	[Utils] collectBitParts - add truncate() handling	2021-02-24 11:48:34 +00:00
Florian Hahn	6240f436dd	Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts the revert commit `437f0bbcd5`. It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be constructed with a VPRecipeBase *. This should address ambiguous constructor issues for recipe sub-types that also inherit from VPValue.	2021-02-24 10:36:02 +00:00
Dan Liew	7d3ef103b5	[ASan] Introduce a way to set different ways of emitting module destructors. Previously there was no way to control how module destructors were emitted by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang) to be able to decide how to emit these destructors (if at all). This patch introduces the `AsanDtorKind` enum that represents the different ways destructors can be emitted. There are currently only two valid ways to emit destructors. * `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default. * `None` - Do not emit module destructors. The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated to take the `AsanDtorKind` as an argument. The `-asan-destructor-kind=` command line argument has been introduced to make this easy to test from `opt`. If this argument is specified it overrides the value passed to the `ModuleAddressSanitizerPass` constructor. Note that `AsanDtorKind` is not `bool` because we will introduce a new way to emit destructors in a subsequent patch. Note that `AsanDtorKind` is given its own header file because if it is declared in `Transforms/Instrumentation/AddressSanitizer.h` it leads to a compile error (Module is ambiguous) when trying to use it in `clang/Basic/CodeGenOptions.def`. rdar://71609176 Differential Revision: https://reviews.llvm.org/D96571	2021-02-23 20:01:21 -08:00
Juneyoung Lee	56d228a14e	[SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes. A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97244	2021-02-24 10:40:50 +09:00
Fangrui Song	ef312951fd	collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128 And delete the SmallPtrSetImpl overload. While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97257	2021-02-23 16:09:06 -08:00
Fangrui Song	ed02f52d28	Fix unstable SmallPtrSet iteration issues due to collectUsedGlobalVariables While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Depends on D97128 (which added a new SmallVecImpl overload for collectUsedGlobalVariables). Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97139	2021-02-23 16:09:05 -08:00
Fangrui Song	3adb89bb9f	[ThinLTO] Make cloneUsedGlobalVariables deterministic Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements. While here, decrease inline element counts from 8 to 4. The number of `llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full LTO/hybrid LTO, the number may be large, so we need to be careful. According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is good for a large project with WholeProgramDevirt, when available_externally vtables are placed in the llvm.compiler.used set. Differential Revision: https://reviews.llvm.org/D97128	2021-02-23 16:09:05 -08:00
Teresa Johnson	0a5949dcfa	[WPD] Fix handling of pure virtual base class The fix in `3c4c205060` caused an assert in the case of a pure virtual base class. In that case, the vTableFuncs list on the summary will be empty, so we were hitting the new assert that the linkage type was not available_externally. In the case of pure virtual, we do not want to assert, and additionally need to set VS so that we don't treat it conservatively and quit the analysis of the type id early. This exposed a pre-existing issue where we were not updating the vcall visibility on pure virtual functions when whole program visibility was specified. We were skipping updating the visibility on any global vars that didn't have any vTableFuncs, which meant all pure virtual were not updated, and the later analysis would block any devirtualization of calls that had a type id used on those pure virtual vtables (see the handling in the other code modified in this patch). Simply remove that check. It will mean that we may update the vcall visibility on global vars that aren't vtables, but that setting is ignored for any global vars that didn't have type metadata anyway. Added a new test case that asserted without removing the assert, and that requires the other fixes in this patch (updateVCallVisibilityInIndex and not skipping all vtables without virtual funcs) to get a successful devirtualization with index-only WPD. I added cases to test hybrid and regular LTO for completeness, although those already worked without the fixes here. With this final fix, a clang multistage bootstrap with WPD builds and runs all tests successfully. Differential Revision: https://reviews.llvm.org/D97126	2021-02-23 16:07:09 -08:00
Jianzhou Zhao	a05aa0dd5e	[dfsan] Update memset and dfsan_(set\|add)_label with origin tracking This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97302	2021-02-23 23:16:33 +00:00
Matthew Voss	6da7d31416	[llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata Under certain (currently unknown) conditions, llvm-profdata is outputting profiles that have two consecutive entries in the MemOPSize section for the value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch instruction with two cases for 0. As mentioned, we’re not quite sure what’s causing this to happen, but this patch prevents llvm-profdata from outputting a profile that has this problem and gives an error with a request for a reproducer. Differential Revision: https://reviews.llvm.org/D92074	2021-02-23 12:51:54 -08:00
Andrei Elovikov	3605b873f6	[NFC][VPlan] Use VPUser to store block's predicate Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96529	2021-02-23 11:08:27 -08:00
Florian Hahn	de40423c85	[LV] Ensure fixNonInductionPHIs uses a valid insertion point. In some cases, Builder's insertion point may be invalidated before using it in VPTransformState::get. Make sure the insertion point is up-to-date. This should fix various sanitizer errors, like https://lab.llvm.org/buildbot/#/builders/5/builds/4933/steps/9/logs/stdio	2021-02-23 18:51:05 +00:00
Florian Hahn	437f0bbcd5	Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts commit `4efa097eb4`, because some of the compilers used by some bots do not support automatic conversions to PointerUnion.	2021-02-23 16:57:21 +00:00
Florian Hahn	4efa097eb4	[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends. Generalize the return value of tryToCreateWidenRecipe to return either a newly created recipe or an existing VPValue. Use this to avoid creating unnecessary VPBlendRecipes. Fixes PR44800.	2021-02-23 16:52:03 +00:00
Juneyoung Lee	19c2e12947	[JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns This allows JumpThreading's computeValueKnownInPredecessors to recognize select form of and/or patterns as well.	2021-02-24 00:06:10 +09:00
Nate Chandler	01b4890e47	Add @llvm.coro.async.size.replace intrinsic. The new intrinsic replaces the size in one specified AsyncFunctionPointer with the size in another. This ability is necessary for functions which merely forward to async functions such as those defined for partial applications. Reviewed By: aschwaighofer Differential Revision: https://reviews.llvm.org/D97229	2021-02-23 06:43:52 -08:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately, this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. For most backends this looks like it will be an improvement (or they were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Matteo Favaro	633e090528	[DSE] Allow ptrs defined in the entry block in IsGuaranteedLoopInvariant. The IsGuaranteedLoopInvariant function is making sure to check if the incoming pointer is guaranteed to be loop invariant, therefore I think the case where the pointer is defined in the entry block of a function automatically guarantees the pointer to be loop invariant, as the entry block of a function cannot have predecessors or be part of a loop. I implemented this small patch and tested it using ninja check-llvm-unit and ninja check-llvm. I added a contained test file that shows the problem and used opt -O3 -debug on it to make sure the case is not currently handled (in fact the debug log is showing that the DSE pass is bailing out when testing if the killer store is able to clobber the dead store). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96979	2021-02-23 12:00:44 +00:00
Juneyoung Lee	481c62277d	[BuildLibCalls] Add noundef to allocator fns' size This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef. For C/C++: undef can be created from reading an uninitialized variable or padding. Calling a function with uninitialized variable is already UB. Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size. Therefore, malloc's size can be marked as noundef. For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97045	2021-02-23 13:58:03 +09:00
Kazu Hirata	4ed47858ab	[llvm] Use llvm::drop_begin (NFC)	2021-02-22 20:17:16 -08:00
ksyx	4125cabce1	[GVN] Fix a typo in comment NFC. Differential Revision: https://reviews.llvm.org/D97200 Reviewed By: fhahn	2021-02-23 10:39:34 +08:00
Jianzhou Zhao	7424efd5ad	[dfsan] Propagate origins at non-memory/phi/call instructions This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97200	2021-02-23 02:12:45 +00:00
Petr Hosek	c24b7a16b1	[InstrProfiling] Use ELF section groups for counters, data and values __start_/__stop_ references retain C identifier name sections such as __llvm_prf_*. Putting these into a section group disables this logic. The ELF section group semantics ensures that group members are retained or discarded as a unit. When a function symbol is discarded, this allows the linker to discard counters, data and values associated with that function symbol as well. Note that `noduplicates` COMDAT is lowered to zero-flag section group in ELF. We only set this for functions that aren't already in a COMDAT and for those that don't have available_externally linkage since we already use regular COMDAT groups for those. Differential Revision: https://reviews.llvm.org/D96757	2021-02-22 14:00:02 -08:00
Alexey Bataev	9a4dd4de9d	[SLP]No need to mark scatter load pointer as scalar as it gets vectorized. Pointer operand of scatter loads does not remain scalar in the tree (it gets vectorized) and thus must not be marked as the scalar that remains scalar in vectorized form. Differential Revision: https://reviews.llvm.org/D96818	2021-02-22 11:58:28 -08:00
Petr Hosek	4827492d9f	Revert "[InstrProfiling] Use ELF section groups for counters, data and values" This reverts commits: `5ca21175e0` `97184ab99c` The instrprof-gc-sections.c is failing on AArch64 LLD bot.	2021-02-22 11:13:55 -08:00
Florian Hahn	95daec6a84	[ConstraintElimination] Use unsigned > 0 instead of != 0. ICMP_NE predicates cannot be directly represented as constraints. But we can use ICMP_UGT instead of ICMP_NE for %x != 0. See https://alive2.llvm.org/ce/z/XlLCsW	2021-02-22 17:54:36 +00:00
Nikita Popov	4125afc357	[MemCpyOpt] Fix handling of readnone byval arguments If the call is readnone, then there may not be any MemoryAccess associated with the call. Bail out in that case. This fixes the issue reported at https://reviews.llvm.org/D94376#2578312.	2021-02-22 18:48:31 +01:00
Nikita Popov	5e7e499b91	[JumpThreading] Clone noalias.scope.decl when threading blocks When cloning instructions during jump threading, also clone and adapt any declared scopes. This is primarily important when threading loop exits, because we'll end up with two dominating scope declarations in that case (at least after additional loop rotation). This addresses a loose thread from https://reviews.llvm.org/rG2556b413a7b8#975012. Differential Revision: https://reviews.llvm.org/D97154	2021-02-22 18:35:30 +01:00
Florian Hahn	c7ee57f1dc	[LV] Directly use incoming value for single VPBlendRecipes. VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use the incoming value directly.	2021-02-22 16:10:08 +00:00
Florian Hahn	c11fd0df64	[VPlan] Skip VPWidenPHIRecipe in VPInterleavedAccessInfo. Update unit tests that did not expect VPWidenPHIRecipes after `15a74b64df`.	2021-02-22 10:35:09 +00:00
Florian Hahn	15a74b64df	[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe. This patch extends VPWidenPHIRecipe to manage pairs of incoming (VPValue, VPBasicBlock) in the VPlan native path. This is made possible because we now directly manage defined VPValues for recipes. By keeping both the incoming value and block in the recipe directly, code-generation in the VPlan native path becomes independent of the predecessor ordering when fixing up non-induction phis, which currently can cause crashes in the VPlan native path. This fixes PR45958. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D96773	2021-02-22 09:44:25 +00:00
Petr Hosek	5ca21175e0	[InstrProfiling] Use ELF section groups for counters, data and values __start_/__stop_ references retain C identifier name sections such as __llvm_prf_*. Putting these into a section group disables this logic. The ELF section group semantics ensures that group members are retained or discarded as a unit. When a function symbol is discarded, this allows the linker to discard counters, data and values associated with that function symbol as well. Note that `noduplicates` COMDAT is lowered to zero-flag section group in ELF. We only set this for functions that aren't already in a COMDAT and for those that don't have available_externally linkage since we already use regular COMDAT groups for those. Differential Revision: https://reviews.llvm.org/D96757	2021-02-21 16:13:06 -08:00
Nikita Popov	e0615bcd39	[Loads] Add optimized FindAvailableLoadedValue() overload (NFCI) FindAvailableLoadedValue() accepts an iterator by reference. If no available value is found, then the iterator will either be left at a clobbering instruction or the beginning of the basic block. This allows using FindAvailableLoadedValue() across multiple blocks. If this functionality is not needed, as is the case in InstCombine, then we can use a much more efficient implementation: First try to find an available value, and only perform clobber checks if we actually found one. As this function only looks at a very small number of instructions (6 by default) and usually doesn't find an available value, this saves many expensive alias analysis queries.	2021-02-21 18:42:56 +01:00
Kristina Bessonova	e97aab8d15	[ThinLTO] Fix import of multiply defined global variables Currently, if there is a module that contains a strong definition of a global variable and a module that has both a weak definition for the same global and a reference to it, it may result in an undefined symbol error while linking with ThinLTO. It happens because: * the strong definition becomes internal because it is read-only and can be imported; * the weak definition gets replaced by a declaration because it's non-prevailing; * the strong definition fails to be imported because the destination module already contains another definition of the global, even though that def is non-prevailing. The patch adds a check to computeImportForReferencedGlobals() that allows considering a global variable for being imported even if the module contains a definition of it in the case this def has an interposable linkage type. Note that currently the check is based only on the linkage type (and this seems to be enough at the moment), but it might be worth also taking into account whether the def is prevailing or not. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95943	2021-02-21 18:34:12 +02:00
Jianzhou Zhao	9524632fa2	[dfsan] Comment out unused methods by D97087 temporarily	2021-02-21 03:31:19 +00:00
Sanjay Patel	e772618f1e	[InstCombine] fold fdiv with exp/exp2 divisor (PR49147) Follow-up to: D96648 / `b40fde062` ...for the special-case base calls. From the earlier commit: This is unusual in the general (non-reciprocal) case because we need an extra instruction, but that should be better for general FP reassociation and codegen. We conservatively check for "arcp" FMF here as we do with existing fdiv folds, but it is not strictly necessary to have that.	2021-02-20 16:02:58 -05:00
Teresa Johnson	fde55a9c9b	[LTO] Fix cloning of llvm.used when splitting module Refines the fix in `3c4c205060` to only put globals whose defs were cloned into the split regular LTO module on the cloned llvm.used globals. This avoids an issue where one of the attached values was a local that was promoted in the original module after the module was cloned. We only need to have the values defined in the new module on those globals. Fixes PR49251. Differential Revision: https://reviews.llvm.org/D97013	2021-02-20 09:46:43 -08:00
Simon Pilgrim	609d0c9772	[InstCombine] matchBSwapOrBitReverse - remove pattern matching early-out. NFCI. recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time. This is part of a cleanup towards letting matchBSwapOrBitReverse /recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (FSHL/FSHRs in particular which can be prematurely created). Differential Revision: https://reviews.llvm.org/D97056	2021-02-20 13:15:34 +00:00
Dávid Bolvanský	cd54c57919	Reland "[Libcalls, Attrs] Annotate libcalls with noundef" Fixed Clang tests.	2021-02-20 06:18:48 +01:00
Dávid Bolvanský	94d034fb86	Revert "[Libcalls, Attrs] Annotate libcalls with noundef" This reverts commit `33b0c63775`. Bots are failing. Some Clang tests need to be updated too.	2021-02-20 04:18:42 +01:00
Dávid Bolvanský	33b0c63775	[Libcalls, Attrs] Annotate libcalls with noundef I think we can use here same logic as for nonnull. strlen(X) - X must be noundef => valid pointer. for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones) Reviewed By: jdoerfert, aqjune Differential Revision: https://reviews.llvm.org/D95122	2021-02-20 04:10:07 +01:00
Dávid Bolvanský	68e6025cf7	Revert "[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly" This reverts commit `05d891a19e`.	2021-02-20 03:58:53 +01:00
Dávid Bolvanský	05d891a19e	[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94850	2021-02-20 03:56:01 +01:00
Jianzhou Zhao	dab953c8e4	[dfsan] Add utils that get/set origins This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97087	2021-02-20 00:52:33 +00:00
Jianzhou Zhao	cb1f1aab90	[dfsan] Add origin address calculation This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97065	2021-02-19 21:30:07 +00:00
Jianzhou Zhao	efc8f3311b	[msan] Set cmpxchg shadow precisely In terms of https://llvm.org/docs/LangRef.html#cmpxchg-instruction, the return type of cmpxchg is a pair {ty, i1}, while I think we only wanted to set the shadow for the address 0th op, and it has type ty. Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D97029	2021-02-19 20:23:23 +00:00
Wei Mi	4ffad1fb48	[SampleFDO] Add PromotedInsns to prevent repeated ICP. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, we use 0 count value profile to memorize which target has been promoted and prevent repeated ICP for the same target, so we delete PromotedInsns. However, I found the implementation in the patch has some shortcomings that need to be fixed, otherwise there will still be repeated ICP. So I add PromotedInsns back temporarily. Will remove it after I get a thorough fix.	2021-02-19 10:01:49 -08:00
Benjamin Kramer	59f442e6bb	[LV] Fold single-use variable into assert. NFC.	2021-02-19 18:11:39 +01:00
Nikita Popov	71a8e4e7d6	[MemCopyOpt] Enable MemorySSA by default This enables use of MemorySSA instead of MemDep in MemCpyOpt. To allow this without significant compile-time impact, the MemCpyOpt pass is moved directly before DSE (in the cases where this was not already the case), which allows us to reuse the existing MemorySSA analysis. Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt can also perform simple optimizations across basic blocks. Differential Revision: https://reviews.llvm.org/D94376	2021-02-19 18:06:25 +01:00
Florian Hahn	edc92a1c42	[LV] Remove VPCallback. Now that all state for generated instructions is managed directly in VPTransformState, VPCallBack is no longer needed. This patch updates the last use of `getOrCreateScalarValue` to instead manage the value directly in VPTransformState and removes VPCallback. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95383	2021-02-19 12:50:41 +00:00
Nikita Popov	2f17ed294f	[DCE] Don't remove non-willreturn calls In both ADCE and BDCE (via DemandedBits) we should not remove instructions that are not guaranteed to return. This issue was pointed out by fhahn in the recent llvm-dev thread. Differential Revision: https://reviews.llvm.org/D96993	2021-02-19 12:35:40 +01:00
Nikita Popov	370addb996	[IR] Move willReturn() to Instruction This moves the willReturn() helper from CallBase to Instruction, so that it can be used in a more generic manner. This will make it easier to fix additional passes (ADCE and BDCE), and will give us one place to change if additional instructions should become non-willreturn (e.g. there has been talk about handling volatile operations this way). I have also included the IntrinsicInst workaround directly in here, so that it gets applied consistently. (As such this change is not entirely NFC -- FuncAttrs will now use this as well.) Differential Revision: https://reviews.llvm.org/D96992	2021-02-19 11:56:01 +01:00
Djordje Todorovic	1a2b3536ef	Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info" As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from the [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is very initial stage of the implementation, there is a space for improvements such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545 The test that was failing is now forced to use the old PM.	2021-02-18 23:29:22 -08:00
Xun Li	3bf8f162a0	[Coroutine] Relax CoroElide musttail check As discussed in D94834, we don't really need to do complicated analysis. It's safe to just drop the tail call attribute. Differential Revision: https://reviews.llvm.org/D96926	2021-02-18 19:36:11 -08:00
Wei Mi	5fb65c02ca	[SampleFDO] Stop repeated indirect call promotion for the same target. Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundant, which can harm performance and can cause excessive compile time in some extreme cases. The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call. The fix brings 0.1~0.2% performance on our search benchmark. Differential Revision: https://reviews.llvm.org/D96806	2021-02-18 17:01:32 -08:00
Jianzhou Zhao	7e658b2fdc	[dfsan] Instrument origin variable and function definitions This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D96977	2021-02-18 23:50:05 +00:00
Hongtao Yu	e87b1b1d4e	[CSSPGO] Use callsite sample counts to annotate indirect call sites. With CSSPGO all indirect call targets are counted towards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore there is no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling. I'm also cleaning up the original code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was discarded. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D96990	2021-02-18 14:52:34 -08:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results: aa.NumMayAlias | 5010069 | 5009861 aa.NumMustAlias | 347518 | 347674 aa.NumNoAlias | 27201336 | 27201528 ... licm.NumPromoted | 1293 | 1296 I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Ta-Wei Tu	f70cdc5b5c	[NPM] Properly reset parent loop after loop passes This fixes https://bugs.llvm.org/show_bug.cgi?id=49185 When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`. If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes. However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes. This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D96727	2021-02-19 02:50:53 +08:00
Jianzhou Zhao	406dc54903	[dfsan] Refactor defining TLS variables This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96941	2021-02-18 18:04:21 +00:00

26805 Commits
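The ConstraintElimination commit (95daec6a84) replaces `%x != 0` with the unsigned comparison `%x > 0` (ICMP_UGT). This substitution is valid because zero is the smallest unsigned value. It would not be valid for the signed comparison. The standalone Python sketch below checks both claims exhaustively. It is not LLVM code: the helper names and the 8-bit default width are illustrative assumptions, and plain Python integers stand in for LLVM's fixed-width integers.

```python
# Illustrative check (not LLVM code): for unsigned N-bit integers,
# "x != 0" (ICMP_NE against 0) is equivalent to "x >u 0" (ICMP_UGT against 0).
BITS = 8  # assumed width for the demo; any width behaves the same way


def icmp_ne_zero(x: int) -> bool:
    return x != 0


def icmp_ugt_zero(x: int) -> bool:
    # x is already in [0, 2**bits), i.e. interpreted as unsigned.
    return x > 0


def equivalent(bits: int = BITS) -> bool:
    """Return True if ne-zero and ugt-zero agree on every unsigned value."""
    return all(icmp_ne_zero(x) == icmp_ugt_zero(x) for x in range(2 ** bits))


def to_signed(x: int, bits: int = BITS) -> int:
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    return x - (1 << bits) if x >= (1 << (bits - 1)) else x


def signed_counterexample(bits: int = BITS):
    """Return the first x where "x != 0" differs from the signed "x >s 0"."""
    for x in range(2 ** bits):
        if (x != 0) != (to_signed(x, bits) > 0):
            return x
    return None


if __name__ == "__main__":
    print(equivalent())             # True
    print(signed_counterexample())  # 128, i.e. -128 as a signed i8
```

The signed counterexample is the first bit pattern with the sign bit set. That pattern is non-zero yet negative, which is why the rewrite must use the unsigned predicate.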