llvm-project

Commit Graph

Author	SHA1	Message	Date
Serge Pavlov	47b3b76825	Implement inlining of strictfp functions According to the current design, if a floating point operation is represented by a constrained intrinsic somewhere in a function, all floating point operations in the function must be represented by constrained intrinsics. This imposes additional requirements on the inlining mechanism. If a non-strictfp function is inlined into a strictfp function, all ordinary FP operations must be replaced with their constrained counterparts. Inlining a strictfp function into a non-strictfp one is not implemented, as it would require replacing all FP operations in the host function, which is currently undesirable due to the expected performance loss. Differential Revision: https://reviews.llvm.org/D69798	2022-03-31 19:15:52 +07:00
Alexandros Lamprineas	b4417075dc	[FuncSpec] Constant propagate multiple arguments for recursive functions. This fixes a TODO in constantArgPropagation() to make it feature complete. However, I do find myself in agreement with the review comments in https://reviews.llvm.org/D106426. I don't think we should pursue specializing such recursive functions as the code size increase becomes linear to 'max-iters'. Compiling the modified test just with -O3 (no function specialization) generates the same code. Differential Revision: https://reviews.llvm.org/D122755	2022-03-31 13:00:08 +01:00
Florian Hahn	2760cdc9c6	Revert "[LV] Remove unneeded createHeaderBranch.(NFCI)" This reverts commit `32bc83d11e`. This is causing bots with expensive-checks to fail. Revert while I investigate.	2022-03-31 12:32:50 +01:00
Florian Hahn	32bc83d11e	[LV] Remove unneeded createHeaderBranch.(NFCI) The only remaining use was to get the exit block of the loop. Instead of relying on the loop, use the successor of VectorHeaderBB (LoopMiddleBlock) directly to set VPTransformState::CFG::ExitB Depends on D121621. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121623	2022-03-31 11:48:52 +01:00
Florian Hahn	2c494f0941	[VPlan] Remove unneeded Loop variable (NFC). Suggested in D121623. The remaining uses of L can be replaced, reducing the need for the variable.	2022-03-31 10:34:28 +01:00
Marco Elver	b8e49fdcb1	[AddressSanitizer] Allow prefixing memintrinsic calls in kernel mode Allow receiving memcpy/memset/memmove instrumentation by using __asan or __hwasan prefixed versions for AddressSanitizer and HWAddressSanitizer respectively when compiling in kernel mode, by passing params -asan-kernel-mem-intrinsic-prefix or -hwasan-kernel-mem-intrinsic-prefix. By default the kernel-specialized versions of both passes drop the prefixes for calls generated by memintrinsics. This assumes that all locations that can lower the intrinsics to libcalls can safely be instrumented. This unfortunately is not the case when implicit calls to memintrinsics are inserted by the compiler in no_sanitize functions [1]. To solve the issue, normal memcpy/memset/memmove need to be uninstrumented, and instrumented code should instead use the prefixed versions. This also aligns with ASan behaviour in user space. [1] https://lore.kernel.org/lkml/Yj2yYFloadFobRPx@lakrids/ Reviewed By: glider Differential Revision: https://reviews.llvm.org/D122724	2022-03-31 11:14:42 +02:00
David Green	b65267ca7b	[LV] Invalidate widening decisions after maximizing vector bandwidth When MaximizeVectorBandwidth is enabled, we can end up (via calls to collectUniformsAndScalars/setCostBasedWideningDecision through calculateRegisterUsage) making widening decisions before we have decided whether to fold the tail by masking. These decisions will be wrong if we later decide to fold the tail, for example when the trip count is very low. In that case we would use incorrect costs for loads that should be masked, using standard memory operation costs instead. This still uses the EmulatedMaskMemRefHack costs for now (somewhat unfortunately), but without this change the old costs were 1, leading to overly optimistic vectorization. This slightly changes the way that the MaximizeVectorBandwidth option works to make it easier to test, always honouring the option if it is set. Differential Revision: https://reviews.llvm.org/D120215	2022-03-31 09:19:31 +01:00
Aditya Kumar	368681f803	[GVNHoist] drop debug location according to the debug info guide According to the LLVM debug info update guide: https://llvm.org/docs/HowToUpdateDebugInfo.html, "Hoisting identical instructions which appear in several successor blocks into a predecessor block. In this case there is no single merged instruction. The rule for dropping locations applies". Thanks to Yuanbo Li for reporting this. Reviewed By: dblaikie Reviewers: sebpop, tejohnson, dblaikie Differential Revision: https://reviews.llvm.org/D122730	2022-03-30 20:17:53 -07:00
Stephen Long	e02f4976ac	[LoopIdiom] Merge TBAA of adjacent stores when creating memset Factor in the TBAA of adjacent stores instead of just the head store when merging stores into a memset. We were seeing GVN remove a load that had a TBAA that matched the 2nd store because GVN determined it didn't match the TBAA of the memset. The memset had the TBAA of only the first store. i.e. Loading the field pi_ of shared_count after memset to create an array of shared_ptr template<class T> class shared_ptr { T p; shared_count refcount; }; class shared_count { sp_counted_base pi_; }; Differential Revision: https://reviews.llvm.org/D122205	2022-03-30 16:54:49 -07:00
Florian Hahn	e4543af4e6	[VPlan] Track current vector loop in VPTransformState (NFC). Instead of looking up the vector loop using the header, keep track of the current vector loop in VPTransformState. This removes the requirement for the vector header block being part of the loop up front. A follow-up patch will move the code to generate the Loop object for the vector loop to VPRegionBlock. Depends on D121619. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121621	2022-03-30 22:16:40 +01:00
Chang-Sun Lin Jr	c28ce745cf	Value-number GVNHoist loads by result type as well as pointer address. Avoids merge errors when opaque pointers are loaded into different types. Reviewed by: jcranmer-intel, hiraditya Differential Revision: https://reviews.llvm.org/D122521	2022-03-30 11:33:49 -07:00
Florian Hahn	e8673f2f20	[LV] Do not create separate latch block in VPlan::execute. Now that all dependencies on creating the latch block up-front have been removed, there is no need to create it early. Depends on D121618. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121619	2022-03-30 17:31:38 +01:00
Florian Hahn	8a4077fac0	[LV] Pass LoopHeaderBB directly to updateDominatorTree. (NFC) At the call site, we already know what the vector header block is. Pass it directly.	2022-03-30 13:11:20 +01:00
Florian Hahn	ecb4171dcb	[LV] Handle zero cost loops in selectInterleaveCount. In some cases, like in the added test case, we can reach selectInterleaveCount with loops that actually have a cost of 0. Unfortunately a loop cost of 0 is also used to communicate that the cost has not been computed yet. To resolve the crash, bail out if the cost remains zero after computing it. This seems like the best option, as there are multiple code paths that return a cost of 0 to force a computation in selectInterleaveCount. Computing the cost at multiple places up front there would unnecessarily complicate the logic. Fixes #54413.	2022-03-29 22:52:43 +01:00
Chris Bieneman	9130e471fe	Add DXContainer DXIL is wrapped in a container format defined by the DirectX 11 specification. Codebases differ in calling this format either DXBC or DXILContainer. Since eventually we want to add support for DXBC as a target architecture and the format is used by DXBC and DXIL, I've termed it DXContainer here. Most of the changes in this patch are just adding cases to switch statements to address warnings. Reviewed By: pete Differential Revision: https://reviews.llvm.org/D122062	2022-03-29 14:34:23 -05:00
Florian Hahn	d1d3563278	[LV] Move code to place pointer induction increment to VPlan post-processing. This patch moves the code to set the correct incoming block for the backedge value to VPlan::execute. When generating the phi node, the backedge value is temporarily added using the pre-header as incoming block. The invalid phi node will be fixed up during VPlan::execute after main VPlan code generation. At the same time, the backedge value is also moved to the latch. This change removes the requirement to create the latch block up-front for VPWidenInductionPHIRecipe::execute, which in turn will enable modeling the pre-header in VPlan. Depends on D121617. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121618	2022-03-29 20:27:59 +01:00
Hirochika Matsumoto	a3cffc1150	[InstCombine] Fold (ctpop(X) == 1) | (X == 0) into ctpop(X) < 2 https://alive2.llvm.org/ce/z/94yRMN Fixes #54177 Differential Revision: https://reviews.llvm.org/D122077	2022-03-29 11:30:06 -04:00
Nikita Popov	682ef39b1a	[InstCombine] Remove call to getPointerElementType() This was erroneously re-introduced as part of `bb0b23174e`.	2022-03-29 16:52:29 +02:00
Florian Hahn	3dbb5eb2cd	[ConstraintElimination] Move ConstraintInfo after ConstraintTy. (NFC) Code movement to make it slightly easier to use ConstraintTy & co in ConstraintInfo directly, for follow-up patches.	2022-03-29 09:59:03 +01:00
Serguei Katkov	6444a65514	[LSR] Fixup canonicalization formula and its checker. According to the definition of canonical form, a formula is canonical if, when the scale reg does not contain an addrec for loop L, none of the bases contains an addrec for this loop either. The critical word here is "contains". The current checker of canonical form does not check the "contains" property but the "is" property: it checks whether a reg is an addrec, not whether it contains one. Fix the checker and the canonicalizing utility to follow the definition. Without this fix, in the attached test the base formula reg((-1 * {0,+,8}<nuw><nsw><%bb2>)<nsw>) + 1reg((8 (%arg /u 8))<nuw>) is considered canonical even though a base contains an addrec, and the modified formula we want to insert, reg({0,+,8}<nuw><nsw><%bb2>) + 1reg((-8 (%arg /u 8))), is considered not canonical. Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D122457	2022-03-29 14:05:04 +07:00
serge-sans-paille	01be9be2f2	Cleanup includes: final pass Cleanup a few extra files, this closes the work on libLLVM dependencies on my side. Impact on libLLVM preprocessed output: -35876 lines Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122576	2022-03-29 09:00:21 +02:00
Paul Kirth	90cb325abd	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `2add3fbd97`.	2022-03-29 06:20:30 +00:00
Philip Reames	33deaa13b8	[memcpyopt] Common code into performCallSlotOptzn [NFC] We have the same code repeated in both callers, sink it into callee. The motivation here isn't just code style, we can also defer the relatively expensive aliasing checks until the cheap structural preconditions have been validated. (e.g. Don't bother aliasing if src is not an alloca.) This helps compile time significantly.	2022-03-28 20:10:13 -07:00
Philip Reames	7d6e8f2a96	[slp] Delete dead scalar instructions feeding vectorized instructions If we vectorize e.g. a store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well. This is purely for test output quality and readability. It should have no effect in any sane pipeline. Differential Revision: https://reviews.llvm.org/D122493	2022-03-28 20:10:13 -07:00
Johannes Doerfert	7df2eba7fa	[Attributor][OpenMP] Add assumption for non-call assembly instructions Inline assembly is scary but we need to support it for the OpenMP GPU device runtime. The new assumption expresses the fact that it may not have call semantics, that is, it will not call another function but simply perform an operation or side-effect. This is important for reachability in the presence of inline assembly. Differential Revision: https://reviews.llvm.org/D109986	2022-03-28 20:57:52 -05:00
Johannes Doerfert	bb0b23174e	[InstCombineCalls] Optimize call of bitcast even w/ parameter attributes Before we gave up if a call through bitcast had parameter attributes. Interestingly, we allowed attributes for the return value already. We now handle both the same way, namely, we drop the ones that are incompatible with the new type and keep the rest. This cannot cause "more UB" than initially present. Differential Revision: https://reviews.llvm.org/D119967	2022-03-28 20:57:52 -05:00
Paul Kirth	2add3fbd97	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-03-28 23:30:04 +00:00
Alina Sbirlea	f7381a795a	Revert `29fada4a3d` Seeing a test failure with asan in Halide generated code, reverting while I investigate. Differential Revision: https://reviews.llvm.org/D121987	2022-03-28 16:17:41 -07:00
chenglin.bi	9a53793ab8	[InstCombine] Fold two select patterns into and-or select (~a | c), a, b -> and a, (or c, b) https://alive2.llvm.org/ce/z/bnDobs select (~c & b), a, b -> and b, (or a, c) https://alive2.llvm.org/ce/z/k2jJHJ Differential Revision: https://reviews.llvm.org/D122152	2022-03-28 16:07:55 -04:00
Florian Hahn	e7bf2ea934	[LV] Move code to place induction increment to VPlan post-processing. This patch moves the code to set the correct incoming block for the backedge value to VPlan::execute. When generating the phi node, the backedge value is temporarily added using the pre-header as incoming block. The invalid phi node will be fixed up during VPlan::execute after main VPlan code generation. At the same time, the backedge value is also moved to the latch. This change removes the requirement to create the latch block up-front for VPWidenIntOrFpInductionRecipe::execute, which in turn will enable modeling the pre-header in VPlan. As an alternative, the increment could be modeled as separate recipe, but that would require more work and a bit of redundant code, as we need to create the step-vector during VPWidenIntOrFpInductionRecipe::execute anyways, to create the values for different parts. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121617	2022-03-28 16:20:02 +01:00
Nikita Popov	db561064f6	[GlobalOpt] Handle non-instruction MTI source (PR54572) This was reusing a cast to GlobalVariable to check for an Instruction, which means we'll try to dereference a null pointer if it's not actually a GlobalVariable. We should be casting MTI->getSource() instead. I don't think this problem is really specific to opaque pointers, but it certainly makes it a lot easier to reproduce. Fixes https://github.com/llvm/llvm-project/issues/54572.	2022-03-28 14:28:47 +02:00
Alexandros Lamprineas	8045bf9d0d	[FuncSpec] Support function specialization across multiple arguments. The current implementation of Function Specialization does not allow specializing more than one argument per function call, which is a limitation I am lifting with this patch. My main challenge was to choose the most suitable ADT for storing the specializations. We need an associative container for binding all the actual arguments of a specialization to the function call. We also need a consistent iteration order across executions. Lastly we want to be able to sort the entries by Gain and reject the least profitable ones. MapVector almost fits the bill, but not quite; erasing elements is expensive and using stable_sort messes up the indices to the underlying vector. I am therefore using the underlying vector directly after calculating the Gain. Differential Revision: https://reviews.llvm.org/D119880	2022-03-28 12:01:53 +01:00
Gulfem Savrun Yeniceri	ead8586645	[InstrProfiling] Add comments for no runtime hook This patch adds comments about `c7f91e227a`, and follows LLVM style guideline about nested if statements.	2022-03-26 00:26:43 +00:00
Philip Reames	f80aaa675f	[SLP] Simplify eraseInstruction [NFC] This simplifies the implementation of eraseInstruction by moving the odd-replace-users-with-undef handling back to the only caller which uses it. This handling was not obviously correct, so add the asserts which make it clear why this is safe to do at all. The result is simpler code and stronger assertions.	2022-03-25 12:01:52 -07:00
Florian Hahn	8c3281db49	[ConstraintElimination] Use AddOverflow for offset summation. Fixes an incorrect transformation due to values overflowing https://alive2.llvm.org/ce/z/uizoea	2022-03-25 18:08:24 +00:00
Philip Reames	48cc9287f5	Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3) The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed over the weekend and have had several days to bake. The last was fixed this morning after being noticed in manual review of test changes yesterday. See the review thread for links to each change. Original commit message follows: SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in the number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore it. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-03-25 10:39:23 -07:00
Philip Reames	ec858f0201	[SLP] Optimize stacksave dependence handling [NFC] After writing the commit message for 4b1bace28, I realized that the mentioned optimization was rather straightforward. We already have the code for scanning a block during region initialization; we can simply keep track of whether we've seen a stacksave or stackrestore. If we haven't, none of these dependencies are relevant and we can avoid the relatively expensive scans entirely.	2022-03-25 10:04:10 -07:00
Philip Reames	a16308c282	[SLP] Explicit track required stacksave/alloca dependency (try 3) This is an extension of commit b7806c to handle one last case noticed in test changes for D118538. Again, this is thought to be a latent bug in the existing code, though this time I have not managed to reduce tests for the original algorithm. The prior attempt had failed to account for this case: %a = alloca i8 stacksave stackrestore store i8 0, i8* %a If we allow '%a' to reorder into the stacksave/restore region, then the alloca will be deallocated before the use. We will have taken a well-defined program and introduced a use-after-free bug. There's also an inverse case where the alloca originally follows the stackrestore, and we need to prevent reordering it above the restore. Compile time wise, we potentially do an extra scan of the block for each alloca seen in a bundle. This is significantly more expensive than the stacksave rooted version and is why I'd tried to avoid this in the initial patch. There is room to optimize this (by essentially caching a "has stacksave" bit per block), but I'm leaving that to future work if it actually shows up in practice. Since allocas in bundles should be rare in practice, I suspect we can defer the complexity for a long while.	2022-03-25 10:04:10 -07:00
Gulfem Savrun Yeniceri	c7f91e227a	[InstrProfiling] No runtime hook for unused funcs CoverageMappingModuleGen generates a coverage mapping record even for unused functions with internal linkage, e.g. static int foo() { return 100; } The Clang frontend eliminates such functions, but the InstrProfiling pass still pulls in the profile runtime since there is a coverage record. Fuchsia uses runtime counter relocation, and pulling in the profile runtime for unused functions causes a linker error: undefined hidden symbol: __llvm_profile_counter_bias. Since `389dc94d4b`, on Fuchsia we do not hook the profile runtime into binaries in which none of the translation units have been instrumented. This patch extends that to instrumented binaries that consist only of unused functions. Differential Revision: https://reviews.llvm.org/D122336	2022-03-25 17:03:03 +00:00
Florian Hahn	e47d220230	[LV] Use getVectorLoopRegion to retrieve header. (NFC) Update all places that currently assume the entry block to the plan is also the vector loop header to use getVectorLoopRegion instead. getVectorLoopRegion will keep doing the right thing when the pre-header is modeled explicitly (and becomes the new entry block in the plan).	2022-03-25 16:57:12 +00:00
Hongtao Yu	7a316c0a1f	[CSSPGO] Turn on profi and ext-tsp when using probe-based profile. Probe-based profile leads to a better performance when combined with profi and ext-tsp block layout. I'm turning them on by default. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122442	2022-03-25 09:09:21 -07:00
Philip Reames	d9756fa723	[slp] Factor out a lambda to avoid duplicating code a third time in upcoming patch [nfc]	2022-03-25 09:02:39 -07:00
Simon Pilgrim	6a094a6264	[InstCombine] SimplifyDemandedUseBits - remove ashr node if we only demand known sign bits We already do this for SelectionDAG, but we're missing it here. Noticed while re-triaging PR21929 Differential Revision: https://reviews.llvm.org/D122340	2022-03-25 15:39:08 +00:00
Johannes Doerfert	a81fff8afd	Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit `c5f789050d` and reapplies `7aea3ea8c3` with additional test changes.	2022-03-25 09:36:50 -05:00
Roman Lebedev	f6b60b3b79	[SimplifyCFG] `FoldBranchToCommonDest()`: allow branch-on-select This whole check is bogus; it's some kind of a profitability check. For now, simply extend it to allow branching not only on binary ops, but also on poison-safe logic ops. Refs. https://github.com/llvm/llvm-project/issues/53861 Refs. https://github.com/llvm/llvm-project/issues/54553	2022-03-25 16:12:17 +03:00
Simon Pilgrim	1a943923b8	[Utils] stripDebugifyMetadata - use cast<> instead of dyn_cast_or_null<> to avoid dereference of nullptr The pointer is dereferenced immediately, so assert the cast is correct instead of returning nullptr	2022-03-25 10:25:04 +00:00
Fraser Cormack	2e44b7872b	[VectorCombine] Insert addrspacecast when crossing address space boundaries We cannot bitcast pointers across different address spaces. This was previously fixed in D89577, but then D93229 added an enhancement which peeks further through the pointer operand, opening up the possibility that address-space violations could be introduced. Instead of bailing as the previous fix did, simply insert an addrspacecast instruction. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D121787	2022-03-24 19:08:08 +00:00
Johannes Doerfert	c5f789050d	Revert "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit `7aea3ea8c3` as it breaks the buildbots. I didn't see these failures in the pre-merge checks, looking into it.	2022-03-24 14:04:41 -05:00
Johannes Doerfert	7aea3ea8c3	[Intrinsics] Add `nocallback` to the default intrinsic attributes Most intrinsics, especially "default" ones, will not call back into the IR module. `nocallback` encodes this nicely. As it was not used before, this patch also makes use of `nocallback` in the Attributor which results in many more `norecurse` deductions. Tablegen part is mechanical, test updates by script. Differential Revision: https://reviews.llvm.org/D118680	2022-03-24 13:50:54 -05:00
Simon Pilgrim	597aefa89c	Fix unused variable warning by embedding inside assertion	2022-03-24 17:41:24 +00:00
Florian Hahn	46432a0088	[VPlan] Add VPWidenPointerInductionRecipe. This patch moves pointer induction handling from VPWidenPHIRecipe to its own recipe. In the process, it adds all information required to generate code for pointer inductions without relying on Legal to access the list of induction phis. Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121615	2022-03-24 14:58:45 +00:00
Sanjay Patel	5dbb53b1b4	[InstCombine] merge shuffled vector negate and multiply Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops. We need a little hack to make the existing logic work because it does not expect to move constants from op0 to op1, but the code comment hopefully makes that clear. I don't think there are any other identities like that. Fixes #54364 Differential Revision: https://reviews.llvm.org/D122390	2022-03-24 10:25:16 -04:00
Djordje Todorovic	9dbc687a5e	NFC: [LICM] Update some stale comments After removing the MaybePromotable, some comments became stale. This improves them. Differential Revision: https://reviews.llvm.org/D122319	2022-03-24 14:37:20 +01:00
Alexey Bataev	20973c0841	[SLP][NFC]Fix param name in comments, NFC.	2022-03-24 05:58:42 -07:00
Dávid Bolvanský	4397504c2d	[NFCI] Fix set-but-unused warning in InstCombineAddSub.cpp	2022-03-24 08:33:40 +01:00
Dávid Bolvanský	470e1d9584	[NFCI] Fix set-but-unused warning in AddressSanitizer.cpp	2022-03-24 08:13:29 +01:00
Julian Lettner	64902d335c	Reland "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" For MachO, lower `@llvm.global_dtors` into `@llvm.global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-23 18:36:55 -07:00
Vasileios Porpodas	39aa202aff	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit `e6ead19b77`.	2022-03-23 18:32:17 -07:00
Zequan Wu	581dc3c729	Revert "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" This reverts commit `22570bac69`.	2022-03-23 16:11:54 -07:00
Johannes Doerfert	ee94a4a3d0	[Attributor][FIX] Avoid endless recursion, simple case There is potential for endless recursion if we try to determine the underlying objects of a load, just to end up with the load as underlying object. A proper solution will require us to pass a visited set around. This will happen as we cleanup genericValueTraversal soon.	2022-03-23 15:55:32 -05:00
minglotus-6	e2074de6a8	[ProfSampleLoader] When disable-sample-loader-inlining is true, merge profiles of inlined instances to outlining versions. When --disable-sample-loader-inlining is true, skip inline transformation, but merge profiles of inlined instances to outlining versions. Differential Revision: https://reviews.llvm.org/D121862	2022-03-23 13:02:48 -07:00
chenglin.bi	52f323d0f1	[InstCombine] Fold abs of known negative operand when source is sub When abs source comes from (x - y), check if a "x > y" dominating condition exists. Fixes #54132 Differential Revision: https://reviews.llvm.org/D122013	2022-03-23 15:21:33 -04:00
Arthur Eubanks	9bd66b312c	[PassManager][Coroutine] Run passes under -O0 conditionally and run GlobalDCE CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and CGSCC passes don't run on unreachable functions. Normally GlobalDCE will come along and delete unreachable functions, but we don't run GlobalDCE under -O0, so an unreachable function with coroutine intrinsics may never have CoroSplit run on it. This patch adds GlobalDCE when coroutine intrinsics are present. It also now runs all coroutine passes conditionally, only when coroutine intrinsics are present. This should also solve the -O0 regression reported in D105877 due to LazyCallGraph construction. Fixes https://github.com/llvm/llvm-project/issues/54117 Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D122275	2022-03-23 11:03:26 -07:00
Arthur Eubanks	e6ead19b77	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash." This reverts commit `27bd8f9492`. Causes crashes, see comments in D121973	2022-03-23 10:57:45 -07:00
Nikita Popov	29fada4a3d	[EarlyCSE] Don't eagerly optimize MemoryUses EarlyCSE currently optimizes all MemoryUses upfront. However, EarlyCSE only actually queries the clobbering memory access for a subset of uses, namely those where a CSE candidate has already been identified. Delaying use optimization to the clobber query improves compile-time in practice. This change is not NFC because EarlyCSE has a limit on the number of clobber queries (EarlyCSEMssaOptCap), in which case it falls back to the defining access. The defining access for uses will now no longer coincide with the optimized access. If there are performance regressions from this change, we should be able to address them by raising this limit. Differential Revision: https://reviews.llvm.org/D121987	2022-03-23 16:47:35 +01:00
Sanjay Patel	0fcff69bcb	[InstCombine] try to narrow shifted bswap-of-zext (2nd try) The first attempt at this missed a validity check. This version includes a test of the narrow source type for modulo-16-bits. Original commit message: This is the IR counterpart to `370ebc9d9a` which provided a bswap narrowing fix for issue #53867. Here we can be more general (although I'm not sure yet what would happen for illegal types in codegen - too rare to worry about?): https://alive2.llvm.org/ce/z/3-CPfo This will be more effective if we have moved the shift after the bswap as proposed in D122010, but it is independent of that patch. Differential Revision: https://reviews.llvm.org/D122166	2022-03-23 11:28:37 -04:00
Alexandros Lamprineas	a687f96b0f	[FuncSpec][NFC] Clang-format the source code and fix debug typo.	2022-03-23 14:39:58 +00:00
Nikita Popov	ba36556145	[InstrProfiling] Account for missing bitcast/GEP This code is supposed to clean up a constexpr bitcast/GEP, but with opaque pointers this ends up dropping references to the global.	2022-03-23 15:39:39 +01:00
serge-sans-paille	1b89c83254	Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122181	2022-03-23 11:06:13 +01:00
Nathan Chancellor	4e0008dcbe	Revert "[InstCombine] try to narrow shifted bswap-of-zext" This reverts commit `9e9bda2e8f`. This causes a backend error when building the Linux kernel for arm64. See https://reviews.llvm.org/D122166 for a simplified reproducer.	2022-03-22 17:32:33 -07:00
Vasileios Porpodas	27bd8f9492	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit `f7d7d2a08d`.	2022-03-22 16:41:55 -07:00
Philip Reames	7abefc4222	[instcombine] Fold away memset/memmove from otherwise unused alloca The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers. As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed. This is a bit undesirable for such an "obviously" useless bit of code. As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write. Doing that would subsume this code in a more general way, but is also a more involved change. For the moment, I went with the easiest fix.	2022-03-22 13:48:48 -07:00
Arthur Eubanks	f7d7d2a08d	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads."" This reverts commit `79613185d3`. Causes crashes, see comments in https://reviews.llvm.org/D121973.	2022-03-22 13:33:49 -07:00
Sanjay Patel	ccf8c969c2	[InstCombine] reorder code, fix formatting; NFC The affected code can be updated to solve #54364, so make some cosmetic diffs before real changes.	2022-03-22 16:33:01 -04:00
Florian Hahn	50c8588e44	[LV] Remove Loop argument from createInductionResumeValues (NFCI). createInductionResumeValues only uses its loop argument to get the pre-header, but the pre-header is already known (we created/cached it earlier). Remove the unneeded loop argument.	2022-03-22 14:23:12 +00:00
Sanjay Patel	60820e53ec	[InstCombine] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is an IR implementation of a transform suggested in D120648. The "swaps cancel" test models the motivating optimization from that proposal. Alive2 checks (as noted in the other review, we could use knownbits to handle shift-by-variable-amount, but that can be an enhancement patch): https://alive2.llvm.org/ce/z/pXUaRf https://alive2.llvm.org/ce/z/ZnaMLf Differential Revision: https://reviews.llvm.org/D122010	2022-03-22 09:10:55 -04:00
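Both rewrites rely on the fact that shifting by a whole number of bytes commutes with reversing byte order (with the shift direction flipped). The sketch below uses a hand-written `bswap32` as a stand-in for `llvm.bswap.i32` (all helper names are hypothetical); it checks both rules for every byte-multiple shift amount, while a 4-bit shift breaks the identity, which is why a variable shift amount would first need known-bits reasoning.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap standing in for llvm.bswap.i32.
constexpr uint32_t bswap32(uint32_t x) {
  return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
         ((x & 0x00FF0000u) >> 8) | ((x & 0xFF000000u) >> 24);
}

// bswap (shl X, C) --> lshr (bswap X), C
constexpr bool shlRuleHolds(uint32_t x, unsigned c) {
  return bswap32(x << c) == (bswap32(x) >> c);
}

// bswap (lshr X, C) --> shl (bswap X), C
constexpr bool lshrRuleHolds(uint32_t x, unsigned c) {
  return bswap32(x >> c) == (bswap32(x) << c);
}

// Check both rules for every byte-multiple shift amount below 32.
constexpr bool allByteShiftsHold(uint32_t x) {
  for (unsigned c = 0; c < 32; c += 8)
    if (!shlRuleHolds(x, c) || !lshrRuleHolds(x, c))
      return false;
  return true;
}

static_assert(allByteShiftsHold(0x12345678u), "byte-multiple shifts commute");
```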
Djordje Todorovic	91ea247039	[Debugify] Use DebugifyLevel in Debugify original mode Before this patch, the DebugifyLevel option was used only in the synthetic mode; after this patch, it is used in the original mode as well. Differential Revision: https://reviews.llvm.org/D115623	2022-03-22 14:04:56 +01:00
Nikita Popov	afb526b3f4	[LICM] Handle store of pointer to itself (PR54495) Rather than iterating over users and comparing operands, iterate over uses and check operand number. Otherwise, we'll end up promoting a store twice if it has two equal operands. This can only happen with opaque pointers, as otherwise both operands differ by a level of indirection, so a bitcast would have to be involved. Fixes https://github.com/llvm/llvm-project/issues/54495.	2022-03-22 14:00:07 +01:00
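Why walking users double-counts a self-referential store can be shown with a toy IR. This is a minimal sketch, not the LLVM API: `Store`, `UseRef`, `usesOf` and the two counting functions are hypothetical, and the operand layout (0 = stored value, 1 = address) is assumed to mirror `StoreInst`.

```cpp
#include <cassert>
#include <vector>

// Toy IR: values are int ids; operand 0 is the stored value, operand 1 the address.
struct Store { int valueOp; int ptrOp; };

// One entry per operand slot that refers to a given value, like llvm::Use.
struct UseRef { const Store *user; unsigned operandNo; };

std::vector<UseRef> usesOf(int v, const std::vector<Store> &stores) {
  std::vector<UseRef> uses;
  for (const Store &s : stores) {
    if (s.valueOp == v) uses.push_back({&s, 0});
    if (s.ptrOp == v) uses.push_back({&s, 1});
  }
  return uses;
}

// Old scheme: visit each user (once per use) and compare operands.
int countPromotionsByUser(int ptr, const std::vector<Store> &stores) {
  int n = 0;
  for (const UseRef &u : usesOf(ptr, stores))
    if (u.user->ptrOp == ptr) ++n;  // a store of %p to %p is counted twice
  return n;
}

// Fixed scheme: visit each use and check which operand slot it occupies.
int countPromotionsByUse(int ptr, const std::vector<Store> &stores) {
  int n = 0;
  for (const UseRef &u : usesOf(ptr, stores))
    if (u.operandNo == 1) ++n;
  return n;
}
```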
Sanjay Patel	9e9bda2e8f	[InstCombine] try to narrow shifted bswap-of-zext This is the IR counterpart to `370ebc9d9a` which provided a bswap narrowing fix for issue #53867. Here we can be more general (although I'm not sure yet what would happen for illegal types in codegen - too rare to worry about?): https://alive2.llvm.org/ce/z/3-CPfo This will be more effective if we have moved the shift after the bswap as proposed in D122010, but it is independent of that patch. Differential Revision: https://reviews.llvm.org/D122166	2022-03-22 08:22:30 -04:00
Djordje Todorovic	73777b4c35	[Debugify] Optimize debugify original mode Before we start addressing the issue of having a lot of false positives when using debugify in the original mode, we have made a few patches that should speed up the execution of the testing utility Passes. For example, when testing a large project (let's say LLVM project itself), we can face a lot of potential DI issues. Usually, we use -verify-each-debuginfo-preserve (that is very similar to -debugify-each) -- it collects DI metadata before each Pass, and after the Pass it checks if the Pass preserved the DI metadata. However, we can speed up this process, since we don't need to collect DI metadata before each Pass -- we could use the DI metadata that are collected after the previous Pass from the pipeline as an input for the next Pass. This patch speeds up the utility by ~2x. Differential Revision: https://reviews.llvm.org/D115622	2022-03-22 12:14:00 +01:00
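The saving comes from collecting N + 1 snapshots instead of 2N for a pipeline of N passes, while reporting the same broken passes. A toy sketch under stated assumptions: debug info is modeled as a set of strings, and `DebugInfo`, `Pass`, `CheckResult` and both checker functions are hypothetical names, not the Debugify implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: the module's debug info is a set of strings and each
// pass mutates it in place.
using DebugInfo = std::set<std::string>;
using Pass = std::function<void(DebugInfo &)>;

struct CheckResult {
  int collections = 0;   // how many times debug info was collected
  int brokenPasses = 0;  // passes that dropped previously present debug info
};

// True if every entry in `before` is still present in `after`.
static bool preserved(const DebugInfo &before, const DebugInfo &after) {
  return std::includes(after.begin(), after.end(), before.begin(), before.end());
}

// Collect before and after every pass: 2 * N collections.
CheckResult checkCollectingBeforeEachPass(DebugInfo module, const std::vector<Pass> &passes) {
  CheckResult r;
  for (const Pass &pass : passes) {
    DebugInfo before = module;
    ++r.collections;
    pass(module);
    DebugInfo after = module;
    ++r.collections;
    if (!preserved(before, after))
      ++r.brokenPasses;
  }
  return r;
}

// Reuse the snapshot taken after the previous pass as the next pass's input:
// N + 1 collections.
CheckResult checkReusingPreviousSnapshot(DebugInfo module, const std::vector<Pass> &passes) {
  CheckResult r;
  DebugInfo previous = module;
  ++r.collections;
  for (const Pass &pass : passes) {
    pass(module);
    DebugInfo after = module;
    ++r.collections;
    if (!preserved(previous, after))
      ++r.brokenPasses;
    previous = std::move(after);
  }
  return r;
}
```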
serge-sans-paille	a53b689f0c	Fix missing include under -DEXPENSIVE_CHECK Regression introduced by `f1985a3f85`	2022-03-22 10:37:56 +01:00
serge-sans-paille	f1985a3f85	Cleanup includes: Transforms/IPO Preprocessor output diff: -238205 lines Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122183	2022-03-22 10:06:28 +01:00
Chuanqi Xu	902f4708fe	[NFC] [Coroutines] Remove unnecessary check and constraints on SmallVector The CoroSplit pass would check for the existence of coroutine intrinsics before starting work. This is unnecessary and wasteful, since it iterates over the whole Module. This patch also removes the size constraint on the SmallVector holding the possible coroutines in the Module. The original value was 4; given that coroutines are actually used in practice, 4 is a relatively low threshold.	2022-03-22 14:24:46 +08:00
Vasileios Porpodas	79613185d3	Recommit "[SLP] Fix lookahead operand reordering for splat loads." Original review: https://reviews.llvm.org/D121354 The original commit `9136145eb0` broke the build on several targets. Differential Revision: https://reviews.llvm.org/D121973	2022-03-21 15:57:32 -07:00
Hirochika Matsumoto	86f970e595	[IROutliner][NFC] Fix typo in doc of findOrCreatePHIInBlock Typo Fix in Documentation Author: hkmatsumoto Reviewers: AndrewLitteken Differential Revision: https://reviews.llvm.org/D121627	2022-03-21 12:34:20 -05:00
Philip Reames	ee7324b898	Rename mayBeMemoryDependent to mayHaveNonDefUseDependency [nfc]	2022-03-21 10:01:40 -07:00
Andrew Litteken	4e500df89e	[IROutliner] Fix phi nodes when self referential within block but doesn't contain branch When outlining a phi node, if the incoming branch is a block contained in the region and the branch from that block is not outlined, we create broken code. The fix is to recognize when that branch from the included incoming block is not contained, and ignore the region. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D121311	2022-03-21 11:05:15 -05:00
psamolysov-intel	2ed030ba88	[InferAddressSpaces][NFC] Small code improvements for the InferAddressSpaces pass The patch contains several code improvements: marking everything that can be const as const, and fixing some typos in comments. The patch also removes the shadowing parameter TTI from the rewriteWithNewAddressSpaces method; the parameter is not required because the class already has a field with the same name. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D121671	2022-03-21 11:03:12 -05:00
Alexey Bataev	79a182371e	[SLP]Make stricter check for instructions that do not require scheduling. Need to check that the instructions with external operands can be reordered safely before actually excluding them from scheduling.	2022-03-21 06:09:12 -07:00
Sophia	72bde608d2	[LV] Fix typo in comment Reviewed by: fhahn (Florian Hahn) Differential Revision: https://reviews.llvm.org/D121781	2022-03-21 20:30:05 +08:00
Florian Hahn	0ebac76e6e	[LV] Remove unneeded Loop argument from completeLoopSkeleton. (NFCI) completeLoopSkeleton only uses its loop argument to get the pre-header, but the pre-header is already known (we created/cached it earlier). Remove the unneeded loop argument.	2022-03-21 10:07:25 +00:00
Andrew Litteken	38e8880e93	[IROutliner] Do not outline from functions with optnone Since the IROutliner is performing an optimization, it should not outline from functions explicitly marked with optnone. This adds an extra check and test to make sure this does not occur. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D121567	2022-03-20 23:39:23 -05:00
Florian Hahn	487629cc61	[LV] Remove dead Loop argument from emitMemRuntimeChecks. (NFC)	2022-03-20 21:01:15 +00:00
Philip Reames	b7806c8b37	[SLP] Explicitly track required stacksave/alloca dependency The semantics of an inalloca alloca instruction require that it not be reordered with a preceding stacksave intrinsic call. Unfortunately, there's no def/use edge or memory dependence edge. (The memory point is slightly subtle, but in general a new allocation can't alias with a call which executes strictly before it comes into existence.) I'd tried to tackle this same case previously in `689babdf6`, but the fix chosen there turned out to be incomplete. As such, this change contains a full revert of the first fix attempt. This was noticed when investigating problems which surfaced with D118538, but this is definitely an existing bug. This time around, I managed to reduce a couple of additional cases, including one which was being actively miscompiled even without the new scheduling change. (See test diffs) Compile time wise, we only spend extra time when seeing a stacksave (rare), and even then we walk the block at most once per schedule window extension. Likely a non-issue.	2022-03-20 13:58:45 -07:00
Kazu Hirata	bce1bf0ee2	[Transform] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC)	2022-03-20 10:41:22 -07:00
Philip Reames	6253b77da9	[SLP] Respect control dependence within a block during scheduling This fixes an active miscompile visible in the test changes. The basic problem is that the scheduling dependency graph didn't have any edges for control dependence within a single basic block. The result is that we could (and in some rare cases did) perform reorderings within a block which could introduce new undefined behavior along paths which didn't previously contain any. Impact wise, we have two major cases where control is not guaranteed to reach a later instruction in the block: may-throw calls, and calls containing infinite loops. * The former case was mostly covered by the memory dependencies, and triggering it requires a function which can throw but does not write to memory. In theory, such a case is possible, but not likely in practice. * The latter case is likely more of an issue in practice. After this code was first written, we changed the IR semantics to allow well-defined infinite loops without satisfying mustprogress. Even for C/C++ (which do imply mustprogress), recent changes to how we treat atomics (e.g. an atomic read does not always imply a write) could expose this issue. I'm a bit shocked we don't seem to have a bug report which hit this in real code actually. Compile time wise, this results in a single extra scan of the scheduling window in the common case. Since we stop scanning at the next instruction which isn't guaranteed to execute, no matter what order we traverse instructions in, we scan the block once. The exception to this is that when we extend the scheduling window downwards, we invalidate all dependencies, and thus rescan. So the potentially expensive case is when we have a call in a big schedule window which is frequently extended. We could optimize this case (by caching the last instruction not guaranteed to transfer execution and starting the scan there, covering only the extended window), but I decided to leave the complexity until it mattered. That same case is already degenerate for memory dependences, which are more expensive to compute than the control dependence scan. We could also consider combining the memory dependence and control dependence sets to reduce memory usage, but since it complicates the code slightly and makes debugging a bit harder, I went with the simplest scheme for now. This was noticed while trying to understand the failures reported against D118538, but is not otherwise related to that change.	2022-03-19 13:36:24 -07:00
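The forward scan described in this message can be sketched over a toy block. This is a minimal illustration under assumptions, not the SLPVectorizer code: `Inst`, its two flags and `controlDeps` are hypothetical. For each instruction that may not transfer execution, later instructions that are not safe to speculate get a control dependence, and the scan stops at the next non-transferring instruction because that instruction's own scan covers everything after it.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Inst {
  bool guaranteedToTransfer;  // false for may-throw calls or calls that may not return
  bool safeToSpeculate;       // e.g. plain arithmetic that cannot introduce UB
};

// Returns (from, to) pairs meaning "instruction `to` must stay after `from`".
std::vector<std::pair<int, int>> controlDeps(const std::vector<Inst> &bb) {
  std::vector<std::pair<int, int>> deps;
  for (int i = 0; i < static_cast<int>(bb.size()); ++i) {
    if (bb[i].guaranteedToTransfer)
      continue;
    for (int j = i + 1; j < static_cast<int>(bb.size()); ++j) {
      if (!bb[j].safeToSpeculate)
        deps.push_back({i, j});
      // Everything past j is ordered transitively through j's own edges.
      if (!bb[j].guaranteedToTransfer)
        break;
    }
  }
  return deps;
}
```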
Florian Hahn	1a820ff039	[LV] Remove unnecessary uses of Loop* (NFC). Update functions that previously took a loop pointer but only to get the pre-header. Instead, pass the block directly. This removes the requirement for the loop object to be created up-front.	2022-03-19 20:18:47 +00:00
Johannes Doerfert	4166738c38	[OpenMP][FIX] Do not crash when kernels are debug wrapper functions With debug information enabled (-g) Clang will wrap the actual target region into a new function which is called from the "kernel". The problem is that the "kernel" is now basically a wrapper without all the things we expect. More importantly, if we end up asking for an AAKernelInfo for the "target region function" we might try to turn it into SPMD mode. That used to cause an assertion as that function doesn't have an appropriately named `_exec_mode` global. While the global is going away soon we still need to make sure to properly handle this case, e.g., perform optimizations reliably. Differential Revision: https://reviews.llvm.org/D122043	2022-03-19 14:15:55 -05:00
Fangrui Song	c6692f819e	[GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible Generalize D99629 for ELF. A default visibility non-local symbol is preemptible in a -shared link. `isInterposable` is an insufficient condition. Moreover, a non-preemptible alias may be referenced in a sub constant expression which intends to lower to a PC-relative relocation. Replacing the alias with a preemptible aliasee may introduce a linker error. Respect dso_preemptable and suppress optimization to fix the above issues. With the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic` compile. ``` int aliasee; extern int alias __attribute__((alias("aliasee"), visibility("hidden"))); void foo() { alias = 345; } // intended to access the local copy ``` While here, refine the condition for the alias as well. For some binary formats like COFF, `isInterposable` is a sufficient condition. But I think canonicalization for the changed case has little advantage, so I don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or `getPICLevel/getPIELevel` complexity. For instrumentation, it's recommended not to create aliases that refer to globals that have weak linkage or are preemptible. However, the following is supported and the IR needs to handle such cases. ``` int aliasee __attribute__((weak)); extern int alias __attribute__((alias("aliasee"))); ``` There are other places where GlobalAlias isInterposable usage may need to be fixed. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107249	2022-03-18 14:17:05 -07:00
Philip Reames	1093949cff	[SLP] Add comment clarifying assumption that tripped me up [NFC] I keep thinking this assumption is probably exploitable for a bug in the existing implementation, but all of my attempts at writing a test case have failed. So for the moment, just document this very subtle assumption.	2022-03-18 11:40:19 -07:00
Kazu Hirata	3e0f7c7881	[Vectorize] Fix an 'unused function' warning This patch fixes: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3917:13: error: unused function 'needToScheduleSingleInstruction' [-Werror,-Wunused-function]	2022-03-18 11:24:57 -07:00
Kazu Hirata	b3d8c0d069	[Vectorize] Fix an 'unused variable' warning This patch fixes: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8148:18: error: unused variable 'SDTE' [-Werror,-Wunused-variable]	2022-03-18 11:24:54 -07:00
Nick Desaulniers	e1bae23f6f	[SCCP] do not clean up dead blocks that have their address taken Fixes a crash observed in IPSCCP. Because the SCCPSolver has already internalized BlockAddresses as Constants or ConstantExprs, we don't want to try to update their Values in the ValueLatticeElement. Instead, continue to propagate these BlockAddress Constants, continue converting BasicBlocks to unreachable, but don't delete the "dead" BasicBlocks which happen to have their address taken. Leave replacing the BlockAddresses to another pass. Fixes: https://github.com/llvm/llvm-project/issues/54238 Fixes: https://github.com/llvm/llvm-project/issues/54251 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121744	2022-03-18 11:02:15 -07:00
Philip Reames	8f108c32bc	Revert "[SLP] Optionally preserve MemorySSA" This reverts commit `1cfa986d68`. See https://github.com/llvm/llvm-project/issues/54256 for why I'm discontinuing the project. Separately, it turns out that while this patch does correctly preserve MSSA, it's correct only at the end of the pass, not between vectorization attempts. Even if we decide to resurrect this, we'll need to fix that before reapplying.	2022-03-18 10:45:59 -07:00
Florian Mayer	078b546555	[HWASan] do not replace lifetime intrinsics with tagged address. Quote from the LLVM Language Reference: "If ptr is a stack-allocated object and it points to the first byte of the object, the object is initially marked as dead. ptr is conservatively considered as a non-stack-allocated object if the stack coloring algorithm that is used in the optimization pipeline cannot conclude that ptr is a stack-allocated object." By replacing the alloca pointer with the tagged address before this change, we confused the stack coloring algorithm. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D121835	2022-03-18 10:39:51 -07:00
Florian Mayer	dbc918b649	Revert "[HWASan] do not replace lifetime intrinsics with tagged address." Failed on buildbot: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/llc: error: : error: unable to get target for 'aarch64-unknown-linux-android29', see --version and --triple. FileCheck error: '<stdin>' is empty. FileCheck command line: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-project/llvm/test/Instrumentation/HWAddressSanitizer/stack-coloring.ll --check-prefix=COLOR This reverts commit `208b923e74`.	2022-03-18 10:04:48 -07:00
Florian Hahn	5ab421fb4e	[LICM] Add allowspeculation pass options. This adds a new option to control AllowSpeculation added in D119965 when using `-passes=...`. This allows reproducing #54023 using opt. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D121944	2022-03-18 16:51:57 +00:00
Florian Mayer	208b923e74	[HWASan] do not replace lifetime intrinsics with tagged address. Quote from the LLVM Language Reference: "If ptr is a stack-allocated object and it points to the first byte of the object, the object is initially marked as dead. ptr is conservatively considered as a non-stack-allocated object if the stack coloring algorithm that is used in the optimization pipeline cannot conclude that ptr is a stack-allocated object." By replacing the alloca pointer with the tagged address before this change, we confused the stack coloring algorithm. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D121835	2022-03-18 09:45:05 -07:00
Nikita Popov	ab2284a643	[LowerConstantIntrinsics] Make TLI a required dependency The way the pass is actually used in the optimization pipeline, TLI will be available, but this is not the case when running just -lower-constant-intrinsics in tests, which ends up being quite confusing. Require TLI unconditionally, as we usually do.	2022-03-18 14:59:18 +01:00
Nikita Popov	fc8946fae7	[InstCombine] Remove integer SPF of SPF folds (NFCI) Now that we canonicalize to intrinsics, these folds should no longer be needed. Only one fold that also applies to floating-point min/max is retained.	2022-03-18 10:20:48 +01:00
Nikita Popov	f96428e16d	[MemorySSA] Don't optimize uses during construction This changes MemorySSA to be constructed in unoptimized form. MemorySSA::ensureOptimizedUses() can be called to optimize all uses (once). This should be done by passes where having optimized uses is beneficial, either because we're going to query all uses anyway, or because we're doing def-use walks. This should help reduce the compile-time impact of MemorySSA for some use cases (the reason why I started looking into this is D117926), which can avoid optimizing all uses upfront, and instead only optimize those that are actually queried. Actually, we have an existing use-case for this, which is EarlyCSE. Disabling eager use optimization there gives a significant compile-time improvement, because EarlyCSE will generally only query clobbers for a subset of all uses (this change is not included in this patch). Differential Revision: https://reviews.llvm.org/D121381	2022-03-18 09:56:16 +01:00
Florian Hahn	4a699ae9c6	[LoopSimplifyCFG] Check predecessors of exits before marking them dead. LoopSimplifyCFG may process loops that are not in loop-simplify/canonical form. For loops not in canonical form, exit blocks may be reachable from non-loop blocks and we cannot consider them dead merely because they are not reachable from the loop itself. Unfortunately the smallest test I could come up with requires running multiple passes: -passes='loop-mssa(loop-instsimplify,loop-simplifycfg,simple-loop-unswitch)' The reason is that loops are canonicalized at the beginning of loop pipelines, so a later transform has to break canonical form in a way that breaks LoopSimplifyCFG's dead-exit analysis. Alternatively we could try to require all loop passes to maintain canonical form. That in turn would also require additional verification. Fixes #54023, #49931. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121925	2022-03-18 08:54:44 +00:00
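The added predecessor check can be illustrated with a toy predicate over block ids. This is a sketch under assumptions, not the LoopSimplifyCFG code: `exitCanBeMarkedDead` and its parameters are hypothetical, and "live loop predecessor" stands for a loop block that still has a live edge to the exit.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical predicate: an exit block may be treated as dead only if every
// predecessor is a loop block whose edge to the exit is dead.
bool exitCanBeMarkedDead(const std::vector<int> &preds,
                         const std::set<int> &loopBlocks,
                         const std::set<int> &liveLoopPreds) {
  for (int p : preds) {
    // The check this fix adds: outside the loop, so the exit stays reachable.
    if (loopBlocks.count(p) == 0)
      return false;
    // Still reached along a live edge from inside the loop.
    if (liveLoopPreds.count(p) != 0)
      return false;
  }
  return true;
}
```

In a canonical loop every exit's predecessors are inside the loop, so the first test never fires there; it only matters once an earlier transform has broken canonical form.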
Andrew Wei	0af3e6a22d	[InstCombine] Sink instructions with multiple users in a successor block. This patch tries to sink instructions when they are only used in a successor block. This is a further enhancement patch based on Anna's commit: D109700, which allows sinking an instruction having multiple uses in a single user. In this patch, sink instructions with multiple users in a single successor block will be supported. It could fix a known issue from rust: https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610 Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D121585	2022-03-18 11:53:45 +08:00
Vasileios Porpodas	9136145eb0	Revert "[SLP] Fix lookahead operand reordering for splat loads." due to build failures This reverts commit `5efa78985b`.	2022-03-17 18:22:04 -07:00
Vasileios Porpodas	5efa78985b	[SLP] Fix lookahead operand reordering for splat loads. Splat loads are inexpensive in X86. For a 2-lane vector we need just one instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads to worse code. This patch adds a new score dedicated for splat loads. Please note that a splat is usually three IR instructions: - It is usually a load and 2 inserts: %ld = load double, double* %gep %ins1 = insertelement <2 x double> poison, double %ld, i32 0 %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1 - But it can also be a load, an insert and a shuffle: %ld = load double, double* %gep %ins = insertelement <2 x double> poison, double %ld, i32 0 %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer Because of this some of the lit tests contain more IR instructions. Differential Revision: https://reviews.llvm.org/D121354	2022-03-17 18:05:54 -07:00
Paul Kirth	964398ccb1	Revert "Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics""" This reverts commit `6cf560d69a`.	2022-03-18 00:21:33 +00:00
Paul Kirth	6cf560d69a	Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics"" I mistakenly reverted my commit, so I'm relanding it. This reverts commit `10866a1df4`.	2022-03-18 00:04:22 +00:00
Paul Kirth	10866a1df4	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `e7749d4713`.	2022-03-17 23:54:26 +00:00
Paul Kirth	e7749d4713	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology using only MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) For IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Differential Revision: https://reviews.llvm.org/D115907	2022-03-17 23:46:23 +00:00
Johannes Doerfert	4308fdf83b	[Attributor] Remove more non-deterministic behavior and debug output	2022-03-17 17:42:32 -05:00
Johannes Doerfert	59a6b668ab	[OpenMP][FIX] Initialize member to avoid undefined value in debug output	2022-03-17 17:42:32 -05:00
Johannes Doerfert	88ea86c369	[Attributor][FIX] Remove reference into map that might dangle The reference was taken and the map was modified after. This can (and did) lead to dangling pointers and all sorts of problems afterwards.	2022-03-17 17:42:32 -05:00
Ellis Hoag	f6b5142ac2	[AlwaysInliner] Emit inline remark only when successful Failures in `InlineFunction()` are caught after D121722, but `emitInlinedIntoBasedOnCost()` should only be called when inlining is successful. This also removes an unnecessary call to `shouldInline()` which always returned `InlineCost::getAlways()`. Reviewed By: kyulee, nikic Differential Revision: https://reviews.llvm.org/D121946	2022-03-17 15:40:24 -07:00
Kyungwoo Lee	ddb85f34f5	[ObjCARC] Fix non-determinism We often hit the following assertion, non-deterministically, on a large IR module: ``` Assertion `notDifferentParent(LocA.Ptr, LocB.Ptr) && "BasicAliasAnalysis doesn't support interprocedural queries." ``` Looking at the comment in https://reviews.llvm.org/D87806, it appears it's actually a module pass for new PM while the legacy PM still works as a function pass. The fix aligns the new PM behavior with the legacy PM, which initializes ObjCARCContract for each function. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D121949	2022-03-17 15:01:09 -07:00
Andrew Litteken	f7d90ad57b	[IROutliner] Make sure that loop debug info is stripped. As pointed out in https://github.com/llvm/llvm-project/issues/54155#issuecomment-1057465479, there was a crash when loop info was being outlined. It was not being properly stripped and adjusted, so would point to the wrong location. This uses similar logic found in the CodeExtractor to adjust the loop debug info. Reviewer: fhahn, paquette Differential Revision: https://reviews.llvm.org/D120869	2022-03-17 14:41:53 -06:00
Alexey Bataev	d65cc85977	[SLP]Do not schedule instructions with constants/argument/phi operands and external users. No need to schedule entry nodes where all instructions are not memory read/write instructions and their operands are either constants, or arguments, or phis, or instructions from other blocks, or their users are phis or from the other blocks. The resulting vector instructions can be placed at the beginning of the basic block without scheduling (if the operands do not need to be scheduled) or at the end of the block (if the users are outside of the block). It may save some compile time and scheduling resources. Differential Revision: https://reviews.llvm.org/D121121	2022-03-17 11:03:45 -07:00
Julian Lettner	22570bac69	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-17 10:47:13 -07:00
Ellis Hoag	84c6689b15	[AlwaysInliner] Check inliner errors even without asserts When we build clang without asserts we should still check the result of `InlineFunction()` to be sure there wasn't an error. Otherwise we could incorrectly merge attributes in the next line. This also removes a redundant call to `getCaller()`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121722	2022-03-17 10:16:23 -07:00
Fraser Cormack	fe74183564	[Coroutines][NFC] Format line to 80 cols	2022-03-17 15:34:24 +00:00
Marco Elver	cbe1e67ead	[Instruction] Introduce getAtomicSyncScopeID() An analysis may just be interested in checking if an instruction is atomic but system scoped or single-thread scoped, like ThreadSanitizer's isAtomic(). Unfortunately Instruction::isAtomic() can only answer the "atomic" part of the question, but to also check scope becomes rather verbose. To simplify and reduce redundancy, introduce a common helper getAtomicSyncScopeID() which returns the scope of an atomic operation. Start using it in ThreadSanitizer. NFCI. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D121910	2022-03-17 14:59:37 +01:00
Florian Hahn	151c144350	[LV] Use usesScalars in widenPHIInstruction. This uses the existing VPlan helpers to check whether there are scalar uses of a phi recipe. It removes one of the few remaining dependencies on the cost model from VPlan code generation. Depends on D121612. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121613	2022-03-17 13:16:32 +00:00
Florian Hahn	a6e70e4056	[VPlan] VPInterleaveRecipe only requires the first lane of the address. VPInterleaveRecipe only uses the first lane of the address. Add onlyFirstLaneUsed implementation. This is needed for a follow-up patch. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121612	2022-03-17 11:56:43 +00:00
Nikita Popov	1dbeb64493	[SLP] Avoid unnecessary getIncomingValueForBlock() call (NFC) This code just wants to check all incoming values; we don't care what the incoming block is here.	2022-03-17 12:23:46 +01:00
Nikita Popov	4010a7a5d0	Reapply [InstCombine] Support switch in phi to cond fold Reapply with an explicit check for multi-edges, as the expected behavior of multi-edge dominance is unclear (D120811). ----- For conditional branches, we know the value is i1 0 or i1 1 along the outgoing edges. For switches we can apply exactly the same optimization, just with the known values determined by the switch cases.	2022-03-17 10:03:09 +01:00
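The switch variant of the fold, including the multi-edge bail-out, can be modeled on a toy CFG. This is a simplified sketch under assumptions, not the InstCombine implementation: it assumes each phi predecessor is a direct case successor (the real fold reasons about edge dominance), ignores the default destination, and `phiEqualsSwitchCondition` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// caseToBlock: switch case value -> successor block id.
// incoming: the phi's (predecessor block, incoming constant) pairs.
// Returns true if the phi provably equals the switch condition.
bool phiEqualsSwitchCondition(const std::map<int, int> &caseToBlock,
                              const std::vector<std::pair<int, int>> &incoming) {
  std::map<int, std::vector<int>> blockToCases;
  for (auto [value, block] : caseToBlock)
    blockToCases[block].push_back(value);
  for (auto [block, constant] : incoming) {
    auto it = blockToCases.find(block);
    // Not a case successor, or a multi-edge: the known value is ambiguous.
    if (it == blockToCases.end() || it->second.size() != 1)
      return false;
    // Along this edge the condition equals the case value.
    if (it->second.front() != constant)
      return false;
  }
  return true;
}
```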
Alexey Bataev	150ea76543	Revert "[SLP]Do not schedule instructions with constants/argument/phi operands and external users." This reverts commit `1eeb2bfe72` to fix a bug reported in https://reviews.llvm.org/D121121	2022-03-16 13:54:59 -07:00
Florian Hahn	470a975c84	[ConstraintElimination] Add missing dominance check. When dealing with an unconditional branch, the condition can only be added if BB properly dominates the successor.	2022-03-16 20:01:24 +00:00
Malhar Jajoo	a36d269658	[VPlan] Avoid collecting scalars for SVE This patch ensures scalars (except for uniforms) are no longer collected (prior to LVP planning phase) for scalable vectorization. This avoids the chance of generating scalarized instructions later (during LVP execute phase), as they are not supported for scalable vectorization. A relevant test has also been added. Differential Revision: https://reviews.llvm.org/D121452	2022-03-16 16:33:34 +00:00
Nikita Popov	d7cf7ec05d	[SROA] Handle over-large loads during presplitting When a load extends past the extent of the alloca, SROA will restrict the slice size to extend to the end of the alloca only. However, presplitting was asserting that the load size and the slice size match exactly, which does not hold in this case. Relax the assertion to only require that the load size is greater than or equal to the slice size.	2022-03-16 15:41:11 +01:00
Florian Hahn	f473d4aa80	[ConstraintElimination] Support BBs with single successor in CanAdd. If BB has a single successor, conditions can be added safely.	2022-03-16 14:13:52 +00:00
Alexey Bataev	1eeb2bfe72	[SLP]Do not schedule instructions with constants/argument/phi operands and external users. There is no need to schedule entry nodes where none of the instructions read or write memory and their operands are either constants, arguments, phis, or instructions from other blocks, or their users are phis or in other blocks. The resulting vector instructions can be placed at the beginning of the basic block without scheduling (if the operands do not need to be scheduled) or at the end of the block (if the users are outside of the block). This may save some compile time and scheduling resources. Differential Revision: https://reviews.llvm.org/D121121	2022-03-16 06:05:43 -07:00
Florian Hahn	e5822ded56	[FunctionAttrs] Infer argmemonly. This patch adds initial argmemonly inference, by checking the underlying objects of locations returned by MemoryLocation. I think this should cover most cases, except function calls to other argmemonly functions. I'm not sure if there's a reason why we don't infer those yet. Additional argmemonly inference can improve codegen in some cases. It also makes it easier to come up with a C reproducer for `7662d1687b` (already fixed, but I'm trying to see if C/C++ fuzzing could help to uncover similar issues.) Compile-time impact: NewPM-O3: +0.01% NewPM-ReleaseThinLTO: +0.03% NewPM-ReleaseLTO+g: +0.05% https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121415	2022-03-16 10:24:33 +00:00
Nikita Popov	20531b3a6b	[RelLookupTableConverter] Avoid querying TTI for declarations This code queries TTI on a single function, which is considered to be representative. This is a bit odd, but probably fine in practice. However, I think we should at least avoid querying declarations, which e.g. will generally lack target attributes, and for which we don't seem to ever query TTI in other places.	2022-03-16 10:39:28 +01:00
Philip Reames	1cfa986d68	[SLP] Optionally preserve MemorySSA This initial patch adds code to preserve MemorySSA through a run of SLP vectorizer. The eventual plan is to use MemorySSA to accelerate SLP's memory dependence checking, but we're a ways from that. In particular, this patch is correct, but really slow. It's being landed so that we can work incrementally in tree, not because it's expected to be useful to anyone just yet. The broader effort is being tracked in https://github.com/llvm/llvm-project/issues/54256. It's worth noting explicitly that this may not work out, and if not, we will be reverting all of the MSSA support in SLP at some point in the next few weeks. Differential Revision: https://reviews.llvm.org/D117926	2022-03-15 16:36:15 -07:00
Florian Hahn	014f5bcf7a	[FunctionAttrs] Replace MemoryAccessKind with FMRB. Update FunctionAttrs to use FunctionModRefBehavior instead of MemoryAccessKind. This allows for adding support for inferring argmemonly and others, see D121415. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121460	2022-03-15 19:35:54 +00:00
Sanjay Patel	598721f866	[InstCombine] try harder to propagate 'nsz' through fneg-of-select This can be viewed as swapping the select arms: https://alive2.llvm.org/ce/z/jUvFMJ ...so we don't have the 'nsz' problem with the more general fold. This unlocks other folds for the motivating fabs example. This was discussed in issue #38828.	2022-03-15 11:05:29 -04:00
Simon Pilgrim	7e4cf582cf	[InstCombine] Add general constant support to eq/ne icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold A further extension for Issue #32161 For eq/ne comparisons - the sign mismatch and bounds constraints are redundant, so if that fold fails, fall back and just fold the constants directly. https://alive2.llvm.org/ce/z/cdodNQ The loop rotation test change looks mostly benign - the backend doesn't seem to suffer? https://gcc.godbolt.org/z/dErMY78To Differential Revision: https://reviews.llvm.org/D121551	2022-03-15 14:17:38 +00:00
Simon Pilgrim	7262eacd41	Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" Many of the build bots are complaining: Unknown command line argument '-lower-global-dtors'	2022-03-15 13:01:35 +00:00
Nikita Popov	875782bd9e	[OpenMPOpt] Avoid pointer element type access during region merging Hardcode the function type as ParallelTask, which is the guaranteed pointee type of this runtime function argument (if pointee types exist). The elimination of the callee bitcast is left for InstCombine. Differential Revision: https://reviews.llvm.org/D120885	2022-03-15 09:52:46 +01:00
Florian Hahn	ca1b2fc9fb	[LV] Remove LoopVectorBody from InnerLoopVectorizer. (NFCI) Update places still referencing LoopVectorBody to use the vector loop to get the vector loop header. This is needed to move vector loop code-generation to VPlan completely, which in turn is needed to model pre-header & exit blocks in VPlan as well.	2022-03-15 08:22:31 +00:00
Julian Lettner	9c542a5a4e	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm.global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121327	2022-03-14 17:51:18 -07:00
Andrew Browne	dbf8c00b09	[DFSan] Remove trampolines to unblock opaque pointers. (Reland with fix) https://github.com/llvm/llvm-project/issues/54172 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D121250	2022-03-14 16:03:25 -07:00
Andrew Litteken	228cc2c38b	[IROutliner] Ensure merged PHINodes respect order and incoming blocks, not just incoming values When matching PHINodes when merging functions, the IROutliner only checks that an incoming value exists in a phi node in the overall function. It doesn't check the length, the order, or that the incoming block also matches. In the given example, we see that both phi nodes have the same incoming values, but from different blocks. The fix is to enforce a stricter match of both the incoming value and the incoming block when matching the created phi nodes. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D121310	2022-03-14 16:48:21 -05:00
Craig Topper	ce78e68261	[InstCombine] Fold select based logic of fcmps with same operands when FMF is present. If we have a logical and/or in select form and the true/false operand is an fcmp with poison generating FMF, we won't be able to fold it to an and/or instruction. This prevents us from optimizing the case where it is a logical operation of two fcmps with identical operands. This patch adds explicit checks for this case that don't rely on converting to and/or to do the optimization. It reuses the existing foldLogicOfFCmps, but adds a new flag to disable the other combine that is inside that function. FMF flags from the two FCmps are intersected using the logic added in D121243. The FIXME has been updated to indicate that we can only use a union for the non-select form. This allows us to optimize cases like this from compare-fp-3.c in the gcc torture suite with fast math. void test1 (float x, float y) { if ((x==y) && (x!=y)) link_error0(); } Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D121323	2022-03-14 14:45:07 -07:00
Nick Desaulniers	236695e70c	[IRLinker] make IRLinker::AddLazyFor optional (llvm::unique_function). NFC 2 of the 3 callsites of IRMover::move() pass empty lambda functions. Just make this parameter llvm::unique_function. Came about via discussion in D120781. Probably worth making this change regardless of the resolution of D120781. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D121630	2022-03-14 14:37:34 -07:00
Andrew Browne	edc33fa569	Revert "[DFSan] Remove trampolines to unblock opaque pointers." This reverts commit `84af90336f`.	2022-03-14 13:47:41 -07:00
Andrew Browne	84af90336f	[DFSan] Remove trampolines to unblock opaque pointers. https://github.com/llvm/llvm-project/issues/54172 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D121250	2022-03-14 13:39:49 -07:00
Andrew Litteken	c79ab1065e	[IROutliner] Separate split PHI nodes from multiple exits by different outlinable regions. The IR Outliner is supposed to extract the outputs contained in an external phi node and place them into a phi node contained within the outlined function. However, when the output values of two outlined functions with two different output sets are contained within the same phi node, they are counted as the same exit path when first analyzed. In reality, these create two different phi nodes, creating an inconsistency, resulting in a mismatch in the expected number of output paths and a crash. This fixes that counting when analyzing the outputs by also analyzing the incoming blocks rather than just the incoming values. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D121313	2022-03-14 14:56:59 -05:00
Florian Hahn	4a0481e981	[LV] Check for users of truncated IVs, add more detailed comment. Add missing outside user check for truncated IVs. Also hoist the code in the helper with additional explanations. Fixes #54370.	2022-03-14 19:39:30 +00:00
Teresa Johnson	fee0bde4c6	[WPD] Extend checking mode to support fallback to indirect call Extend -wholeprogramdevirt-check to support both the existing trapping mode on an incorrect devirtualization, as well as a new mode to fall back to an indirect call on a mismatch. The new mode is useful in cases where we want to enable devirtualization but cannot fully guarantee whole program visibility (e.g. in the case where LTO has been disabled for a small set of objects that could potentially override virtual methods without having a symbol reference to anything in the base class including the vtable). Remove !prof and !callees metadata (which are used by indirect call promotion) from both the new direct call and the fallback indirect call (so that we don't perform another round of promotion on the latter). Also remove it from the direct call in the non-fallback cases, which was an oversight, although it didn't seem to cause any issues. Add tests for the metadata removal covering the various cases. Differential Revision: https://reviews.llvm.org/D121419	2022-03-14 10:16:28 -07:00
Andrew Litteken	3c90812f3b	[IROutliner] Avoid reusing PHINodes that have already been matched when merging outlined functions' phi node blocks When there are two external phi nodes for two different outlined regions, then, when compressing the created phi nodes between the two regions, the second phi node in the second region is matched to the first phi node created for the first region rather than the second phi node created for the first region. This adds an extra output path where there should not be one. The fix is to ignore phi nodes that have already been matched for each region. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D121312	2022-03-14 12:00:01 -05:00
Nikita Popov	8361c5da30	[SLPVectorizer] Handle external load/store pointer uses with opaque pointers In this case we may not generate a bitcast, so the new load/store becomes the external user.	2022-03-14 16:55:09 +01:00
Florian Hahn	d621ae30e2	[LV] Remove dead Loop argument from emitMinimumVector... (NFC) The argument is not used, remove it.	2022-03-14 15:47:40 +00:00
Florian Hahn	3ee2d908a9	[LV] Remove dead Loop argument from emitSCEVChecks. (NFC) The argument is not used, remove it.	2022-03-14 13:00:03 +00:00
Nikita Popov	ce6ca00a92	[CoroSplit] Avoid self-replacement With opaque pointers, the bitcast might be a no-op, and this can end up trying to replace a value with itself, which is illegal.	2022-03-14 13:53:31 +01:00
Florian Hahn	8896c36624	[LV] Do not set insert point in completeLoopSkeleton. (NFCI) The insertion point for the builder used during VPlan code generation is set during code generation. Setting the insert point here is dead code and can be removed.	2022-03-14 12:21:26 +00:00
Nikita Popov	3ec44c22b1	[DeadArgElim] Guard against function type mismatch If the call function type and function type don't match, we should consider the function live (there is effectively a bitcast sitting in between).	2022-03-14 13:03:04 +01:00
Nikita Popov	cf18ec445d	[GVN] Check load type in select PRE This is no longer implicitly guaranteed with opaque pointers.	2022-03-14 12:46:54 +01:00
Benoit Jacob	9879c555f2	Expose ScalarizerPass options to C++ (not just commandline) Context: I needed this for https://github.com/google/iree/pull/8474 . I found that TSan instrumentation expects vector sizes to be <= 16, and in my project (IREE) we have tests with higher vector sizes. That left some test functions uninstrumented, resulting in crashes as instrumented code called into them. Differential Revision: https://reviews.llvm.org/D121182	2022-03-14 12:00:35 +01:00
Florian Hahn	1c0fc1f074	[VPlan] Ensure each iv user is only visited once in transform. If a recipe has multiple uses of an IV, we crash; this was hit when building llvm-test-suite. Exposed by `95f76bff1c`.	2022-03-13 21:42:17 +00:00
Florian Hahn	95f76bff1c	[LV] Create & use VPScalarIVSteps for all scalar users. This patch is a follow-up to D115953. It updates optimizeInductions to also introduce new VPScalarIVStepsRecipes if an IV has both vector and scalar uses. It updates all uses that only need scalar values to use the newly created recipe for the scalar steps. This completes untangling of VPWidenIntOrFpInductionRecipe code-generation. Now the recipe only creates the widened vector values, as it says on the tin. The code to generate IR has been moved directly to VPWidenIntOrFpInductionRecipe::execute. Note that the recipe has been updated to hold a reference to ScalarEvolution, which is needed to expand the step, until we can place the corresponding SCEV expansion in the pre-header. Depends on D120827. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D120828	2022-03-13 17:15:24 +00:00
serge-sans-paille	ed98c1b376	Cleanup includes: DebugInfo & CodeGen Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121332	2022-03-12 17:26:40 +01:00
Johannes Doerfert	85daf6973d	[Attributor] Remove capture tracker usage and follow uses explicitly Before we used the capture tracker to follow pointer uses, now we do it explicitly ourselves through the Attributor API. There are multiple benefits: For one, the boilerplate is cut down by a lot. The class, the potential copies vector, etc. are no longer needed. We also avoid explicitly looking through memory here, something that was duplicated and should only live in the `checkForAllUses` helper. More importantly, as we do simplifications we need to make sure all parties are in sync when they reason about uses. The old way did not allow us to do this but the new one does as every use visiting AA goes through `checkForAllUses` now.	2022-03-11 22:56:16 -06:00
Johannes Doerfert	f44f60a297	[Attributor] Avoid replacing return operands twice As replacements will become more complex it is better to have a single AA responsible for replacing a use. Before this patch AAValueSimplify* and AAValueSimplifyReturned could both try to replace the returned value. The latter was marginally better for the old pass manager when a function was already carrying a `returned` attribute and when the context of the return instruction was important. The second shortcoming was resolved by looking for return attributes in the AAValueSimplifyCallSiteReturned initialization. The old PM impact is not concerning. This is yet another step towards the removal of AAReturnedValues, the very first AA we should now try to eliminate due to the overlapping logic with value simplification.	2022-03-11 21:55:19 -06:00
Johannes Doerfert	55a970fbd4	[Attributor][FIX] Make sure to not ignore non-load users of stores When we look through memory for a store we used to allow any other use of the memory that is reachable. This is generally OK but we need to make sure to actually let the user look at these properly. For now, we simply require loads (via exact reloads).	2022-03-11 18:41:13 -06:00
Johannes Doerfert	f3ad8cf00e	[Attributor] Cleanup manifest and liveness for CGSCC passes There was some ad-hoc handling of liveness and manifest to avoid breaking CGSCC guarantees. Things always slipped through though. This cleanup will: 1) Prevent us from manifesting any "information" outside the CGSCC. This might be too conservative but we need to opt in to annotations rather than try to avoid problematic ones. 2) Avoid running any liveness analysis outside the CGSCC. We did have some AAIsDeadFunction handling to this end but we need this for all AAIsDead classes. The reason is that AAIsDead information is only correct if we actually manifest it, since we don't (see point 1) we cannot actually derive/use it at all. We are currently trying to avoid running any AA updates outside the CGSCC but that seems to impact things quite a bit. 3) Assert, don't check, that our modifications (during cleanup) modify only CGSCC functions.	2022-03-11 16:46:02 -06:00
Benjamin Kramer	dbc32e2aa7	[LoopUnswitch] Use SmallPtrSet instead of std::set. NFCI.	2022-03-11 19:14:34 +01:00
Florian Hahn	d3e1094473	[VPlan] Implement VPCanonicalIVPHIRecipe::onlyFirstLaneUsed. The recipe only uses the first lane of its operands. Suggested & split off D120827.	2022-03-11 18:07:26 +00:00
Johannes Doerfert	9ddb1a49ac	[Attributor][FIX] Avoid double free (and useless state copy) In an attempt to remove the memory leak we introduced a double free. The problem was that we allowed a plain copy of the state and it was actually used. The use was useless, so it is gone now. The copy constructor is gone as well. The move constructor ensures the Accesses pointers are owned by a single state, I hope. Reported by: https://lab.llvm.org/buildbot/#/builders/16/builds/25820	2022-03-11 10:10:36 -06:00
Johannes Doerfert	3570b0c5c7	[Attributor][FIX] Remove memory leak The leak was introduced when we made things deterministic. It was reported by the sanitizer buildbot: https://lab.llvm.org/buildbot/#/builders/168	2022-03-11 09:52:44 -06:00
Florian Hahn	ecea477df3	[VPlan] Helper to check if a recipe uses scalar values of op. This patch adds a helper to check if a recipe only uses scalars of a given operand. This is similar to onlyFirstLaneUsed, which was introduced earlier. By default, usesScalars falls back on onlyFirstLaneUsed. Will be used by D120828. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D120827	2022-03-11 13:41:08 +00:00
Florian Hahn	e07b899192	[FunctionAttrs] Rename addReadAttrs -> addMemoryAttrs. The addReadAttrs name is out of date, as the function also adds the writeonly attribute. addMemoryAttrs is more accurate.	2022-03-11 11:49:22 +00:00
Johannes Doerfert	e8fadafe77	[Attributor][NFCI] Make AAPointerInfo deterministic The order in which we kept accesses was non-deterministic and a debug output was a pointer value. Fixed both.	2022-03-10 23:27:47 -06:00
Johannes Doerfert	7211dbd01d	[Attributor][NFCI] Remove non-deterministic behavior and debug output	2022-03-10 23:27:47 -06:00
Sanjay Patel	3491f2f4b0	[InstCombine] replace negated operand in fcmp with 0.0 X (any pred) -X --> X (any pred) 0.0 This works with all FP values and preserves FMF. Alive2 examples: https://alive2.llvm.org/ce/z/dj6jhp This can also create one of the patterns that we match as "fabs" as shown in one of the test diffs.	2022-03-10 12:53:32 -05:00
Sanjay Patel	9fac110bf7	Revert "[InstCombine] fold fcmp with lossy casted constant" This reverts commit `9397bdc67e`. This optimization is likely to surprise programmers as seen in post-commit comments, so we should add a clang warning first (that is proposed in D121306).	2022-03-10 10:22:22 -05:00
Nikita Popov	067c035012	[GlobalOpt] Handle undef global_ctors gracefully If there are no ctors, then this can have an arbitrary zero-sized value. The current code checks for null, but it could also be undef or poison. Replace the specific null check with a check for non-ConstantArray.	2022-03-10 16:02:12 +01:00
Simon Pilgrim	808d9d260b	[InstCombine] Add vector support to icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold As discussed on Issue #32161 this fold can be generalized a lot more than it currently is, but this patch at least adds vector support. Differential Revision: https://reviews.llvm.org/D121358	2022-03-10 13:30:48 +00:00
Nikita Popov	479d684ba5	[Coroutines] Support opaque pointers in solveTypeName() As far as I can tell, these names are only intended to be informative, so just use a generic "PointerType" for opaque pointers. The code in solveDIType() also treats pointers as basic types (and does not try to encode the pointed-to type further), so I believe this should be fine. Differential Revision: https://reviews.llvm.org/D121280	2022-03-10 09:33:55 +01:00
Xiang1 Zhang	c31014322c	TLS loads optimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Florian Mayer	0f770f4d00	[NFC] [HWASan] document why we tag Size but untag AlignedSize.	2022-03-09 16:18:04 -08:00
Michael Gottesman	0b647fc529	[debug-info] Debug salvage llvm.dbg.addr in original function that point into the coroutine frame when splitting coros. We are already doing this in the split functions while we clone. This just handles the original function. I also updated the coroutine split test to validate that we are always referring to the msg in the context object instead of in a shadow copy. rdar://83957028 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D121324	2022-03-09 14:02:09 -08:00
Congzhe Cao	abc8ca65c3	[LoopInterchange] Detect output dependency of a store instruction with itself This patch is motivated by pr48057 where an output dependency is not detected since loop interchange did not check a store instruction with itself. Fixed that deficiency. Reviewed By: bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D118102	2022-03-09 15:50:27 -05:00
Alok Kumar Sharma	94823500a7	[DebugInfo][SROA] Correct debug info for global variables in case of SROA The existing handling produced a crash for the test case (attached to the patch). Now the function transferSRADebugInfo is modified to - Ignore the current variable if it starts after the current Fragment. - Ignore the current variable if it ends before the current Fragment. - Generate (!DIExpression()) if current variable completely fits the current Fragment. - Otherwise (as earlier), generate the DW_OP_LLVM_fragment in IR if current Fragment partially defines current variable. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D121107	2022-03-10 00:41:30 +05:30
Florian Hahn	f98125abb2	Revert "[PassManager] Add pretty stack entries before P->run() call." This reverts commit `128745cc26`. This increased compile-time unnecessarily. Revert this change and follow ups `2c7afadb47` & `add0c5856d`. http://llvm-compile-time-tracker.com/compare.php?from=338dfcd60f843082bb589b287d890dbd9394eb82&to=128745cc2681c284bc6d0150a319673a6d6e8424&stat=instructions	2022-03-09 18:46:32 +00:00
Andrew Litteken	0b3a6c8d20	[IROutliner] Handling outlined code with no exit paths As a result of adding multiblock outlining, it became possible to outline the entirety of a basic block, and branches that only pointed to the basic blocks contained in the outlined section. This means that there are no exit paths, and no return statement. There was a previous assertion from the older version of the outliner that explicitly made sure there was a return statement. This removes that assertion. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D120868	2022-03-09 10:43:48 -08:00
Benoit Jacob	851332a1f2	Fix linking error, undefined class static constants. Reviewed By: spupyrev Differential Revision: https://reviews.llvm.org/D121293	2022-03-09 10:01:38 -08:00
Craig Topper	f72fe2ef67	[InstCombine] Preserve FMF in foldLogicOfFCmps. This patch intersects the fast math flags from the two fcmps instead of dropping them. I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed to check out. With the other flags it told me "Couldn't prove the correctness of the transformation". Not sure if I should just preserve nnan and ninf? Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D121243	2022-03-09 09:17:09 -08:00
Florian Hahn	a12403cfea	[LV] Do not consider instrs dead if used by phi that's not in plan. Single value phis won't be modeled in VPlan. If the phi only gets used outside the loop, the current code misses the fact that the incoming value is not dead. Update the code to also look through such phis to check for outside users. Fixes #54266	2022-03-09 16:04:44 +00:00
Nikita Popov	e81f566de6	[Coroutines] Avoid pointer element access for resume function type For switch ABI, the function type is always "void (%frame*)", so just hardcode that rather than fetching it from a pointer element type.	2022-03-09 14:47:17 +01:00
Florian Hahn	128745cc26	[PassManager] Add pretty stack entries before P->run() call. This patch adds PrettyStackEntries before running passes. The entries include the pass name and the IR unit the pass runs on. The information is used to print additional information when a pass crashes, including the name and a reference to the IR unit on which it crashed. This is similar to the behavior of the legacy pass manager. The improved stack trace now includes: Stack dump: 0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll 1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll' 2. Running pass 'LoopVectorizePass' on function '@a' Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D120993	2022-03-09 13:01:09 +00:00
Nikita Popov	3d9386a349	[CoroFrame] Avoid pointer element type access for swifterror These must have pointer-to-pointer type, and with opaque pointers we don't care about the specific pointer type anymore.	2022-03-09 11:15:10 +01:00
Nikita Popov	f682a8386b	[Attributor] Use byval type instead of pointer element type For compatibility with opaque pointers, use the byval type rather than the pointer element type. Differential Review: https://reviews.llvm.org/D120983	2022-03-09 09:30:42 +01:00
Florian Mayer	4bfd8a2c5f	[NFC] [MTE] [HWASan] fixed orphaned comments.	2022-03-08 16:42:31 -08:00
Florian Mayer	af22478933	[NFC] [MTE] [HWASan] simplify code.	2022-03-08 16:36:10 -08:00
Vitaly Buka	ce29a0429b	Revert "Attempt to fix linking issue on the bot" The issue was fixed with `48c74bb2e2` This reverts commit `ac423a8c8a`.	2022-03-08 16:16:01 -08:00
Florian Mayer	e86bd32b71	[NFC] [HWASan] [MTE] Use function_ref over template.	2022-03-08 15:49:55 -08:00
Vitaly Buka	ac423a8c8a	Attempt to fix linking issue on the bot	2022-03-08 15:33:10 -08:00
Fangrui Song	48c74bb2e2	[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508 MinBaseDistance may be odr-used by std::max, leading to an undefined symbol linker error: ``` ld.lld: error: undefined symbol: (anonymous namespace)::MinCostMaxFlow::MinBaseDistance >>> referenced by SampleProfileInference.cpp:744 (/home/ray/llvm-project/llvm/lib/Transforms/Utils/SampleProfileInference.cpp:744) >>> lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/SampleProfileInference.cpp.o:((anonymous namespace)::FlowAdjuster::jumpDistance(llvm::FlowJump*) const) ``` Since llvm-project is still using C++ 14, work around it with a cast.	2022-03-08 14:34:53 -08:00
Florian Hahn	e10b0ea371	[ConstraintElimination] Remove over-eager assertion. After moving the CanAdd check in `c60cdb44f7` and using it for the assume cases as well, the passed in block may not have a branch instruction as terminator. This can trigger the assertion. Given the new use case, it doesn't add value any longer and can be removed. Fixes https://github.com/llvm/llvm-project/issues/54281	2022-03-08 22:02:08 +00:00
spupyrev	81aedab7dd	introducing some profi flags Differential Revision: https://reviews.llvm.org/D120508	2022-03-08 12:35:15 -08:00
Sanjay Patel	9397bdc67e	[InstCombine] fold fcmp with lossy casted constant This is noted as a missing clang warning in #54222 (and we should still make that enhancement). Alive2 proofs: https://alive2.llvm.org/ce/z/Q8drDq https://alive2.llvm.org/ce/z/pE6LRt I don't see a single conversion for all predicates using "getFCmpCode" logic, so other predicates are left as a TODO item.	2022-03-08 12:41:12 -05:00
Arnold Schwaighofer	dcdc1f29bb	InstCombine: Can't fold a phi arg load into the phi if the load is from a swifterror address `swifterror` addresses are only allowed as operands to load, store, and calls. The following transformation is not allowed. It would create a phi with a `swifterror` address operand. ``` %addr = alloca swifterror i8* br %cond, label %bb1, label %bb2 bb1: %val1 = load i8, i8* %addr br exit bb2: %val2 = load i8, i8* %addr br exit exit: %val = phi [%val1, %bb1] [%val2, %bb2] ``` => ``` %addr = alloca swifterror i8* br %cond, label %bb1, label %bb2 bb1: br exit bb2: br exit exit: %val_addr = phi [%addr, %bb1] [%addr, %bb2] %val2 = load i8, i8* %val_addr ``` rdar://89865485 Differential Revision: https://reviews.llvm.org/D121217	2022-03-08 09:09:51 -08:00
Arthur Eubanks	53e5e58670	[NewPM][Inliner] Make inlined calls to functions in same SCC as callee exponentially expensive Introduce a new attribute "function-inline-cost-multiplier" which multiplies the inline cost of a call site (or all calls to a callee) by the multiplier. When processing the list of calls created by inlining, check each call to see if the new call's callee is in the same SCC as the original callee. If so, set the "function-inline-cost-multiplier" attribute of the new call site to double the original call site's attribute value. This does not happen when the original call site is intra-SCC. This is an alternative to D120584, which marks the call sites as noinline. Hopefully fixes PR45253. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D121084	2022-03-07 23:51:09 -08:00
Johannes Doerfert	5b4acb20ff	[OpenMP][FIX] Ensure flag to disable de-globalization works properly If the user disabled de-globalization, we did not seed the AAHeapToShared and AAHeapToStack attributes, but we could still end up with them through in-flight lookups. With this patch we disable AAHeapToShared completely if the user disabled de-globalization. Heap-2-stack is still run, though. Differential Revision: https://reviews.llvm.org/D121059	2022-03-07 23:43:05 -06:00
Philip Reames	a2e9c68fcd	[SLP] Extract a helper for buildvector [nfc]	2022-03-07 19:11:40 -08:00
Philip Reames	8ab3befa3f	[SLP] Fix spelling in a lambda name [NFC]	2022-03-07 18:52:57 -08:00
Ahmed Bougacha	1067f2177a	[sancov] Don't instrument calls to bitcast funcs: they're not indirect. Currently, when instrumenting indirect calls, this uses CallBase::getCalledFunction to determine whether a given callsite is eligible. However, that returns null if: this is an indirect function invocation or the function signature does not match the call signature. So, we end up instrumenting direct calls where the callee is a bitcast ConstantExpr, even though we presumably don't need to. Use isIndirectCall to ignore those funky direct calls. Differential Revision: https://reviews.llvm.org/D119594	2022-03-07 12:43:37 -08:00
Roman Lebedev	2f80ea7f4f	[NFC][LV] Use different braces in debug output The analysis passes print the function name enclosed in `'` quotes, but LV uses `"`. Harmonizing this may help in creating an update script for the LV costmodel test checks. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D121105	2022-03-07 19:32:37 +03:00
Florian Hahn	4bbee17ecb	[ConstraintElimination] Use ZExtValue for unsigned decomposition. When decomposing constraints for unsigned conditions, we can use negative values by zero-extending them, as long as they are less than the maximum constraint value. Fixes https://github.com/llvm/llvm-project/issues/54224	2022-03-07 13:34:01 +00:00
Florian Hahn	c60cdb44f7	[ConstraintElimination] Only add cond from assume to succs if valid. Add missing CanAdd check before adding a condition from an assume to the successor blocks. When adding information from assume to successor blocks we need to perform the same CanAdd as we do for adding a condition from a branch. Fixes https://github.com/llvm/llvm-project/issues/54217	2022-03-07 12:01:15 +00:00
Nikita Popov	0636c93d3e	[Attributor] Remove restriction on simplifying function pointers Dropping this restriction seems to work fine (there are no assertion failures), so it appears that either the updater got smarter or the problematic cases are restricted elsewhere. If doing this still causes issues, then the place to address it would probably be `8f5bdaf481/llvm/lib/Transforms/IPO/Attributor.cpp (L1856-L1859)`, which already prevents replacement outside the SCC, so I'm not quite sure what this check is intended to avoid. Differential Revision: https://reviews.llvm.org/D120987	2022-03-07 11:54:37 +01:00
Nikita Popov	1bd33691cb	[CoroElide] Remove fallback for frame layout determination Only determine the frame layout based on dereferenceable and align attributes, and remove the type-based fallback, which is incompatible with opaque pointers. The dereferenceable attribute is required, while the align attribute uses default alignment of 1 (commonly, align 1 attributes do not get placed, relying on default alignment). The CoroSplit pass producing the resume function adds the necessary attributes in `7daed35911/llvm/lib/Transforms/Coroutines/CoroSplit.cpp (L840)`, and their presence is checked in coro-debug.ll at least. Differential Revision: https://reviews.llvm.org/D120988	2022-03-07 11:23:02 +01:00
Nikita Popov	9bca4ea364	[Coroutines] Allow FramePtr to be an Argument With opaque pointers, after splitRetconCoroutine() the FramePtr may be an Argument rather than an Instruction. With typed pointers, this currently doesn't happen because the FramePtr would be a bitcast instruction. Fix this by making FramePtr a Value and adding a helper for the "after FramePtr" insertion point, which would be the start of the function in the Argument case. Differential Revision: https://reviews.llvm.org/D120994	2022-03-07 10:58:56 +01:00
Florian Hahn	542c335159	[ConstraintElimination] Remove dead variables when dropping constraints. This patch extends ConstraintElimination to also remove dead variables when removing a constraint. When a constraint is removed because it is out of scope, all new variables added for this constraint can also be removed. This keeps the total size of the systems much smaller, because it reduces the number of variables drastically. It also fixes a bug where variables were removed incorrectly. Fixes https://github.com/llvm/llvm-project/issues/54228	2022-03-07 09:04:07 +00:00
Nikita Popov	a9b03d9e2e	[Attributor] Remove function pointer restriction for AAAlign This check is not compatible with opaque pointers. We can avoid it by adjusting the getPointerAlignment() implementation to avoid creating unnecessary ptrtoint expressions for bitcasted pointers. The code already uses OnlyIfReduced to not create an expression if it does not simplify, and this makes sure that folding a bitcast and ptrtoint into a ptrtoint doesn't count as a simplification. Differential Revision: https://reviews.llvm.org/D120904	2022-03-07 10:02:45 +01:00
Nikita Popov	d1e880acaa	[SCEV] Enable verification in LoopPM Currently, we hardly ever actually run SCEV verification, even in tests with -verify-scev. This is because the NewPM LPM does not verify SCEV. The reason for this is that SCEV verification can actually change the result of subsequent SCEV queries, which means that you see different transformations depending on whether verification is enabled or not. To allow verification in the LPM, this limits verification to BECounts that have actually been cached. It will not calculate new BECounts. BackedgeTakenInfo::getExact() is still not entirely readonly; it still calls getUMinFromMismatchedTypes(). But I hope that this is not problematic in the same way. (This could be avoided by performing the umin in the other SCEV instance, but this would require duplicating some of the code.) Differential Revision: https://reviews.llvm.org/D120551	2022-03-07 09:46:20 +01:00
Johannes Doerfert	5af11ec34b	[Attributor] Determine potentially loaded values through memory We already look through memory to determine where a value that is stored might pop up again (potential copies). This patch introduces the other direction with similar logic. If a value is loaded, we can follow all the accesses to the pointer (or better object) and try to determine what value might have been stored.	2022-03-06 23:26:37 -06:00
Johannes Doerfert	eb73af4af4	[Attributor] Handle undef and null in AAAlignFloating Both `undef` and `nullptr` are maximally aligned. This is especially important as we often see `undef` until a proper value has been identified during simplification.	2022-03-06 23:26:22 -06:00
Johannes Doerfert	ad26e199ff	[Attributor] Use CFG reasoning also for read accesses With D106397 we used CFG reasoning to filter out writes that will not interfere with a given load instruction. With this patch we use the same logic (modulo the reversal in reachability check order) for store instructions. As an example, we can now prove that stores to shared memory are dead if none of the loads of the shared memory are reachable from them.	2022-03-06 23:26:22 -06:00
Johannes Doerfert	acb3773491	[Attributor] Improve isValidAtPosition (mostly for old PM) To minimize the test difference between old and new PM we perform some local dominance check if no dominator tree is available.	2022-03-06 23:26:21 -06:00
Johannes Doerfert	ff758372bd	[Attributor][NFCI] Introduce fine-grained anonymous namespaces	2022-03-06 21:28:38 -06:00
Johannes Doerfert	192a34ddb0	[Attributor][OpenMPOpt][FIX] Register simplification callbacks Heap-2-stack and heap-2-shared can replace an allocation call with something else. To avoid deriving information from the allocator implementation, we now register a simplification callback that forces us to stop at the call site. We probably should create the replacement memory eagerly and return that instead, though.	2022-03-06 21:28:38 -06:00
Johannes Doerfert	5859ae6a5d	[Attributor][FIX] Use maximal access for dereferenceability deduction While we can use range information when we derive dereferenceability, we must make sure to pick the right end of the range. Before, we always went with the minimal offset, which is not correct if we want to combine the base dereferenceability with some offset. In that case it's the maximum that gives the correct result.	2022-03-06 21:28:38 -06:00
Johannes Doerfert	1fcd4d0e3b	[Attributor][FIX] Initialize stack variable	2022-03-06 21:28:38 -06:00
Johannes Doerfert	6158f4a466	[Attributor][NFCI] No repeated manifest of AAValueSimplifyReturned (CGSCC)	2022-03-06 19:59:23 -06:00
Johannes Doerfert	efedf70aa5	[Attributor][NFC] Expose helper with more generic interface This simply makes the function argument of the `Attributor::checkForAllInstructions` helper explicit so one can iterate over instructions in other functions.	2022-03-06 19:59:23 -06:00
Johannes Doerfert	8fa839aa58	[Attributor][NFC] Improve debug messages	2022-03-06 19:59:22 -06:00
William S. Moses	87ec6f41bb	[OpenMPIRBuilder] Allocate temporary at the correct block in a nested parallel The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp parallel regions (writing with MLIR for ease) ``` omp.parallel { %a = ... omp.parallel { use(%a) } } ``` As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this allocation to the inner parallel. For example, we would want to get something like the following: ``` omp.parallel { %a = ... %tmp = alloc store %tmp[] = %a kmpc_fork(outlined, %tmp) } ``` However, in practice, this is not what currently occurs in the context of nested parallel regions. Specific to the OpenMPIRBuilder, the entirety of the function (at the LLVM level) is currently inlined with blocks marking the corresponding start and end of each region. ``` entry: ... parallel1: %a = ... ... parallel2: use(%a) ... endparallel2: ... endparallel1: ... ``` When the allocation is inserted, it is presently inserted into the parent of the entire function (e.g. entry) rather than the parent allocation scope of the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1. This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example. This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121061	2022-03-06 18:34:25 -05:00
Florian Hahn	bc00f47c01	[LoopSink] Do not try to sink phi nodes. Skip phi nodes in the preheader. They may not be considered loop invariant by the assertion below. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D121010	2022-03-06 11:16:22 +00:00
Roman Lebedev	e47257e251	Revert "Reland [SROA] Maintain shadow/backing alloca when some slices are noncapturnig read-only calls to allow alloca partitioning/promotion" There seems to be one more uncaught problem, SROA may now end up trying to re-re-repromote the just-promoted shadow alloca, and do that endlessly. This reverts commit `adc0984d81`.	2022-03-05 01:09:51 +03:00
Roman Lebedev	adc0984d81	Reland [SROA] Maintain shadow/backing alloca when some slices are noncapturnig read-only calls to allow alloca partitioning/promotion This is inspired by the original variant of D109749 by Graham Hunter, but is a more general version. Roughly, instead of promoting the alloca, we call it a shadow/backing alloca, go through all its slices, clone(!) instructions that operated on it, but make them operate on the cloned alloca, and promote the cloned alloca instead. This keeps the shadow/backing alloca, and all the original instructions around, which results in said shadow/backing alloca being a perfect mirror/representation of the promoted alloca's content, so calls that take the alloca as arguments (non-capturingly!) can be supported. For now, we require that the calls also don't modify the alloca's content, but that is only to simplify the initial implementation, and that will be supported in a follow-up. Overall, this leads to smaller codesize: https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=size-total and is roughly neutral compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=instructions This relands commit `703240c71f`, which was reverted by commit `7405581f7c` because the assertion `isa<LoadInst>(OrigInstr)` didn't hold in practice, as the newly added test `@select_of_ptrs` shows: If the pointers into alloca are used by selects/PHIs, then even if we manage to fracture the alloca, some sub-allocas will likely remain. And if there are any non-capturing calls, then we will also decide to keep the original backing alloca around, and we suddenly ~doubled the alloca size, and the amount of memory traffic. I'm not sure if this is a problem or we could live with it, but let's leave that for later... Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D113520	2022-03-05 00:14:12 +03:00
Augie Fackler	b32735d599	BuildLibCalls: add allocalign attributes for memalign and aligned_alloc This gets us close to being able to remove a column from the table in MemoryBuiltins.cpp. Differential Revision: https://reviews.llvm.org/D117923	2022-03-04 15:57:53 -05:00
Augie Fackler	d664c4b73c	Attributes: add a new allocalign attribute This will let us start moving away from hard-coded attributes in MemoryBuiltins.cpp and put the knowledge about various attribute functions in the compilers that emit those calls where it probably belongs. Differential Revision: https://reviews.llvm.org/D117921	2022-03-04 15:57:53 -05:00
Johannes Doerfert	f9c2d6005e	[OpenMP][FIX] Ensure custom state machine works The custom state machine had a check for surplus threads that filtered out the main thread if the kernel was executed by a single warp only. We now check for the main thread first and for surplus threads second, so the former is no longer filtered out. Fixes #54214. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121011	2022-03-04 13:51:19 -05:00
Roman Lebedev	7405581f7c	Revert "[SROA] Maintain shadow/backing alloca when some slices are noncapturnig read-only calls to allow alloca partitioning/promotion" Bots are reporting that the assertion about only expecting loads is wrong. This reverts commit `703240c71f`.	2022-03-04 21:49:30 +03:00
Roman Lebedev	703240c71f	[SROA] Maintain shadow/backing alloca when some slices are noncapturnig read-only calls to allow alloca partitioning/promotion This is inspired by the original variant of D109749 by Graham Hunter, but is a more general version. Roughly, instead of promoting the alloca, we call it a shadow/backing alloca, go through all its slices, clone(!) instructions that operated on it, but make them operate on the cloned alloca, and promote the cloned alloca instead. This keeps the shadow/backing alloca, and all the original instructions around, which results in said shadow/backing alloca being a perfect mirror/representation of the promoted alloca's content, so calls that take the alloca as arguments (non-capturingly!) can be supported. For now, we require that the calls also don't modify the alloca's content, but that is only to simplify the initial implementation, and that will be supported in a follow-up. Overall, this leads to smaller codesize: https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=size-total and is roughly neutral compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=instructions Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D113520	2022-03-04 21:08:43 +03:00
Augie Fackler	5e4c75db3b	InstructionCombining: avoid eliding mismatched alloc/free pairs Prior to this change LLVM would happily elide a call to any allocation function and a call to any free function operating on the same unused pointer. This can cause problems in some obscure cases, for example if the body of `operator new` can be inlined but the body of `operator delete` can't, as in this example from jyknight: ``` #include <stdlib.h> #include <stdio.h> int allocs = 0; void *operator new(size_t n) { allocs++; void *mem = malloc(n); if (!mem) abort(); return mem; } __attribute__((noinline)) void operator delete(void *mem) noexcept { allocs--; free(mem); } void deleteit(int *i) { delete i; } int main() { int *i = new int; deleteit(i); if (allocs != 0) printf("MEMORY LEAK! allocs: %d\n", allocs); } ``` This patch addresses the issue by introducing the concept of an allocator function family and using it to make sure that alloc/free function pairs are only removed if they're in the same family. Differential Revision: https://reviews.llvm.org/D117356	2022-03-04 10:41:10 -05:00
Nikita Popov	6467d1d275	[CoroFrame] Remove unused insertSpills() return value (NFC)	2022-03-04 15:11:24 +01:00
Nikita Popov	6b5b367858	[Attributor] Remove function pointer type check (NFCI) This check is not relevant for correctness, it can only avoid walking some recursive uses if the cast is to a non-function pointer type. As this distinction will no longer be possible with opaque pointers and all users will have to be walked anyway, I'm dropping the check in advance.	2022-03-04 12:09:51 +01:00
Nikita Popov	d3a52089eb	Reapply [MergeICmps] Don't require GEP Recommit without changes over `53abe3ff66`, which addressed the cause of the reported crash. ----- With opaque pointers, the zero-offset load will generally not use a GEP. Allow a direct load without GEP, which is treated the same way as a zero-offset GEP.	2022-03-04 11:39:11 +01:00
Nikita Popov	53abe3ff66	[MergeICmp] Make instruction move robust against empty block (NFCI) Use the overload that supports moving into an empty block. I don't think that this situation can occur right now, but it can happen with the change from `e7fb1c15cb`, and the test is derived from the issue reported there.	2022-03-04 11:15:08 +01:00
Jez Ng	dd29597e10	[LTO] Initialize canAutoHide() using canBeOmittedFromSymbolTable() Per discussion on https://reviews.llvm.org/D59709#inline-1148734, this seems like the right course of action. `canBeOmittedFromSymbolTable()` subsumes and generalizes the previous logic. In addition to handling `linkonce_odr` `unnamed_addr` globals, we now also internalize `linkonce_odr` + `local_unnamed_addr` constants. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D120173	2022-03-03 19:04:11 -05:00
Arthur Eubanks	bc1574b495	Revert "[MergeICmps] Don't require GEP" This reverts commit `e7fb1c15cb`. Causes crashes, see https://reviews.llvm.org/rGe7fb1c15cb85d748c1c4fdd5a2eb5613ec7bef1d.	2022-03-03 15:01:39 -08:00
Philip Reames	00a877f96a	[DSE] Cache liveOnEntry as clobbering access This builds on @fhahn's D112313, and caches the liveOnEntry node as an optimized access. D112313 only cached known clobbers. This change adds caching the fact that no clobber exists. It still does not cache may-clobber results. Differential Revision: https://reviews.llvm.org/D120842	2022-03-03 11:36:21 -08:00
Philip Reames	deae979a2c	Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""" This reverts commit `738042711b`. A second, apparently separate, issue has been reported on the original review.	2022-03-03 11:35:34 -08:00
Arthur Eubanks	f0b61f7957	Revert "[GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible" This reverts commit `30e8f83c84`. Causes huge compile time regressions on certain large files. Will followup offline with author.	2022-03-03 11:04:14 -08:00
Craig Topper	608161225e	[InstCombine][Analysis] Move getFCmpCode and getPredForFCmpCode to CmpInstAnalysis. NFC The similar getICmpCode and getPredForICmpCode are already there. This moves FP for consistency. I think InstCombine is currently the only user of both. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120754	2022-03-03 09:33:24 -08:00
Paul Robinson	7b85f0f32f	[PS4] isPS4 and isPS4CPU are not meaningfully different	2022-03-03 11:36:59 -05:00
Nikita Popov	1b6663a104	[FuncSpec] Remove unnecessary function pointer type check We will check a bit later that the constant is in fact a function, so the separate check for a function pointer type is largely redundant. Also simplify the cast stripping with stripPointerCasts().	2022-03-03 15:20:11 +01:00
Alexandros Lamprineas	910eb988eb	[FuncSpec][NFC] Refactor internal structures. `ArgInfo` is reduced to only contain a pair of {formal,actual} values. The specialized function `Fn` and the `Partial` flag are redundant in this structure. The `Gain` is moved to a new struct `SpecializationInfo`. The value mappings created by cloneCandidateFunction() are being used by rewriteCallSites() for matching the formal arguments of recursive functions. The list of specializations is passed by reference to calculateGains() instead of being returned by value. The `IsPartial` flag is removed from isArgumentInteresting() and getPossibleConstants() as it's no longer used anywhere in the code. Differential Revision: https://reviews.llvm.org/D120753	2022-03-03 13:08:13 +00:00
Nikita Popov	c1b9667148	[InstCombine] Support opaque pointers in callee bitcast fold To make this actually trigger, we also need to check whether the function types differ, which is a hidden cast under opaque pointers. The transform is somewhat less relevant there because it is primarily about pointer bitcasts, but it can also happen with other bit- or pointer-castable types. Byval handling is easier with opaque pointers because there is no need to adjust the byval type, we only need to make sure that it's still a pointer.	2022-03-03 11:07:39 +01:00
Nikita Popov	6c8adc5054	[InstCombine] Remove unnecessary byval check in callee cast fold The logic for handling this was fixed in `8d7f118ab2`, but the check for byval on the callee was retained. This resulted in a weird situation where the transform would work depending on whether the byval was only on the call or on both the call and the function.	2022-03-03 10:55:14 +01:00
Nikita Popov	c262ba2aab	[Scalarizer] Avoid pointer element type accesses Pass through the load/store type to the Scatterer instead.	2022-03-03 10:28:58 +01:00
serge-sans-paille	f90a66a544	Add missing include under -DEXPENSIVE_CHECKS This is a follow-up to `59630917d6`	2022-03-03 10:19:39 +01:00
Nikita Popov	b214f550f7	[DSE] Drop redundant WalkerStepLimit adjustment There is a general WalkerStepLimit adjustment higher up in the loop, and I don't see any reason why this particular case would need additional adjustment. Furthermore, this could underflow.	2022-03-03 09:42:38 +01:00
serge-sans-paille	59630917d6	Cleanup includes: Transform/Scalar Estimated impact on preprocessor output lines: before: 1062981579 after: 1062494547 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120817	2022-03-03 07:56:34 +01:00
spupyrev	f2ade65fb2	[CSSPGO] Even flow distribution Differential Revision: https://reviews.llvm.org/D118640	2022-03-02 13:12:05 -08:00
Philip Reames	738042711b	Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch. Original commit message follows: SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore it. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-03-02 10:47:20 -08:00
Philip Reames	689babdf68	[SLP] Don't try to vectorize allocas While a collection of allocas is technically vectorizable - by forming a wider alloca - this was not a transform SLP actually knows how to do. Instead, we were forming a bundle with missing dependencies, and then relying on the scheduling code to preserve program order if multiple instructions were scheduleable at once. I haven't been able to write a test case, but I'm 99% sure this was wrong in some edge case. The unknown op case was flowing down the shufflevector path. This did result in some splat handling being lost with this change, but the same lack of splat handling is visible in a whole bunch of simple examples for the gather path. I didn't consider this interesting to fix given how narrow the splat of allocas case is.	2022-03-02 10:08:43 -08:00
Stephen Long	2f6c14816a	[LoopPeel] Add EXPENSIVE_CHECKS ifdef guard around domtree verify call The verify call was taking 50% of the compile time in our internal LLVM fork when trying to unroll many loops. Differential Revision: https://reviews.llvm.org/D113028	2022-03-02 09:56:20 -08:00
Florian Hahn	8777cb66a8	[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI). Instead of relying on underlying instructions, this patch updates VPScalarIVStepsRecipe to only store the required type information. This removes access to unrelated information, as well as avoiding issues with the same underlying instruction being shared by multiple recipes. This change should only change the debug output and not cause any codegen changes, hence NFCI.	2022-03-02 16:23:19 +00:00
Nikita Popov	61580d0949	Reapply [InstCombine] Remove one-use limitation from X-Y==0 fold This is a recommit without changes. I originally reverted this due to a significant code-size regression on tramp3d-v4, however further investigation showed that in the tramp3d-v4 case this change enables additional optimizations (in particular more jump threading), which happens to reduce the size of a function just enough to be eligible for inlining at hot callsites, which results in the code size increase. As such, this was just bad luck. ----- This one-use limitation is artificial, we do not increase instruction count if we perform the fold with multiple uses. The motivating case is shown in @sub_eq_zero_select, where the one-use limitation causes us to miss a subsequent select fold. I believe the backend is pretty good about reusing flag-producing subs for cmps with same operands, so I think doing this is fine. Differential Revision: https://reviews.llvm.org/D120337	2022-03-02 16:43:33 +01:00
spupyrev	bcdc047731	speeding up ext-tsp for huge instances Differential Revision: https://reviews.llvm.org/D120780	2022-03-02 07:17:48 -08:00
Florian Hahn	9e46866c0c	[LV] Remove dead EntryVal argument from buildScalarSteps (NFC). The EntryVal argument is not needed after recent refactoring. Remove it.	2022-03-02 14:59:22 +00:00
Nikita Popov	5cf06d10f8	Revert "[InstCombine] Support switch in phi to cond fold" This reverts commit `0817ce86b5`. Seeing some ppc64le stage2 failures, reverting to investigate.	2022-03-02 12:49:47 +01:00
Nikita Popov	0817ce86b5	[InstCombine] Support switch in phi to cond fold For conditional branches, we know the value is i1 0 or i1 1 along the outgoing edges. For switches we can apply exactly the same optimization, just with the known values determined by the switch cases.	2022-03-02 12:16:32 +01:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Hongtao Yu	07846e3387	[CSSPGO][PriorityInliner] Do not use block weight to drive callsite inlining. The priority-based inliner currently uses block count combined with callee entry count to drive callsite inlining. This doesn't work well with LTO where postlink inlining is driven by prelink-annotated block count which could be based on the merge of all context profiles. I'm fixing it by using callee profile entry count only which should be context-sensitive. I'm seeing 0.2% perf improvement for one of our internal large benchmarks with probe-based non-CS profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D120784	2022-03-01 18:43:19 -08:00
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Arthur Eubanks	9c6250ee41	Revert "[SLP] Schedule only sub-graph of vectorizable instructions" This reverts commit `0539a26d91`. Causes a miscompile, see comments on D118538. Required updating bottom-to-top-reorder.ll.	2022-03-01 17:31:16 -08:00
Arthur Eubanks	6987ac7903	Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]" This reverts commit `a3e9b32c00`. Required for reverting D118538.	2022-03-01 17:28:52 -08:00
Florian Mayer	1d730d80ce	[HWASAN] erase lifetime intrinsics if tag is outside. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D120437	2022-03-01 14:47:33 -08:00
Joseph Huber	6632180745	[OpenMP][NFC] Add an option to print the module before in OpenMPOpt Previously there was a debug flag to print the module after optimizations. Sometimes we wanted to print the module before optimizations so this is being split into two flags. `-openmp-opt-print-module` is now `-openmp-opt-print-module-after`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120768	2022-03-01 17:09:09 -05:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimated impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Craig Topper	7bc6667845	[Analysis] Simplify the interface to llvm::getICmpCode. NFC Instead of passing an ICmpInst * and a bool, just pass the predicate from the caller. I'm considering moving the similar FCmp functions from InstCombine over here and this makes the interface consistent with what is used for FCmp. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120609	2022-03-01 09:53:27 -08:00
Tong Zhang	17ce89fa80	[SanitizerBounds] Add support for NoSanitizeBounds function Currently, adding the attribute no_sanitize("bounds") does not disable -fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang frontend handles -fsanitize=array-bounds, which can already be disabled by no_sanitize("bounds"). However, instrumentation added by the BoundsChecking pass in the middle-end cannot be disabled by the attribute. The fix is very similar to D102772, which added the ability to selectively disable sanitizer passes on certain functions. In this patch, if no_sanitize("bounds") is provided, an additional function attribute (NoSanitizeBounds) is attached to IR to let the BoundsChecking pass know we want to disable local-bounds checking. In order to support this feature, the IR is extended (similar to D102772) to make Clang able to preserve the information and let the BoundsChecking pass know bounds checking is disabled for certain functions. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D119816	2022-03-01 18:47:02 +01:00
serge-sans-paille	71c3a5519d	Cleanup includes: LLVMAnalysis Number of lines output by preprocessor: before: 1065940348 after: 1065307662 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120659	2022-03-01 18:01:54 +01:00
Nikita Popov	a1f442b278	[InstCombine] Support phi to cond fold with more than two preds This transform can still be applied if there are more than two phi inputs, as long as phi inputs with the same value are dominated by the same idom edge.	2022-03-01 16:31:49 +01:00
Nikita Popov	26748bb15a	[InstCombine] Slightly relax one-use check in abs canonicalization Treat the icmp and sub symmetrically, and require that one of them has one use, not the icmp in particular. This could be further relaxed in the abs (but not nabs) case to not check one-use at all.	2022-03-01 15:06:41 +01:00
Sanjay Patel	84812b9b07	[InstCombine] drop FMF in select->copysign transform It is not correct to propagate flags from the select to the new instructions: https://alive2.llvm.org/ce/z/tNATrd https://alive2.llvm.org/ce/z/VwcVzn Fixes #54077	2022-03-01 08:51:41 -05:00
Nikita Popov	c2428a4fad	[InstCombine] Remove SPF min/max check from select demanded bits (NFCI) This should no longer be necessary now that we canonicalize to intrinsics. This may not be entirely NFC in practice if worklist order gets inverted and we perform demanded bits simplification of a select user before the select is canonicalized.	2022-03-01 14:50:37 +01:00
Alexandros Lamprineas	33830326aa	[FuncSpec] Remove definitions of fully specialized functions. A function is basically dead when: * it has no uses * it has only self-referencing uses (it's recursive) Differential Revision: https://reviews.llvm.org/D119878	2022-03-01 11:57:08 +00:00
Alexandros Lamprineas	b803aee67b	[FuncSpec][NFC] Improve debug messages. Adds diagnostic messages when debugging the pass. Differential Revision: https://reviews.llvm.org/D119875	2022-03-01 11:55:08 +00:00
Alexandros Lamprineas	7b74123a3d	[FuncSpec][NFC] Variable renaming. Just preparing the ground for follow up patches to make the reviews easier. Differential Revision: https://reviews.llvm.org/D119874	2022-03-01 11:38:57 +00:00
Kirill Stoimenov	b7fd30eac3	[ASan] Removed unused AddressSanitizerPass functional pass. This is a clean-up patch. The functional pass was rolled into the module pass in D112732. Reviewed By: vitalybuka, aeubanks Differential Revision: https://reviews.llvm.org/D120674	2022-03-01 00:41:29 +00:00
Philip Reames	8cb0ac5825	[SLP] Check invariant that all instructions in bundle are in same block [NFC]	2022-02-28 13:17:44 -08:00
Sanjay Patel	278b407a30	[InstCombine] fold mul-with-overflow intrinsic with -1 operand extractvalue (any_mul_with_overflow X, -1), 0 --> -X There are similar other potential transforms that we could do as noted by the last TODO in the test diffs. Fixes #54053	2022-02-28 14:13:48 -05:00
Sanjay Patel	f422c5d871	[InstCombine] fold select-of-zero-or-ones with negated op (X u< 2) ? -X : -1 --> sext (X != 0) (X u> 1) ? -1 : -X --> sext (X != 0) https://alive2.llvm.org/ce/z/U3y5Bb https://alive2.llvm.org/ce/z/hgi-4p This is part of solving:	2022-02-28 12:07:49 -05:00
Alexey Bataev	e4b9640867	[SLP]Improve bottom-to-top reordering. Currently bottom-to-top reordering analysis counts orders of the operands and then adds natural order counts for the operand users. It is very conservative, since the user nodes themselves may require reordering. The patch improves bottom-to-top analysis by checking whether the user nodes require/allow reordering. If the user node must be reordered, has reused scalars, is an alternate op vectorization node, is a non-ordered gather node or may allow reordering because of the reordered operands, such a node is considered a node that allows reordering and is not counted as a node with the natural order. Differential Revision: https://reviews.llvm.org/D120492	2022-02-28 06:48:46 -08:00
Florian Hahn	b3e8ace198	Recommit "[VPlan] Introduce recipe to build scalar steps." This reverts the revert commit `ff93260bf6`. The underlying issue causing the PPC bot failures has been fixed in `cbaac14734` and a corresponding test case has been added in `ad2cad1c52`. Original message: This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-28 14:12:20 +00:00
Florian Hahn	cbaac14734	[LV] Remove induction recipes only used outside vector loop. Exit values of vector inductions are generated completely independent of the induction recipes. Consider them for removal, if they are not used in the loop. This fixes a crash exposed by `49b23f451c`.	2022-02-28 11:14:22 +00:00
Nikita Popov	5423b0a525	[InstCombine] Remove not of SPF min/max fold (NFCI) This should no longer be necessary now that we canonicalize to intrinsics. Might not be strictly NFC due to worklist order.	2022-02-28 11:02:31 +01:00
Nikita Popov	d5ea3b2f33	[InstCombine] Remove sub of SPF min/max fold (NFCI) This isn't necessary anymore, now that we canonicalize SPF min/max to intrinsics. Might not be strictly NFC due to worklist order changes.	2022-02-28 10:57:24 +01:00
Nikita Popov	9353ed6a53	[InstCombine] Don't call matchSAddSubSat() for SPF (NFC) Only call it for intrinsic min/max. The moved implementation is unchanged apart from the one-use check: It is now hardcoded to one-use, without the two-use special case for SPF.	2022-02-28 10:41:56 +01:00
Nikita Popov	53602e4c70	[InstCombine] Remove SPF moveAddAfterMinMax() (NFC) As SPF min/max is canonicalized to intrinsics before this point, this change should be entirely NFC.	2022-02-28 10:28:16 +01:00
Nikita Popov	ee62dcdb34	[InstCombine] Remove SPF moveNotAfterMinMax() (NFC) This happens after SPF -> intrinsic canonicalization, and as such should be entirely NFC.	2022-02-28 10:23:07 +01:00
Nikita Popov	0bc3e233d7	[InstCombine] Remove SPF factorizeMinMaxTree() (NFC) SPF integer min/max is canonicalized to min/max intrinsics before this code is reached, so this should be entirely NFC.	2022-02-28 10:22:05 +01:00
Philip Reames	319265328c	[SLP] Remove field unused after `33ce97f` to silence buildbots [NFC]	2022-02-27 10:18:10 -08:00
Florian Hahn	ff93260bf6	Revert "[VPlan] Introduce recipe to build scalar steps." This reverts commit `49b23f451c`. This appears to break some PPC build bots. Revert while I investigate.	2022-02-27 17:51:19 +00:00
Philip Reames	33ce97f413	[SLP] Use BatchAA to reduce capture analysis cost [NFC] SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes. AA internally uses pointerMayBeCaptured to prove some noalias results. In a local profile, we were spending about 4% of total O2 time in capture tracking. By using the BatchAA interface, which caches capture results, this drops to 2%. Note that there is no invalidation of BatchAA here. This assumes that no transformation done by SLP invalidates alias or capture results. This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.	2022-02-27 09:47:24 -08:00
Florian Hahn	49b23f451c	[VPlan] Introduce recipe to build scalar steps. This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-27 17:32:41 +00:00
Florian Hahn	9bc866cc6f	[VPlan] Add recipe to handle SCEV expansion (NFC). This can be used to explicitly model VPValues that depend on SCEV expansion, like the step for inductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116288	2022-02-27 12:47:02 +00:00
Florian Hahn	da740492b0	[VPlan] Remove dead header-phi recipes. This patch adds a new transform to remove dead recipes. For now, it only removes dead recipes in the header, to keep the number of tests that require updating manageable. Future patches will extend this to remove dead recipes across the whole plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118051	2022-02-26 16:26:39 +00:00
Craig Topper	1b1f8d6eff	[SeparateConstOffsetFromGEP] Remove TargetMachine.h include. NFC This doesn't appear to be used and it would be a layering violation if it was.	2022-02-25 21:40:00 -08:00
Evgeniy Brevnov	10e99eb7e4	[SLP] "Normal" instructions should not go between PHI and Landing pad Currently, SLP can insert a "shuffle" instruction between the PHI and landing pad instructions. The problem is demonstrated by the LIT test. The solution is to adjust the insertion point once we are done with PHI generation. Differential Revision: https://reviews.llvm.org/D120552	2022-02-26 11:44:26 +07:00
Nikita Popov	e7fb1c15cb	[MergeICmps] Don't require GEP With opaque pointers, the zero-offset load will generally not use a GEP. Allow a direct load without GEP, which is treated the same way as a zero-offset GEP.	2022-02-25 17:38:02 +01:00
Simon Pilgrim	3b422455dd	[IPO] AAFunctionReachabilityFunction.updateImpl - reduce AAReachability scope. NFCI. We already have a check for !InstQueries.empty(), so move the for-range over InstQueries inside to avoid the AAReachability uninitialized variable static analysis warnings.	2022-02-25 14:42:31 +00:00
Nikita Popov	4736e57199	[IndVars] Use phis() (NFC)	2022-02-25 12:08:12 +01:00
Nikita Popov	e1608a9df8	[InstCombine] Remove SPF min/max canonicalization Now that we canonicalize SPF min/max to intrinsics, there's no need to canonicalize the structure of the SPF min/max itself anymore. This is conceptually NFC, but in practice does slightly impact results due to folding order differences.	2022-02-25 11:24:09 +01:00
Nikita Popov	16a2d5f885	[SCEVExpander] Use early returns in FindValueInExprValueMap() (NFC)	2022-02-25 10:09:16 +01:00
Nikita Popov	2d0fc3e46f	[SCEV] Return ArrayRef from getSCEVValues() (NFC) Return a read-only view on this set. For the one internal use, directly access ExprValueMap.	2022-02-25 09:32:22 +01:00
Nikita Popov	d9715a7266	[SCEV] Don't try to reuse expressions with offset SCEVs ExprValueMap currently tracks not only which IR Values correspond to a given SCEV expression, but additionally stores that it may be expanded in the form X+Offset. In theory, this allows reusing existing IR Values in more cases. In practice, this doesn't seem to be particularly useful (the test changes are rather underwhelming) and adds a good bit of complexity. Per https://github.com/llvm/llvm-project/issues/53905, we have an invalidation issue with these offset expressions. Differential Revision: https://reviews.llvm.org/D120311	2022-02-25 09:16:48 +01:00
Anton Afanasyev	904a00d17a	[AggressiveInstCombine] Fix `TruncInstCombine` (fix `f84d732f`) Erase phi-nodes from `InstInfoMap` before erasing themselves	2022-02-25 08:04:11 +03:00
Anton Afanasyev	0dd8401371	[AggressiveInstCombine] Add `phi` nodes support to `TruncInstCombine` Expand `TruncInstCombine` to handle loops by adding `phi` nodes to expression graph. Reviewed by: RKSimon, lebedev.ri (recommit of fixed `f84d732f`, reverted by `8ad6d5e` after sanitizer breakage) Differential Revision: https://reviews.llvm.org/D109817	2022-02-25 07:57:35 +03:00
Vasileios Porpodas	4bbc3290a2	[SLP] Fix for the min/max intrinsic cost. The min/max intrinsic cost is currently too low because in the cost calculation we subtract the cost of the vector compare as we will not emit it. For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE which returns 3, the worst case cost. I think we should be passing VecPred instead, since we know the predicates of the compare instr. I think this is related to commit `b3b993a7ad` which introduced the predicate argument to getCmpSelInstrCost(). https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83 Differential Revision: https://reviews.llvm.org/D120439	2022-02-24 18:08:40 -08:00
Joseph Huber	7aef8b3754	[OpenMP] Make section variable external to prevent collisions Summary: We use a section to embed offloading code into the host for later linking. This is normally unique to the translation unit as it is thrown away during linking. However, if the user performs a relocatable link the sections will be merged and we won't be able to access the files stored inside. This patch changes the section variables to have external linkage and a name defined by the section name, so if two sections are combined during linking we get an error.	2022-02-24 10:57:09 -05:00
Sanjay Patel	5379f76e63	[InstCombine] try harder to preserve 'nsz' in fneg-of-select transform The corner case where 'nsz' needs to be removed is very narrow as discussed here: https://reviews.llvm.org/rG3cdd05e519dd If the select condition is not undef, there's no problem with propagating 'nsz': https://alive2.llvm.org/ce/z/4GWJdq	2022-02-24 10:43:53 -05:00
Nikita Popov	a266af7211	[InstCombine] Canonicalize SPF to min/max intrinsics Now that integer min/max intrinsics have good support in both InstCombine and other passes, start canonicalizing SPF min/max to intrinsic min/max. Once this sticks, we can stop matching SPF min/max in various places, and can remove hacks we have for preventing infinite loops and breaking of SPF canonicalization. Differential Revision: https://reviews.llvm.org/D98152	2022-02-24 09:01:20 +01:00
Nikita Popov	aa551ad198	Revert "[InstCombine] Remove one-use limitation from X-Y==0 fold" This reverts commit `65dc78d63e`. This caused a major code-size regression on tramp3d-v4, revert until I can investigate.	2022-02-24 08:50:40 +01:00
Matthias Braun	6a383369f9	PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong result in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096	2022-02-23 16:27:37 -08:00
minglotus-6	142cedc283	[SampleProf][Inliner] Add an option to turn off inliner in sample-profile pass. Use case is offline evaluation (for inliner effectiveness) or debugging. Differential Revision: https://reviews.llvm.org/D120344	2022-02-23 14:21:33 -08:00
Philip Reames	ed54296ea3	[SLP] Fastpath instructions not in block being scheduled [nfc]	2022-02-23 13:51:36 -08:00
Philip Reames	a4541fdfe4	[SLP] Replace an impossible branch condition with an assert [NFC] An entire bundle must be inside the scheduling window. Assert that this property holds as opposed to checking it at runtime.	2022-02-23 13:43:45 -08:00
Philip Reames	9a40f9f681	[SLP] Make it clear ScheduleDataMap is keyed by instructions [NFC]	2022-02-23 13:31:36 -08:00
Philip Reames	9392c0d4ef	Revert "[SLP] Remove cap on schedule window size" This reverts commit `6adf4b039e`. Reverting while investigating https://github.com/llvm/llvm-project/issues/54029	2022-02-23 13:12:07 -08:00
Philip Reames	a83441e8cd	Revert "[SLP] Simplify extendSchedulingRegion" This reverts commit `8c85f3a052`.	2022-02-23 13:12:07 -08:00
Philip Reames	222e8610f1	[SLP] Rearrange fields in ScheduleData for density [NFC]	2022-02-23 12:33:43 -08:00
Philip Reames	a3e9b32c00	[SLP] Remove SchedulingPriority from ScheduleData [NFC] First step in trying to shrink the memory footprint of ScheduleData to improve cache locality.	2022-02-23 11:43:46 -08:00
Philip Reames	8c85f3a052	[SLP] Simplify extendSchedulingRegion This change uses instruction's comesBefore method to simplify the code significantly. There's little compile time concern here because getSpillCost already calls comesBefore on every basic block which contains a vectorization candidate. The only additional times we'll build basic block ordering is when we can't schedule a vector candidate anywhere in the containing block. Differential Revision: https://reviews.llvm.org/D120364	2022-02-23 11:23:38 -08:00
Augie Fackler	95f3cc222a	AttributorAttributes: avoid a crashing on bad alignments Prior to this change, LLVM would attempt to optimize an aligned_alloc(33, ...) call to the stack. This flunked an assertion when trying to emit the alloca, which crashed LLVM. Avoid that with extra checks. Differential Revision: https://reviews.llvm.org/D119604	2022-02-23 14:21:02 -05:00
Arthur Eubanks	1fd980de04	Revert "AttributorAttributes: avoid a crashing on bad alignments" This reverts commit `70ff6fbeb9`. Breaks bots, e.g. http://45.33.8.238/linux/69375/step_12.txt.	2022-02-23 09:08:03 -08:00
Augie Fackler	70ff6fbeb9	AttributorAttributes: avoid a crashing on bad alignments Prior to this change, LLVM would attempt to optimize an aligned_alloc(33, ...) call to the stack. This flunked an assertion when trying to emit the alloca, which crashed LLVM. Avoid that with extra checks. Differential Revision: https://reviews.llvm.org/D119604	2022-02-23 11:46:15 -05:00
Philip Reames	6adf4b039e	[SLP] Remove cap on schedule window size This cap was first added in `848c1aa45` (back in 2015). Per the original commit message, the purpose was to avoid a compile time explosion in long basic blocks. The algorithmic problem in scheduling has now been fixed in `0539a26d`. In the meantime, the code has rotted fairly badly. Some intermediate refactoring caused the size to only be incremented if both iterators advance in the window search. This causes the size to be badly undercounted when near one end of a basic block. We no longer have any test which exercises the logic in an intentional way; there's one test which differs with this change, but the changes appear fairly orthogonal to the purpose of the test file. Unfortunately, we no longer have the original motivating example, so it's possible that it also hits some other issue. I tested locally with a large example, but even at its worst, that one doesn't demonstrate anything too extreme even without the algorithmic fix. It's clearly faster with it, but only by ~20%, which doesn't seem in line with the original commit message. If regressions with this patch are seen, please file a bug and I'll try to fix any other algorithmic problems which fall out.	2022-02-23 08:27:45 -08:00
Nikita Popov	587c7ff15c	[InstCombine] Support min/max intrinsics in udiv->lshr fold This complements the existing fold for selects. This fold is a bit more conservative, requiring one-use. The other folds here should probably also be subjected to a one-use restriction. https://alive2.llvm.org/ce/z/Q9eCDU https://alive2.llvm.org/ce/z/8YK2CJ	2022-02-23 15:51:36 +01:00
Nikita Popov	03e6efb8c2	[InstCombine] Further simplify udiv -> lshr folding Rather than queuing up actions, have one function that does the log2() fold in the obvious way, but with a flag that allows us to check whether the fold will succeed without actually performing it.	2022-02-23 15:29:21 +01:00
Nikita Popov	5ccb0582c2	[InstCombine] Simplify udiv -> lshr folding What we're really doing here is converting Op0 udiv Op1 into Op0 lshr log2(Op1), so phrase it in that way. Actually pushing the lshr into the log2(Op1) expression should be seen as a separate transform.	2022-02-23 14:55:23 +01:00
Anton Afanasyev	8ad6d5e465	Revert "[AggressiveInstCombine] Add `phi` nodes support to `TruncInstCombine`" This reverts commit `f84d732f8c`. Breakage of "sanitizer-x86_64-linux-fast"	2022-02-23 15:56:11 +03:00
Nikita Popov	5fb65557e3	[InstCombine] Remove unused visitUDivOperand() argument (NFC) This function only works on the RHS operand.	2022-02-23 13:16:44 +01:00
Anton Afanasyev	f84d732f8c	[AggressiveInstCombine] Add `phi` nodes support to `TruncInstCombine` Expand `TruncInstCombine` to handle loops by adding `phi` nodes to expression graph. Reviewed by: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D109817	2022-02-23 14:01:55 +03:00
Nikita Popov	e2f627e5e3	[InstCombine] Fold sub of umin to usub.sat We were handling sub of umax, but not the conjugated umin case. https://alive2.llvm.org/ce/z/4fdZfy https://alive2.llvm.org/ce/z/BhUQBM	2022-02-23 12:00:34 +01:00
Bill Wendling	a5bbc6ef99	[NFC] Remove unnecessary "#include"s from header files	2022-02-23 01:20:48 -08:00
Nikita Popov	65dc78d63e	[InstCombine] Remove one-use limitation from X-Y==0 fold This one-use limitation is artificial, we do not increase instruction count if we perform the fold with multiple uses. The motivating case is shown in @sub_eq_zero_select, where the one-use limitation causes us to miss a subsequent select fold. I believe the backend is pretty good about reusing flag-producing subs for cmps with same operands, so I think doing this is fine. Differential Revision: https://reviews.llvm.org/D120337	2022-02-23 09:37:30 +01:00
minglotus-6	f415d74d1d	[SampleProfile] Handle the case when the option `MaxNumPromotions` is zero. In places where `MaxNumPromotions` is used to allocate an array, bail out early to prevent allocating an array of length 0. Differential Revision: https://reviews.llvm.org/D120295	2022-02-22 21:44:32 -08:00
Brendon Cahoon	3cc15e2cb6	[SLP] Fix assert from non-constant index in insertelement A call to getInsertIndex() in getTreeCost() is returning None, which causes an assert because a non-constant index value for insertelement was not expected. This case occurs when the insertelement index value is defined with a PHI. Differential Revision: https://reviews.llvm.org/D120223	2022-02-22 15:57:14 -06:00
Dmitry Vassiliev	90a3b31091	[Transforms] Enhance CorrelatedValuePropagation to handle both values of select The "Correlated Value Propagation" pass was missing a case when handling select instructions. It was only handling the "false" constant value, while in NVPTX the select may have the condition (and thus the branches) inverted, for example: ``` loop: %phi = phi i32* [ %sel, %loop ], [ %x, %entry ] %f = tail call i32* @f(i32* %phi) %cmp1 = icmp ne i32* %f, %y %sel = select i1 %cmp1, i32* %f, i32* null %cmp2 = icmp eq i32* %sel, null br i1 %cmp2, label %return, label %loop ``` But the select condition can be inverted: ``` %cmp1 = icmp eq i32* %f, %y %sel = select i1 %cmp1, i32* null, i32* %f ``` The fix is to enhance "Correlated Value Propagation" to handle both branches of the select instruction. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D119643	2022-02-23 00:11:20 +04:00
Philip Reames	8612b11c86	[SLP] Use isInSchedulingRegion consistently [NFC]	2022-02-22 10:27:16 -08:00
Philip Reames	0539a26d91	[SLP] Schedule only sub-graph of vectorizable instructions SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions; 699021 SLP - Number of schedule calls; 5598 SLP - Number of ReSchedule actions; 59 SLP - Number of ReScheduleOnFail actions; 10084 SLP - Number of schedule resets; 8523 SLP - Number of vector instructions generated. After this patch: 102895 SLP - Number of calcDeps actions; 161916 SLP - Number of schedule calls; 5637 SLP - Number of ReSchedule actions; 55 SLP - Number of ReScheduleOnFail actions; 10083 SLP - Number of schedule resets; 8403 SLP - Number of vector instructions generated. I do want to highlight that there is a small difference in the number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore it. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. 
While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-02-22 10:15:55 -08:00
Jay Foad	0e74d75a29	[StructurizeCFG] Fix boolean not bug D118623 added code to fold not-of-compare into a compare with the inverted predicate, if the compare had no other uses. This relies on accurate use lists in the IR but it was run before setPhiValues, when some phi inputs are still stored in a data structure on the side, instead of being real uses in the IR. The effect was that a phi that should be using the original compare result would now get an inverted result instead. Fix this by moving simplifyConditions after setPhiValues. Differential Revision: https://reviews.llvm.org/D120312	2022-02-22 17:36:20 +00:00
Egor Zhdan	3a1cb36237	Add DriverKit support This patch is the first in a series of patches to upstream the support for Apple's DriverKit. Once complete, it will allow targeting DriverKit platform with Clang similarly to AppleClang. This code was originally authored by JF Bastien. Differential Revision: https://reviews.llvm.org/D118046	2022-02-22 13:42:53 +00:00
Kerry McLaughlin	12fb133eba	[LoopVectorize] Support conditional in-loop vector reductions Extends getReductionOpChain to look through Phis which may be part of the reduction chain. adjustRecipesForReductions will now also create a CondOp for VPReductionRecipe if the block is predicated and not only if foldTailByMasking is true. Changes were required in tryToBlend to ensure that we don't attempt to convert the reduction Phi into a select by returning a VPBlendRecipe. The VPReductionRecipe will create a select between the Phi and the reduction. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117580	2022-02-22 12:04:35 +00:00
Nikita Popov	3c0096a1d4	[MergeICmps] Don't call comesBefore() if in different blocks (PR53959) Only call comesBefore() if the instructions are in the same block. Otherwise make a conservative assumption. Fixes https://github.com/llvm/llvm-project/issues/53959.	2022-02-22 12:27:20 +01:00
Nikita Popov	f8d7210032	[GlobalStatus] Keep Visited set in isSafeToDestroyConstant() Constants cannot be cyclic, but they can be tree-like. Keep a visited set to ensure we do not degenerate to exponential run-time. This fixes the problem reported in https://reviews.llvm.org/D117223#3335482, though I haven't been able to construct a concise test case for the issue. This requires a combination of dead constants and the kind of constant expression tree that textual IR cannot represent (because the textual representation, unlike the in-memory representation, is also exponential in size).	2022-02-22 10:02:37 +01:00
Florian Hahn	7662d1687b	[MemCpyOpt] Check all access for MemoryUses in writtenBetween. Currently writtenBetween can miss clobbers of Loc between End and Start, if End is a MemoryUse. To guarantee we see all write clobbers of Loc between Start and End for MemoryUses, restrict to Start and End being in the same block and check all accesses between them. This fixes 2 mis-compiles illustrated in llvm/test/Transforms/MemCpyOpt/memcpy-byval-forwarding-clobbers.ll Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119929	2022-02-21 16:54:30 +00:00
Arthur Eubanks	053c2a0020	[SimplifyCFG][OpaquePtr] Check store type when merging conditional store	2022-02-20 11:29:54 -08:00
Florian Hahn	c141d158e5	[VectorCombine] Remove redundant checks (NFC). The removed conditions are already checked by the if above. Fixes #53761.	2022-02-19 21:05:32 +00:00
Philip Reames	6f9d557e08	[instcombine] Cleanup foldAllocaCmp slightly [NFC]	2022-02-18 18:49:39 -08:00
Philip Reames	3ad0bdae8f	[SLP] Address post commit comment from `2e50760`	2022-02-18 10:57:15 -08:00
Simon Pilgrim	be1ffda0a5	[InstCombine] visitCallInst - pull out repeated bswap scalar type bitwidth. NFC.	2022-02-18 17:33:11 +00:00
Florian Hahn	00ab91b70d	[ConstraintElimination] Remove ConstraintListTy (NFCI). This patch simplifies constraint handling by removing the ConstraintListTy wrapper struct and moving the Preconditions directly into ConstraintTy. This reduces the amount of memory needed for managing constraints. The only use case for ConstraintListTy was adding 2 constraints to model ICMP_EQ conditions. But this can be handled by adding an IsEq flag. When adding an equality constraint, we need to add the constraint and the inverted constraint.	2022-02-18 14:35:01 +00:00
Joseph Huber	0136a4401f	[OpenMP] Add an option to limit shared memory usage in OpenMPOpt One of the optimizations performed in OpenMPOpt pushes globalized variables to static shared memory. This is preferable to keeping the runtime call in all cases; however, if too many variables are pushed to shared memory, the kernel will crash. Since this is an optimization and not something the user specified explicitly, there should be an option to limit this optimization in those cases. This patch introduces the `-openmp-opt-shared-limit=` option to limit the number of bytes that will be placed in shared memory from HeapToShared. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120079	2022-02-18 08:35:26 -05:00
Alexey Bataev	b0a0df9809	[SLP]Fix vectorization of the alternate cmp instruction with swapped predicates. If the alternate cmp instruction is a swapped predicate of the main cmp instruction, need to generate alternate instruction, not the one with the swapped predicate. Also, the lane with the alternate opcode should be selected only, if the corresponding operands are not compatible. Correctness confirmed: https://alive2.llvm.org/ce/z/94BG66 Differential Revision: https://reviews.llvm.org/D119855	2022-02-18 04:27:45 -08:00
Alexander Potapenko	c85a26454d	[asan] Add support for disable_sanitizer_instrumentation attribute For ASan this will effectively serve as a synonym for __attribute__((no_sanitize("address"))). Adding the disable_sanitizer_instrumentation to functions will drop the sanitize_XXX attributes on the IR level. This is the third reland of https://reviews.llvm.org/D114421. Now that TSan test is fixed (https://reviews.llvm.org/D120050) there should be no deadlocks. Differential Revision: https://reviews.llvm.org/D120055	2022-02-18 09:51:54 +01:00
Kuba Mracek	6b53ad298e	[GlobalDCE] [VFE] Avoid dropping vfunc dependencies when an invalid vtable entry is present When we scan vtables for a particular vload in ScanVTableLoad and an entry in one possible vtable is invalid (null or non-fptr), we bail in a wrong way -- we completely stop the scanning of vtables and this results in dropped dependencies and incorrectly removed vfuncs from vtables. Let's fix that by correcting the bailing logic to keep iterating and only skip the invalid entries. Differential Revision: https://reviews.llvm.org/D120006	2022-02-17 19:41:46 -08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249, LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents an instruction hoist from being speculative if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, and discarding it results in performance losses. This PR modifies LICM to accept a "speculative" parameter which allows LICM to be set to perform information-losing speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Arthur Eubanks	af6b9939aa	[EarlyCSE][OpaquePtr] Check access type when performing DSE This will bail out on target specific intrinsics. If those are deemed important enough for EarlyCSE to handle, we can augment MemIntrinsicInfo with an access type for TargetTransformInfo::getTgtMemIntrinsic() to handle. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120077	2022-02-17 11:58:53 -08:00
Joseph Huber	74cacf212b	[OpenMP] Add RTL function to externalization RAII This patch adds the '_kmpc_get_hardware_num_threads_in_block' OpenMP RTL function to the externalization RAII struct. This was getting optimized out and then being replaced with an undefined value once added back in, causing bugs for complex reductions. Fixes #53909. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120076	2022-02-17 14:30:58 -05:00
Johannes Doerfert	254d6da020	[Attributor][FIX] Ensure stable iteration order With `668c5c688b` we introduced an ordering issue revealed by the reverse iteration buildbot. Depending on the order of the map that tracks the AAIsDead AAs we ended up with slightly different attributes. This is not totally unexpected and can happen. We should however be deterministic in our orderings to avoid such issues.	2022-02-17 12:53:10 -06:00
Daniil Suchkov	7c3e2b92cf	[RewriteStatepointsForGC] Fix an incorrect assertion The assertion verifying that a newly computed value matches what is already cached used stripPointerCasts() to strip bitcasts, however the values can be not only pointers, but also vectors of pointers. That is problematic because stripPointerCasts() doesn't handle vectors of pointers. This patch introduces an ad-hoc utility function to strip all bitcasts regardless of the value type. Reviewed By: skatkov, reames Differential Revision: https://reviews.llvm.org/D119994	2022-02-17 18:44:57 +00:00
Arthur Eubanks	4a26abc0b9	[InstCombine][OpaquePtr] Check store type in DSE implementation	2022-02-17 10:01:14 -08:00
Arthur Eubanks	129af4daa7	[SCEVExpander][OpaquePtr] Check GEP source type when finding identical GEP Fixes an opaque pointers miscompile. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120004	2022-02-17 08:48:11 -08:00
Jay Foad	9071393c18	[GlobalDCE] Simplify and return Changed = true less often Removing dead constants should not count as making a change to the module. This means that RemoveUnusedGlobalValue simplifies to just calling removeDeadConstantUsers, so inline it. Differential Revision: https://reviews.llvm.org/D120052	2022-02-17 16:03:13 +00:00
Sanjay Patel	58df2da054	[InstCombine] push constant operand down/outside in sequence of min/max intrinsics A generalization like this was suggested in D119754. This is the inverse direction of D119851, and we get all of the folds there plus the one that was missed. There is precedent for this kind of transform in instcombine with "or" instructions (but strangely only with that one opcode AFAICT). Similar justification as in the other patch: The line between instcombine and reassociate for these kinds of folds is blurry. This doesn't appear to have much cost and gives us the expected wins from repeated folds as seen in the last set of test diffs. Differential Revision: https://reviews.llvm.org/D119955	2022-02-17 10:36:37 -05:00
Alexey Bataev	d1cd64ffdd	[SLP][NFC]Fix misprint in function name, NFC.	2022-02-17 05:57:51 -08:00
Nikita Popov	36fdfaba19	[RelLookupTableConverter] Ensure that GV, GEP and load types match This code could be generalized to be type-independent, but for now just ensure that the same type constraints are enforced with opaque pointers as with typed pointers.	2022-02-17 12:05:05 +01:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Florian Mayer	c195addb60	[NFC] [MTE] [HWASan] Remove unnecessary member of AllocaInfo Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119981	2022-02-16 15:19:30 -08:00
Arthur Eubanks	826fae51d2	[SLPVectorizer][OpaquePtrs] Check GEP source element type Fixes a miscompile with opaque pointers. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D119980	2022-02-16 14:47:20 -08:00
Johannes Doerfert	8ad39fbaf2	[Attributor][FIX] Heap2Stack needs to use the alloca AS When we move an allocation from the heap to the stack we need to allocate it in the alloca AS and then cast the result. This also means we now insert the alloca right before the allocation call rather than after it. Fixes https://github.com/llvm/llvm-project/issues/53858	2022-02-16 15:58:32 -06:00
Johannes Doerfert	668c5c688b	[Attributor][FIX] Use liveness information of the right function When we use liveness for edges during the `genericValueTraversal` we need to make sure to use the AAIsDead of the correct function. This patch adds the proper logic and some simple caching scheme. We also add an assertion to the `isEdgeDead` call to make sure future misuse is detected earlier. Fixes https://github.com/llvm/llvm-project/issues/53872	2022-02-16 15:58:32 -06:00
Johannes Doerfert	6ed1ef0643	[Attributor][FIX] Pipe UsedAssumedInformation through more interfaces `UsedAssumedInformation` is a return argument utilized to determine what information is known. Most APIs used it already but `genericValueTraversal` did not. This adds it to `genericValueTraversal` and replaces `AllCallSitesKnown` of `checkForAllCallSites` with the commonly used `UsedAssumedInformation`. This was supposed to be a NFC commit, then the test change appeared. Turns out, we had one user of `AllCallSitesKnown` (AANoReturn) and the way we set `AllCallSitesKnown` was wrong as we ignored the fact some call sites were optimistically assumed dead. Included a dedicated test for this as well now. Fixes https://github.com/llvm/llvm-project/issues/53884	2022-02-16 14:44:20 -06:00
Nikita Popov	c9032f1a69	[LowerMemIntrinsics] Explicitly use i8 type in memmove lowering By convention, memcpy/memmove intrinsics are always used with i8 pointers (though this is not enforced), so in practice this code was always using an i8 type. Make that explicit. Of course, i8 is not a very profitable choice, and this code could be more performant by picking an appropriate larger type. But that would require additional test coverage and correctness review, and certainly shouldn't be a decision based on the pointer element type.	2022-02-16 16:31:55 +01:00
Florian Hahn	d03d3d7966	[DSE] Fall back to CFG scan for unreachable terminators. Blocks with UnreachableInst terminators are considered as root nodes in the PDT. This pessimizes DSE if there are no aliasing reads between the potentially dead store and the block with the unreachable terminator. If any of the root nodes of the PDT has UnreachableInst as terminator, fall back to the CFG scan, even if the common dominator of all killing blocks does not post-dominate the block with the potentially dead store. It looks like the compile-time impact for the extra scans is negligible. https://llvm-compile-time-tracker.com/compare.php?from=779bbbf27fe631154bdfaac7a443f198d4654688&to=ac59945f1bec1c6a7d7f5590c8c69fd9c5369c53&stat=instructions Fixes #53800. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119760	2022-02-16 14:06:40 +00:00
Bin Cheng	dfec0b3053	[FuncSpec] Save compilation time by caching uses for propagation We only need to do propagation on use instructions of the original value, rather than the replacing const value which might have lots of irrelevant uses. This is done by caching uses before replacing. Differential Revision: https://reviews.llvm.org/D119815	2022-02-16 10:46:26 +08:00
Philip Reames	2e50760775	[SLP] Add assert that entities are scheduled as expected Requested in D118538	2022-02-15 12:21:49 -08:00
Florian Mayer	59e7de26aa	[HWASan] remove replacement of DbgVariableIntrinsics. This code was dead because we AI->replaceUsesWithIf above. I verified this doesn't actually get run by applying https://gist.github.com/fmayer/aea7cbb4700cfe2c9d932591ae1073c3 to the Android toolchain and building AOSP, without any crash. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119802	2022-02-15 11:40:58 -08:00
Max Kazantsev	bfc1217119	[NFC] Introduce option to switch off compatible invokes merge Does not affect default behavior (transform is on).	2022-02-15 21:51:03 +07:00
Alexander Potapenko	05ee1f4af8	Revert "[asan] Add support for disable_sanitizer_instrumentation attribute" This reverts commit `dd145f953d`. https://reviews.llvm.org/D119726, like https://reviews.llvm.org/D114421, still causes TSan to fail, see https://lab.llvm.org/buildbot/#/builders/70/builds/18020 Differential Revision: https://reviews.llvm.org/D119838	2022-02-15 15:04:53 +01:00
Sanjay Patel	6357ccf57f	[InstCombine] reassociate min/max intrinsics with constant operands Integer min/max operations are associative: max (max X, C0), C1 --> max X, (max C0, C1) --> max X, NewC https://alive2.llvm.org/ce/z/wW5HVM This would avoid a regression when we canonicalize to min/max intrinsics (see D98152 ). Differential Revision: https://reviews.llvm.org/D119754	2022-02-15 08:31:23 -05:00
Simon Pilgrim	9606c69087	[InstCombine] Fold sub(Y,and(lshr(X,C),1)) --> add(ashr(shl(X,(BW-1)-C),BW-1),Y) (PR53610) As noted on PR53610, we can fold a 'bit splat' negation of a shifted bitmask pattern into a pair of shifts. https://alive2.llvm.org/ce/z/eGrsoN Differential Revision: https://reviews.llvm.org/D119715	2022-02-15 13:24:20 +00:00
Anton Afanasyev	b7574b092a	[SLP] Don't try to vectorize pair with insertelement Particularly this breaks vectorization of insertelements where some of intermediate (i.e. not last) insertelements are used externally. Fixes PR52275 Fixes #51617 Differential Revision: https://reviews.llvm.org/D119679	2022-02-15 16:12:59 +03:00
Alexander Potapenko	dd145f953d	[asan] Add support for disable_sanitizer_instrumentation attribute For ASan this will effectively serve as a synonym for __attribute__((no_sanitize("address"))) This is a reland of https://reviews.llvm.org/D114421 Reviewed By: melver, eugenis Differential Revision: https://reviews.llvm.org/D119726	2022-02-15 14:06:12 +01:00
Nikita Popov	2460a2ce47	[DSE] Extract a common PDT check (NFC)	2022-02-15 13:05:45 +01:00
Hongtao Yu	62ef77ca63	[CSSPGO] Do not merge a context that is already duplicated into the base profile. Do not merge a context that is already duplicated into the base profile. Also fixing a typo caused by previous refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119735	2022-02-14 18:07:11 -08:00
Florian Mayer	8de457eafc	[HWASAN] use common alignAndPadAlloca Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119614	2022-02-14 15:28:32 -08:00
Florian Mayer	205308de6b	[NFC] [MTE] Move alignAndPadAlloca to MemoryTaggingSupport. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119610	2022-02-14 14:54:04 -08:00
Nick Desaulniers	9dcb006165	[funcattrs] check reachability to improve noreturn There was a fixme in the code pertaining to attributing functions as noreturn. By using reachability, if none of the blocks that are reachable from the entry return, then the function is noreturn. Previously, the code only checked if any blocks returned. If they're unreachable, then they don't matter. This improves codegen for the Linux kernel. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1563 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119571	2022-02-14 14:01:59 -08:00
Ahmed Bougacha	c703f852c9	[IR] Define "ptrauth" operand bundle. This introduces a new "ptrauth" operand bundle to be used in call/invoke. At the IR level, it's semantically equivalent to an @llvm.ptrauth.auth followed by an indirect call, but it provides additional hardening by preventing the intermediate raw pointer from being exposed. This mostly adds the IR definition, verifier checks, and support in a couple of general helper functions. Clang IRGen and backend support will come separately. Note that we'll eventually want to support this bundle in indirectbr as well, for similar reasons. indirectbr currently doesn't support bundles at all, and the IR data structures need to be updated to allow that. Differential Revision: https://reviews.llvm.org/D113685	2022-02-14 11:27:35 -08:00
Nikita Popov	41c5a762e5	[DeadArgElim] Check that function type is the same If the function types differ, the call arguments don't necessarily correspond to the function arguments. It's likely not worthwhile to handle this more precisely, but at least we shouldn't crash.	2022-02-14 14:08:42 +01:00
Anton Afanasyev	954ea0f044	[SLP] Simplify indices processing for insertelements Get rid of non-constant and undef indices of insertelements at `buildTree()` stage. Fix bugs. Differential Revision: https://reviews.llvm.org/D119623	2022-02-14 14:50:44 +03:00
Nikita Popov	7c83f8c45d	[InstCombine] Check GEP source type in select of gep fold This is no longer implicitly checked through the pointer type with opaque pointers.	2022-02-14 11:46:45 +01:00
Nikita Popov	efece08ae2	[InstCombine] Remove manual debug loc transfer While this might be marginally more precise, we generally don't bother with this in InstCombine, and let the IRBuilder assign the debug location. I don't see why this one fold, out of the thousands done in InstCombine, should be treated specially.	2022-02-14 11:07:05 +01:00
Nikita Popov	18bf42c0a6	[CVP] Extract helper from phi processing (NFC) So we can use early returns and avoid those awkward !V checks.	2022-02-14 10:51:34 +01:00
Dávid Bolvanský	1be1fd735d	[AlwaysInliner] Check for callsite noinline attribute simplified	2022-02-14 09:33:30 +01:00
Kazu Hirata	befeb5acf6	[Transforms] Use default member initialization in MemmoveVerifier (NFC)	2022-02-13 10:34:03 -08:00
Kazu Hirata	fd3e8044cd	[Transforms] Use default member initialization in Prefetch (NFC)	2022-02-13 10:34:02 -08:00
Kazu Hirata	0b9a610a75	[Transforms] Use default member initialization in ConditionInfo (NFC)	2022-02-13 10:34:00 -08:00
Kazu Hirata	fda6a1ad42	[Transforms] Use default member initialization in CHRStats (NFC)	2022-02-13 10:33:56 -08:00
Florian Hahn	2cd22ce0d0	[LV] Pass start value directly to emitTransformedIndex (NFC).	2022-02-12 19:03:32 +00:00
Florian Mayer	6759cdd829	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 16:01:46 -08:00
Florian Mayer	bf2f72fa10	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 16:01:02 -08:00
Michael Gottesman	19279ffc77	[debug-info] If one sees a spill with a dbg.addr use, salvageDebugInfo upon it and don't hoist it. This ensures that if we have a dbg.addr in a coroutine funclet that is on one of our function arguments, that the dbg.addr is not mapped to undef and also that later it isn't hoisted to the front of the basic block. Instead it remains at its original cloned location. rdar://83957028 Differential Revision: https://reviews.llvm.org/D119576	2022-02-11 15:15:13 -08:00
Florian Mayer	26dbc47468	Revert "[hwasan] keep debug intrinsicts in AllocaInfo." This reverts commit `19fdf85f58`.	2022-02-11 14:41:24 -08:00
Florian Mayer	b1bd64aeee	Revert "[NFC] [MTE] Use helpers for stack tagging." This reverts commit `8f0e5b4e26`.	2022-02-11 14:41:24 -08:00
Florian Hahn	66400fc2dd	[ConstraintElimination] Support add with precondition. If we can prove that an addition without wrap flags won't wrap, decompose the operation. Issue #48253	2022-02-11 20:26:25 +00:00
Arthur Eubanks	b59a402237	[MSan][OpaquePtr] Use inline asm elementtype instead of getPointerElementType()	2022-02-11 11:50:35 -08:00
Florian Mayer	8f0e5b4e26	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 10:59:09 -08:00
Florian Mayer	19fdf85f58	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 10:56:53 -08:00
Florian Mayer	e7356fb3e2	[nfc] [hwasan] factor out logic to collect info about stack this is the first step in unifying some of the logic between hwasan and mte stack tagging. this only moves around code, changes to converge different implementations of the same logic follow later. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118947	2022-02-11 10:54:12 -08:00
Johannes Doerfert	ede248e614	[OpenMP][FIX] The `llvm.amdgcn.s.barrier` is actually not aligned If we assume `llvm.amdgcn.s.barrier` is aligned we may remove it and cause OpenMP GPU applications on the AMD GPU to be stuck or wrongly synchronized. Reported by Carlo Bertolli.	2022-02-11 12:42:50 -06:00
Dávid Bolvanský	d828281e78	[AlwaysInliner] Respect noinline call site attribute ``` always_inline foo() { } bar () { noinline foo(); } ``` We should prefer call site attribute over attribute on decl. This is fix for AlwaysInliner, similar fix is needed for normal Inliner (follow up). Related to https://reviews.llvm.org/D119061 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D119553	2022-02-11 19:23:11 +01:00
Austin Kerbow	0bb25b4603	[InferAddressSpaces] Fix assert on invalid cast ordering If a cast is needed when replacing uses with newly created values, the cast must be inserted after the instruction that defines the new value. Fixes: SWDEV-321215 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D119524	2022-02-11 10:02:30 -08:00
Arthur Eubanks	22f4f94256	[CoroFrame][OpaquePtr] Remove getPointerElementType() call Get it from the byval type instead.	2022-02-11 09:53:20 -08:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implicitarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Nikita Popov	4c6289c369	[InstCombine] Check source element type in gep of phi of gep fold	2022-02-11 17:10:48 +01:00
Matt Arsenault	52fbb786a6	InferAddressSpaces: Fix assert on inferred source for inttoptr/ptrtoint If we had some source value we could infer an address space from that went through a ptrtoint/inttoptr pair, this would fail since bitcast can't change the address space. Fixes issue 53665.	2022-02-11 10:35:29 -05:00
Anton Afanasyev	cd685f5736	[NFC][SLP] Set default parameter for Offset equal to zero	2022-02-11 17:22:33 +03:00
Nikita Popov	5450963085	[InstCombine] Check source element type in phi of gep fold Rather than checking that the type is the same (which is always the case, given how these are part of the same phi) check that the source element type is the same. With opaque pointers, this is no longer implied.	2022-02-11 14:26:18 +01:00
Nikita Popov	2a1b1f1b1b	[GVN] Store source element type for GEP expressions To avoid incorrectly merging GEPs with different source types under opaque pointers. To avoid increasing the Expression structure size, this reuses the existing type member. The code does not rely on this to be the expression result type, it's only used as a disambiguator.	2022-02-11 13:03:30 +01:00
Simon Pilgrim	a5d6851489	LoopReroll::isLoopControlIV - use cast<> instead of dyn_cast<> to avoid dereference of nullptr The pointer is always dereferenced by isCompareUsedByBranch, so assert the cast is correct instead of returning nullptr	2022-02-11 10:19:25 +00:00
Nikita Popov	e714b98fff	[InstCombine] Check type compatibility in indexed load fold This fold could use a rewrite to an offset-based implementation, but for now make sure it doesn't crash with opaque pointers.	2022-02-11 10:16:27 +01:00
Nikita Popov	3571bdb4f3	[InstCombine] Require equal source element type in icmp of gep fold Without opaque pointers, this is implicitly enforced. This previously resulted in a miscompile.	2022-02-11 09:38:28 +01:00
Nikita Popov	e24067819f	[ArgPromotion] Protect harder against recursive promotion (PR42028) In addition to the self-recursion check, also check whether there is more than one node in the SCC, which implies that there is a larger cycle. I believe checking SCC structure (rather than something like norecurse) is the right thing to do here, because this is specifically about preventing infinite loops over the SCC. Fixes https://github.com/llvm/llvm-project/issues/42028. Differential Revision: https://reviews.llvm.org/D119418	2022-02-11 09:30:39 +01:00
Nico Weber	e76037db44	[llvm] Remove unused file MaximumSpanningTree.h The last use of this file was removed in late 2013 in `ea56494625`. The last use was in PathProfiling.cpp, which had an overview comment of the overall approach. Similar functionality lives in the slightly more cryptically named CFGMST.h in this same directory. A similar overview comment is in PGOInstrumentation.cpp. No behavior change.	2022-02-10 21:01:24 -05:00
Philip Reames	5ba115031d	[PSE] Remove assumption that top level predicate is union from public interface [NFC] Note that this doesn't actually cause the top level predicate to become a non-union just yet. The above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization, due to a change from checking whether any predicates exist to checking whether the predicate is possibly false.	2022-02-10 16:14:52 -08:00
Teresa Johnson	dd3f483335	[ThinLTO][WPD] LICM set lookup (NFC) Minor efficiency fix. There is no reason to perform the same set lookup repeatedly in the inner loop as it is invariant there. Differential Revision: https://reviews.llvm.org/D119474	2022-02-10 13:16:31 -08:00
Simon Pilgrim	6af7c1371a	[LoopVectorize] getStepVector - reduce scope of local variable. NFC.	2022-02-10 20:44:25 +00:00
Johannes Doerfert	dd75c0ea64	[Attributor][NFC] Expose new API in AAPointerInfo New users might want to check bins without a load or store instruction at hand. Since we use those instructions only to find the offset and size of the access anyway, we can expose an offset and size interface to the outside world as well. This commit mainly moves code around and exposes a class (OffsetAndSize) as well as a method forallInterferingAccesses in AAPointerInfo. Differential Revision: https://reviews.llvm.org/D119249	2022-02-10 13:52:24 -06:00
Johannes Doerfert	d1387a26a5	[Attributor][FIX] Reachability needs to account for readonly callees The oversight caused us to ignore call sites that are effectively dead when we computed reachability (or, more precisely, the call edges of a function). The problem is that loads in the readonly callee might depend on stores prior to the callee. If we do not track the call edge, we mistakenly assume the store before the call cannot reach the load. The problem is nicely visible in: `llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll` Caused by D118673. Fixes https://github.com/llvm/llvm-project/issues/53726	2022-02-10 13:52:24 -06:00
Johannes Doerfert	e39b419312	[Attributor][FIX] Honor alloca address space in AAPrivatizablePtr When we privatize a pointer (~argument promotion) we introduce new private allocas as replacement. These need to be placed in the alloca address space as later passes cannot properly deal with them otherwise. Fixes https://github.com/llvm/llvm-project/issues/53725	2022-02-10 13:52:24 -06:00
Simon Pilgrim	aca355a3bb	[InstCombine] Extend fold (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0) to support smin intrinsic Replace matchSelectPattern pattern match with the more general m_SMin so that it can handle smin intrinsics as well as the icmp+select pattern Noticed while reviewing regressions from D98152	2022-02-10 13:28:15 +00:00
Sanjay Patel	995d400f3a	[InstCombine] reduce mul operands based on undemanded high bits We already do this in SDAG, but mul was left out of the fold for unused high bits in IR. The high bits of a mul's operands do not change the low bits of the result: https://alive2.llvm.org/ce/z/XRj5Ek Verify some test diffs to confirm that they are correct: https://alive2.llvm.org/ce/z/y_W8DW https://alive2.llvm.org/ce/z/7DM5uf https://alive2.llvm.org/ce/z/GDiHCK This gets a fold that was presumed not possible in D114272: https://alive2.llvm.org/ce/z/tAN-WY Removing nsw/nuw is needed for general correctness (and is also done in the codegen version), but we might be able to recover more of those with better analysis. Differential Revision: https://reviews.llvm.org/D119369	2022-02-10 08:10:22 -05:00
Florian Hahn	80eea38d8d	[ConstraintElimination] Remove unnecessary recursion (NFC). Perform predicate normalization in a single switch, rather than going through recursions.	2022-02-10 12:26:35 +00:00
Nikita Popov	8018d6be34	[ArgPromotion] Transfer metadata to promoted loads Also transfer selected non-AA metadata to the promoted load. Only metadata from guaranteed to execute loads is transferred.	2022-02-10 11:28:07 +01:00
Florian Hahn	79d60b93b4	[ConstraintElimination] Skip floating point compares. (NFC) The solver only supports integer conditions. Adding floating point compares to the worklist only adds extra work. Just skip them.	2022-02-09 21:16:49 +00:00
Philip Reames	d39f4ac494	[SCEV] Unwind SCEVUnionPredicate from getPredicatedBackedgeTakenCount [NFC] For those curious, the whole reason for tracking the predicate set separately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.	2022-02-09 12:55:40 -08:00
David Green	b55d4c2ad8	Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`" This reverts commit `77a0da926c` as we've received multiple reports of this significantly impacting performance, in ways that don't seem to just be target specific cost models going wrong. I would offer some reproducers, but the test changes here seem to be full of them! Reverting for now and hopefully we can remove the "hack" more carefully as we go.	2022-02-09 20:02:54 +00:00
Florian Hahn	b71eed7e8f	[ConstraintElimination] Remove redundant lookup (NFC).	2022-02-09 18:00:03 +00:00
Florian Hahn	902db4ec1c	[ConstraintElimination] Move some definitions closer to uses (NFC).	2022-02-09 17:29:49 +00:00
Arthur Eubanks	1bdc6eacba	[LoopLoadElim] Support opaque pointers With typed pointers the pointer operand type checks the address space and the load/store type. With opaque pointers we have to check the load/store type separately.	2022-02-09 09:22:21 -08:00
Alexey Bataev	370ea1a199	[SLP][NFC]Fix comment, NFC.	2022-02-09 07:14:14 -08:00
Florian Hahn	8aa122081f	[LV] Pass step to emitTransformedIndex (NFC). Move out the induction step creation from emitTransformedIndex to the callers. In some places (e.g. widenIntOrFpInduction) the step is already created. Passing the step in ensures the steps are kept in sync.	2022-02-09 11:12:45 +00:00
Nikita Popov	68c1eeb4ba	[ArgPromotion] Make implementation offset based This rewrites ArgPromotion to be based on offsets rather than GEP structure. We inspect all loads at constant offsets and remember which types are loaded at which offsets. Then we promote based on those types. This generalizes ArgPromotion to work with bitcasted loads, and is compatible with opaque pointers. This patch also fixes incorrect handling of alignment during argument promotion. Previously, the implementation only checked that the pointer is dereferenceable, but was happy to speculate overaligned loads. (I would have fixed this separately in advance, but I found this hard to do with the previous implementation approach). Differential Revision: https://reviews.llvm.org/D118685	2022-02-09 09:35:01 +01:00
Florian Hahn	c9e6678b56	[LV] Move buildScalarSteps out of ILV (NFC). This makes the function independent of shared state in ILV (ensures no new dependencies on things like the cost model are introduced) and allows for use directly in recipes' ::execute functions.	2022-02-08 21:18:40 +00:00
Sylvestre Ledru	f2c2e924e7	Fix a typo (occured => occurred) Reported: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005195	2022-02-08 21:35:26 +01:00
Roman Lebedev	c8ba2b67a0	[SimplifyCFG] 'merge compatible invokes': fully support indirect invokes As long as all the invokes in the set are indirect, we can merge them, but don't merge direct invokes into the set, even though it would be legal to do so.	2022-02-08 21:29:38 +03:00
Roman Lebedev	414b47645d	[SimplifyCFG] 'merge compatible invokes': don't create trivial PHI's with all-identical incoming values	2022-02-08 21:29:38 +03:00
Joseph Huber	caf7f05c1c	[Attributor] Emit fixed-point remark on function list This patch replaces the function we emit the remark on when we run into the fixed-point limit. Previously we got a function to emit a remark on from the worklist's associated function. However, the worklist may not always have an associated function in the case of global variables. Replace this with the function set, and if there are no functions, don't emit the remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119248	2022-02-08 12:10:21 -05:00
Philip Reames	c302f1e677	[SCEV] Generalize SCEVEqualsPredicate to any compare [NFC] PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent A <pred> B. This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.	2022-02-08 08:18:09 -08:00
Nikita Popov	074561a4a2	[Mem2Reg] Check that load type matches alloca type Alloca promotion can only deal with cases where the load/store types match the alloca type (it explicitly does not support bitcasted load/stores). With opaque pointers this is no longer enforced through the pointer type, so add an explicit check.	2022-02-08 17:16:15 +01:00
Roman Lebedev	42ca7cc889	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ uses If the original invokes had uses, the uses must have been in PHI's, but that immediately results in the incoming values being incompatible. But we'll replace uses of the original invokes with the use of the merged invoke, so as long as the incoming values become compatible after that, we can merge.	2022-02-08 17:49:38 +03:00
Roman Lebedev	9986d60224	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ PHIs but no uses As long as the incoming values for all the invokes in the set are identical, we can merge the invokes.	2022-02-08 17:49:38 +03:00
Roman Lebedev	8411560fd0	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ no uses, no PHI's Even if the invokes have a normal destination, iff it's the same block, we can merge them. For now, require that there are no PHI nodes, and the returned values of invokes aren't used.	2022-02-08 17:49:38 +03:00
Nikita Popov	b896334834	[ArgPromotion] Check dereferenceability on argument as well Before walking all the callers, check whether we have a dereferenceable attribute directly on the argument. Also make it clearer that the code currently does not treat alignment correctly.	2022-02-08 10:29:51 +01:00
Johannes Doerfert	dd101c808b	[Attributor][FIX] Do not use assumed information for UB detection The helper `Attributor::checkForAllReturnedValuesAndReturnInsts` simplifies the returned value optimistically. In `AAUndefinedBehavior` we cannot use such optimistic values when deducing UB. As a result, we assumed UB for the return value of a function because we initially (=optimistically) thought the function return is `undef`. While we later adjusted this properly, the `AAUndefinedBehavior` was under the impression the return value is "known" (=fixed) and could never change. To correct this we use `Attributor::checkForAllInstructions` and then manually perform simplification of the return value, only allowing known values to be used. This actually matches the other UB deductions. Fixes #53647	2022-02-07 20:19:19 -06:00
David Green	b4c6d1bb37	[LoopVectorizer] Don't perform interleaving of predicated scalar loops The vectorizer will choose at times to "vectorize" loops with a scalar factor (VF=1) with interleaving (IC > 1). This can occasionally produce better code than the unroller (notably for reductions where it can produce independent reduction chains that are combined after the loop). At times this is not very beneficial though, for example when runtime checks are needed or when the scalar code requires predication. This addresses the second point, preventing the vectorizer from interleaving when the scalar loop will require predication. This prevents it from making a bit of a mess that is worse than the original and is better left to the unroller to unroll if beneficial. It helps reverse some of the regressions from D118090. Differential Revision: https://reviews.llvm.org/D118566	2022-02-07 19:34:28 +00:00
Florian Hahn	5a72357697	[LV] Use IRBuilderBase in VPlan.h, remove IRBuilder.h include (NFC). By using IRBuilderBase instead of IRBuilder<> a forward declaration can be used instead of including IRBuilder.h	2022-02-07 17:46:16 +00:00
Sanjay Patel	897d92faef	[InstCombine] generalize 2 LSB of demanded bits for X*X This is a follow-up suggested in D119060. Instead of checking each of the bottom 2 bits individually, we can check them together and handle the possibility that we demand both together. https://alive2.llvm.org/ce/z/C2ihC2 Differential Revision: https://reviews.llvm.org/D119139	2022-02-07 11:33:55 -05:00
Nikita Popov	cdc0573f75	[MatrixBuilder] Remove unnecessary IRBuilder template (NFC) IRBuilderBase exists specifically to avoid the need for this.	2022-02-07 16:42:38 +01:00
Sanjay Patel	79b3fe8070	[InstCombine] SimplifyDemandedBits - mul(x,x) is odd iff x is odd https://alive2.llvm.org/ce/z/AXPr3k	2022-02-07 08:43:12 -05:00
Roman Lebedev	77a0da926c	[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. What it essentially does is prevent scalarized vectorization of masked memory operations: ``` // TODO: Cost model for emulated masked load/store is completely // broken. This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. ``` While I don't really understand what specifically `is completely broken` was talking about, I believe that at least on X86 with AVX2-or-later, this is no longer true (or at least, I would like to know what is still broken). So I would like to follow suit after D111460, and likewise disable that hack for AVX2+. But since this was added for X86 specifically, let's just instead completely remove this hack. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114779	2022-02-07 16:08:31 +03:00
Djordje Todorovic	afd54e1ed1	[SLPVectorizer] Fix "unused variable" build warning	2022-02-07 10:38:19 +01:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Kazu Hirata	a1a8d10a17	[Transforms] Use default member initialization in LibCallSimplifier (NFC)	2022-02-06 16:36:27 -08:00
Kazu Hirata	3fce5bb7b0	[Transforms] Use default member initialization in LoopVersioning (NFC)	2022-02-06 16:36:25 -08:00
Congzhe Cao	1ef04326ec	[LoopInterchange] Support loop interchange with floating point reductions Enabled loop interchange support for floating point reductions if it is allowed to reorder floating point operations. Previously, when we encountered a floating point PHI node in the outer loop exit block, we bailed out since we could not detect floating point reductions in the early days. Now we remove this limitation since we are able to detect floating point reductions. Reviewed By: #loopoptwg, Meinersbur Differential Revision: https://reviews.llvm.org/D117450	2022-02-06 17:04:47 -05:00
Florian Hahn	541ca12dcd	[LV] Use VPReplicateRecipe::isUniform instead of isUniformAfterVec (NFCI). In scalarizeInstruction(), isUniformAfterVectorization is used to detect cases where it is sufficient to always access the first lane. This should map directly to checking whether the operand is a uniform replicate recipe. Differential Revision: https://reviews.llvm.org/D116654	2022-02-06 16:37:20 +00:00
Kazu Hirata	2d650ee03e	[Transforms] Use default member initialization in SCEVFindUnsafe (NFC)	2022-02-05 21:39:27 -08:00
Kazu Hirata	cb13ebbf46	[Transforms] Use default member initialization in AAIsDeadCallSiteReturned (NFC)	2022-02-05 21:39:25 -08:00
Kazu Hirata	31d72f0e45	[Transforms] Use default member initialization in TruncInstCombine (NFC)	2022-02-05 21:39:23 -08:00
Kazu Hirata	9ed6800ef9	[Transforms] Use default member initialization in MaskOps (NFC)	2022-02-05 21:39:21 -08:00
Kazu Hirata	e24384b506	[Transforms] Use default member initialization in SimplifyIndvar (NFC)	2022-02-05 16:29:22 -08:00
Benjamin Kramer	ce9417348e	[SLP] Skip a DenseSet<unsigned> -> bit vector conversion. NFCI.	2022-02-06 00:57:47 +01:00
Benjamin Kramer	a40dc4eaf8	Simplify mask creation with llvm::seq. NFCI.	2022-02-05 23:35:41 +01:00
Sanjay Patel	5372160a18	[InstCombine] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero This is a translation of the fold added to codegen with: `2d1390efbe` Part of solving issue #48027	2022-02-05 09:51:38 -05:00
Bill Wendling	c6f0940d99	[NFC] Remove unnecessary #includes An attempt to reduce the number of files that are recompiled due to a change. Differential Revision: https://reviews.llvm.org/D119055	2022-02-04 21:22:41 -08:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp help CSSPGO on our large internal benchmarks, so I'm turning it on for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile count quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Roman Lebedev	18ff1ec3c3	Reland [SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. Annotated functions may still raise an exception, i.a., nounwind is not implied. ``` So if we `invoke` a `noreturn` function, and the normal destination of an invoke is not an `unreachable`, point it at the new `unreachable` block. The change/fix from the original commit is that we now actually create the new block, and don't just repurpose the original block, because said normal destination block could have other users. This reverts commit `db1176ce66`, relanding commit `598833c987`.	2022-02-05 02:58:19 +03:00
Roman Lebedev	db1176ce66	Revert "[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable`" The normal destination may have other uses. This reverts commit `598833c987`.	2022-02-05 02:30:20 +03:00
Roman Lebedev	598833c987	[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. Annotated functions may still raise an exception, i.a., nounwind is not implied. ```	2022-02-05 02:15:07 +03:00
Roman Lebedev	cd9e6a9c10	[NFC][InstCombine] `visitCallInst()`: make comment more understandable	2022-02-05 02:15:07 +03:00
Joseph Huber	6b78526b1b	[OpenMP] Emit remark on the captured call instead of the variable Changes the remark to emit on the function call that captures the globalized variable instead of the globalized variable itself. The user should be able to see which variable it was in the argument list of the function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106980	2022-02-04 17:50:53 -05:00
Philip Reames	0cc6165d05	[SLP] Strengthen internal asserts about scheduled node state [NFC] All members of a scheduled bundle must have valid dependencies, with no unscheduled ones, and only the lead element gets marked scheduled.	2022-02-04 12:22:52 -08:00
Philip Reames	f3f8e3da9f	[SLP] Remove ScheduleData::UnscheduledDepsInBundle field [NFC-ish] We can simply compute the value of this field on demand. Doing so clarifies the behavior when one of the instructions within a bundle doesn't have valid dependencies. I vaguely think this could change behavior slightly, but none of the test cases are affected, and my attempts to write one by hand have failed. This also slightly reduces memory usage, but that's a secondary value at best.	2022-02-04 10:12:09 -08:00
Roman Lebedev	55cd727c9a	[SimplifyCFG] 'merge compatible invokes': allow PHI nodes in landing pads ... iff the incoming values for the invokes-to-be-merged are compatible (identical).	2022-02-04 20:26:44 +03:00
Roman Lebedev	0d384e9228	[NFC][SimplifyCFG] Extract `IncomingValuesAreCompatible()` out of `SafeToMergeTerminators()`	2022-02-04 20:26:44 +03:00
Sanjay Patel	0236c57181	[InstCombine] try to fold one-demanded-bit-of-multiply This is a generalization of the icmp fold in D118061 (and that can be abandoned). We're looking for a disguised form of "odd * odd must be odd". Some Alive2 proofs to show correctness: https://alive2.llvm.org/ce/z/60Y8hz https://alive2.llvm.org/ce/z/HfAP6R Differential Revision: https://reviews.llvm.org/D118539	2022-02-04 11:40:54 -05:00
Benjamin Kramer	85243124cf	Tweak some uses of std::iota to skip initializing the underlying storage. NFCI.	2022-02-04 17:00:50 +01:00
Roman Lebedev	36df803dfd	[SimplifyCFG] Merge compatible `invoke`s of a `landingpad` While nowadays SimplifyCFG knows how to hoist code from then-else blocks, sink code from unconditional predecessors, and even promote the latter by tail-merging `ret`/`resume` function terminators, that isn't everything. While I (and others) have been trying to deal with merging/sinking `unreachable`, perhaps the more impactful remaining problem is merging the `throw` calls. If we start at the `landingpad`, all the predecessors are unwind edges of `invoke`s, and in some cases some of the `invoke`s are mergeable. ``` /// This is a weird mix of hoisting and sinking. Visually, it goes from: /// [...] [...] /// | | /// [invoke0] [invoke1] /// / \ / \ /// [cont0] [landingpad] [cont1] /// to: /// [...] [...] /// \ / /// [invoke] /// / \ /// [cont] [landingpad] ``` This simplifies the IR/CFG, at the cost of debug info and extra PHI nodes. Note that we don't require all the `invokes` of the `landingpad` to be mergeable; they can form more than a single set, and we gracefully handle that. For now, I completely disallowed normal destination, PHI nodes and indirect invokes, but that can be supported. Out of all the CTMark projects, only 7zip is C++, so there isn't much impact: https://llvm-compile-time-tracker.com/compare.php?from=ba8eb31bd9542828f6424e15a3014f80f14522c8&to=722fc871c84f14157d45c2159bc9c8c7e2825785&stat=size-total ... but there it currently causes a size-total decrease. Differential Revision: https://reviews.llvm.org/D117805	2022-02-04 17:04:21 +03:00
Florian Hahn	0a781d98fb	[ConstraintElimination] Add initial signed support. This patch adds initial support for signed conditions. To do so, ConstraintElimination maintains two separate systems, one with facts from signed and one for unsigned conditions. To start with this means information from signed and unsigned conditions is kept completely separate. When it is safe to do so, information from signed conditions may be also transferred to the unsigned system and vice versa. That's left for follow-ups. In the initial version, decomposition of signed values just handles constants and otherwise just uses the value, without trying to decompose the operation. Again this can be extended in follow-up changes. The main benefit of this limited signed support is proving >=s 0 pre-conditions added in D118799. But even this initial version also fixes PR53273. Depends on D118799. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D118806	2022-02-04 14:02:48 +00:00
Florian Hahn	06f3ef6626	[ConstraintElimination] Allow adding pre-conditions for constraints. With this patch pre-conditions can be added to a list of constraints. Constraints with pre-conditions can only be used if all pre-conditions are satisfied when the constraint is used. The pre-conditions at the moment are specified as a list of (Predicate, Value, Value) tuples. This allows easily checking them like any other condition, using the existing infrastructure. This then is used to limit GEP decomposition to cases where we can prove that offsets are signed positive. This fixes a couple of incorrect transforms where GEP offsets were assumed to be signed positive, but they were not. Note that this effectively disables GEP decomposition, as there's no support for reasoning about signed predicates. D118806 adds initial signed support. Fixes PR49624. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D118799	2022-02-04 11:45:07 +00:00
serge-sans-paille	ffe8720aa0	Reduce dependencies on llvm/BinaryFormat/Dwarf.h This header is very large (3M lines once expanded) and was included in locations where DWARF-specific information was not needed. More specifically, this commit suppresses the dependencies on llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used, this has a decent impact on the number of preprocessed lines generated during compilation of LLVM, as showcased below. This is achieved by moving some definitions back to the .cpp file, with no performance impact implied [0]. As a consequence of that patch, downstream users may need to manually include some extra files: llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h In some situations, code may be relying on the fact that llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h; this hidden dependency now needs to be explicit. $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l after: 10978519 before: 11245451 Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup [0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions Differential Revision: https://reviews.llvm.org/D118781	2022-02-04 11:44:03 +01:00
Nikita Popov	c680eeab30	[IRBuilder][RS4GC] Require FunctionCallee when creating statepoint This makes the statepoint methods in IRBuilder accept a FunctionCallee, which carries both the callee and function type. This is used to add the elementtype attribute to the statepoint call. RS4GC requires an additional tweak to actually preserve that attribute -- previously the attributes on the call were completely overwritten. Differential Revision: https://reviews.llvm.org/D118886	2022-02-04 09:47:32 +01:00
Philip Reames	bb9964ba43	[SLP] Have only ready items in ready list [NFC] This adds the assertion that all items in the ready list are in-fact scheduleable entities ready to be scheduled. This involves changing the ReadyInsts structure to be a set, and fixing a couple places where we left nodes on the list when they were no longer ready.	2022-02-03 19:49:24 -08:00
Serguei Katkov	66f1c6fc71	[RS4GC] Extract rematerializable candidate search. NFC. Finding the rematerialization chain for a derived pointer does not depend on the call site. To avoid repeating this search for each call site, it can be extracted into a separate routine. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D118676	2022-02-04 09:26:03 +07:00
Florian Mayer	374f5f0df4	[hwasan] [nfc] simplify getAllocaSizeInBytes AllocaInst::getAllocationSize implements essentially the same logic as our custom function. Reviewed By: hctim Differential Revision: https://reviews.llvm.org/D118958	2022-02-03 17:59:24 -08:00
Philip Reames	2cbc92fb11	[SLP] Strengthen internal invariant assertions slightly This builds on the invariant checks introduced in `1519629`, and adds a couple more that seem to hold without additional work.	2022-02-03 14:56:39 -08:00
Philip Reames	1519629a20	[SLP] Add basic self consistency asserts into scheduling The idea here is to have a verify routine we can call during scheduling to ensure broken invariants are reported. The intent is to help in debugging scheduling bugs. At the moment, only the most basic properties are checked, as adding several that I thought held reported failures.	2022-02-03 13:27:35 -08:00
Kazu Hirata	3710078ceb	[SampleProfile] Reduce indentation with an early return (NFC)	2022-02-03 12:22:23 -08:00
Florian Mayer	8ada962a34	[NFC] [hwasan] use InstIterator Differential Revision: https://reviews.llvm.org/D118865	2022-02-03 11:10:18 -08:00
Philip Reames	6d0c007bc1	[SLP] Fix a typo in comment	2022-02-03 09:11:47 -08:00
Sander de Smalen	eaee477eda	[LV] Use VScaleForTuning to allow wider epilogue VFs. When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot be any smaller, the vectorizer should try to estimate how many lanes are executed at runtime and allow a suitable fixed-width VF to be chosen. It can use VScaleForTuning to figure out what a suitable fixed-width VF could be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8, it could still choose an epilogue VF up to VF=4. This was a bit tricky to test, so this patch also introduces a wrapper function to get 'VScaleForTuning' by also considering vscale_range. If min and max are equal, then that will be the vscale we compile for. It makes little sense to tune for a different width if the code will not be portable for other widths. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118709	2022-02-03 15:40:17 +00:00
Alexey Bataev	802ceb8343	[SLP]Excluded external uses from the reordering estimation. The compiler adds the estimation for the external uses during operands reordering analysis, which makes it tend to prefer duplicates in the lanes rather than a diamond/shuffled match in the graph. It changes the sizes of the vector operands and may prevent some vectorization. We don't need this kind of estimation for the analysis phase, because we just need to choose the most compatible instruction and it does not matter if it has an external user or is used in the non-matching lane. Instead, we count the number of unique instructions in the lane and see if the reassociation changes the number of unique scalars to a power of 2 or not. If we have a power-of-2 number of unique scalars in the lane, it is considered more profitable than having a non-power-of-2 number of unique scalars. Metric: SLP.NumVectorInstructions test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test 70.00 86.00 22.9% test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 346.00 353.00 2.0% test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 346.00 353.00 2.0% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 235.00 239.00 1.7% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 235.00 239.00 1.7% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00 1.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00 1.2% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00 0.9% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00 0.3% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00 0.3% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00 0.2% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00 0.1% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00 0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 783.00 782.00 -0.1% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 69.00 68.00 -1.4% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 89.00 80.00 -10.1% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 89.00 80.00 -10.1% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 260.00 215.00 -17.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 256.00 211.00 -17.6% MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - roughly the same. SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x> loads replaced by one <4 x> load. External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and not inlined anymore. External/SPEC/CINT2017rate/541.leela_r - same. External/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in multi-block tree; the result is roughly the same. External/SPEC/CINT2017speed/631.deepsjeng_s - same. MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same as before. MultiSource/Benchmarks/MiBench/consumer-jpeg - same. Differential Revision: https://reviews.llvm.org/D116688	2022-02-03 06:50:06 -08:00
Alexey Bataev	ad2a0ccf8f	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with the same/swapped predicate but different (swapped) operand kinds, or cmp instructions with different predicates and compatible operand kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-03 06:24:10 -08:00
Simon Pilgrim	6b4ebdd46f	ModuleUtils - VFABI::setVectorVariantNames - use ArrayRef<> instead of const SmallVector to pass argument	2022-02-03 12:11:48 +00:00
Florian Hahn	413e47ecd4	[ConstraintElimination] Handle degenerate case with branch to same dest. When a conditional branch has the same block as both true and false successor it is not safe to add the condition. Fixes PR49819.	2022-02-03 11:09:14 +00:00
Roman Lebedev	ee4ba9f3a1	Revert "[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`." Unfortunately, it seems we really do need to take the long route; start from the "merge" block, find (all the) "dispatch" blocks, and deal with each "dispatch" block separately, instead of simply starting from each "dispatch" block like it would logically make sense, otherwise we run into a number of other missing folds around `switch` formation, missing sinking/hoisting and phase ordering. This reverts commit `85628ce75b`. This reverts commit `c5fff90953`. This reverts commit `34a98e1046`. This reverts commit `1e353f0922`.	2022-02-03 12:32:50 +03:00
Florian Mayer	fa75a62cb5	[NFC] pull retvec logic to MemoryTaggingSupport. We will also need this for aarch64 stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118852	2022-02-02 16:05:52 -08:00
Fangrui Song	85628ce75b	[SimplifyCFG] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2022-02-02 15:11:22 -08:00
Florian Mayer	f7a6c341cb	[mte] support more complicated lifetimes (e.g. for exceptions). Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118848	2022-02-02 14:39:22 -08:00
Florian Mayer	1d679097da	[NFC] remove excessive whitespace.	2022-02-02 13:35:33 -08:00
Florian Mayer	712b31e2d4	[NFC] factor isStandardLifetime out of HWASan This is so we can use it for aarch64 stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118836	2022-02-02 13:23:55 -08:00
Alexey Bataev	8a1dfbc4d8	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit `842a2360a8` to fix the bugs reported by users in https://reviews.llvm.org/D115955#3291538.	2022-02-02 12:06:36 -08:00
Anna Thomas	a73e4ce6a5	[LoopFuse] Change DT to reference in FusionCandidate struct. NFC Assertion added in `f50821cff0` confirms that the DT is indeed nonnull. Change it to a reference instead of a pointer to make this explicit in FusionCandidate. Suggested in D118472.	2022-02-02 14:55:37 -05:00
Alexey Bataev	842a2360a8	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with the same/swapped predicate but different (swapped) operand kinds, or cmp instructions with different predicates and compatible operand kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-02 10:32:52 -08:00
Alexandros Lamprineas	438a81a284	[Function Specialisation] Fix use after free This is a fix for a use-after-free found by the address sanitizer when compiling GCC: https://github.com/llvm/llvm-project/issues/52821 The Function Specialization pass may remove instructions, cached inside the PredicateBase class, which are later being dereferenced from the SCCPInstVisitor class. To prevent the dangling references I am lazily deleting the dead instructions after the Solver has run. Differential Revision: https://reviews.llvm.org/D118591	2022-02-02 16:32:10 +00:00
Roman Lebedev	c5fff90953	[NFC][SimplifyCFG] Merge `FoldTwoEntryPHINode()` into its only callee	2022-02-02 17:53:56 +03:00
Roman Lebedev	34a98e1046	[NFC][SimplifyCFG] `FoldTwoEntryPHINode()`: s/BB/MergeBB/	2022-02-02 17:53:56 +03:00
Roman Lebedev	1e353f0922	[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`. The current `FoldTwoEntryPHINode()` is not quite designed correctly. It starts from the merge point, and then tries to detect the 'divergence' point. Because of that, it is limited to the simple two-predecessor case, where the PHI completely goes away. But that is rather pessimistic, and it doesn't make much sense from the costmodel side of things. For example if there is some other unrelated predecessor of the merge point, we could split the merge point so that the then/else blocks first branch to an empty block and then to the merge point, and then we'd be able to speculate the then/else code. But if we'd instead simply start at the divergence point, and look for the merge point, then we'll just natively support this case. There's also the fact that `SpeculativelyExecuteBB()` already does just that, but only if there is a single block to speculate, and with a much more restrictive cost model. But that also means we have code duplication. Now, sadly, while this is as much NFCI as possible, there is just no way to cleanly migrate to the proper implementation. The results are going to be somewhat different because of various phase ordering effects and SimplifyCFG block iteration strategy.	2022-02-02 17:53:56 +03:00
Benjamin Kramer	0c3d22a592	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit `83620bd2ad`. It's causing miscompilations, see review comments at https://reviews.llvm.org/D115955	2022-02-02 13:08:51 +01:00
Florian Hahn	1c9f15426f	[GVN] Replace PointerIntPair with separate pointer & kind fields (NFC). After adding another value kind in `8a12cae862`, Value * pointers do not have enough available empty bits to store the kind (e.g. on ARM). To address this, the patch replaces the PointerIntPair with separate value and kind fields.	2022-02-02 09:44:15 +00:00
Florian Hahn	8a12cae862	[GVN] Support load of pointer-select to value-select conversion. This patch extends the available-value logic to detect loads of pointer-selects that can be replaced by a value select. For example, consider the code below: loop: %sel.phi = phi i32* [ %start, %ph ], [ %sel, %ph ] %l = load %ptr %l.sel = load %sel.phi %sel = select cond, %ptr, %sel.phi ... exit: %res = load %sel use(%res) The load of the pointer phi can be replaced by a load of the start value outside the loop and a new phi/select chain based on the loaded values, as illustrated below: %l.start = load %start loop: %sel.phi.prom = phi i32 [ %l.start, %ph ], [ %sel.prom, %ph ] %l = load %ptr %sel.prom = select cond, %l, %sel.phi.prom ... exit: use(%sel.prom) This is a first step towards allowing vectorizing loops using common libc++ library functions, like std::min_element (https://clang.godbolt.org/z/6czGzzqbs) #include <vector> #include <algorithm> int foo(const std::vector<int> &V) { return *std::min_element(V.begin(), V.end()); } Reviewed By: reames Differential Revision: https://reviews.llvm.org/D118143	2022-02-02 09:23:09 +00:00
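The pointer-select to value-select rewrite can be illustrated at the source level. This is a minimal sketch, not the GVN code: the two hypothetical helpers `minViaPointerSelect` and `minViaValueSelect` assume a non-empty `std::vector<int>` and compute the same result as `std::min_element`. The first keeps a selected pointer and loads through it after the loop; the second keeps the loaded value, which is closer to a plain min reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer-select form (hypothetical helper): mirrors
//   %sel = select cond, %ptr, %sel.phi
// and loads through the selected pointer only after the loop.
// Assumes V is non-empty.
int minViaPointerSelect(const std::vector<int> &V) {
  const int *Sel = &V[0];                // %start
  for (std::size_t I = 1; I < V.size(); ++I) {
    const int *Ptr = &V[I];
    Sel = (*Ptr < *Sel) ? Ptr : Sel;     // select between two pointers
  }
  return *Sel;                           // %res = load %sel
}

// Value-select form (hypothetical helper): the start value is loaded once
// before the loop, and the select chooses between loaded values, so the
// loop carries an integer rather than a pointer. Assumes V is non-empty.
int minViaValueSelect(const std::vector<int> &V) {
  int SelVal = V[0];                     // %l.start = load %start
  for (std::size_t I = 1; I < V.size(); ++I) {
    int L = V[I];                        // %l = load %ptr
    SelVal = (L < SelVal) ? L : SelVal;  // %sel.prom = select cond, %l, ...
  }
  return SelVal;                         // use(%sel.prom)
}
```

Both versions return the same value for any non-empty input; only the second is free of the final load through a loop-carried pointer.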
serge-sans-paille	e188aae406	Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden header dependencies, something the LLVM codebase doesn't do that well :-/ I've tried to summarize the biggest changes below: - llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer includes llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l before: 6400831 after: 6189948 200k fewer lines to process is not that bad ;-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D118652	2022-02-02 06:54:20 +01:00
Sander de Smalen	2a44eaf20f	[LV] Allow a scalable VF for the epilogue. For some reason we limited the epilogue VF to be fixed-width, but there is not necessarily a reason for doing so. If the main VF=vscale x 16, the epilogue VF could be either fixed-width, or a scalable VF up to vscale x 8. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118688	2022-02-01 22:38:55 +00:00
Anna Thomas	f50821cff0	[LoopFuse] Add assertion for non-null DT in fusion candidate The code paths analyzed (all constructor invocations of fusion candidate) pass in a non-null DT. Adding this assert as requested in D118472 before converting this to a reference argument.	2022-02-01 17:00:09 -05:00
Anna Thomas	bc48a26655	[LoopPeel] Use reference instead of pointer for DT argument Cleanup code in peelLoop API. We already have usage of DT without guarding against a null DT, so this change constant folds the remaining null DT checks. Also make the argument a reference so that it is clear the argument is a nonnull DT. Extracted from D118472.	2022-02-01 17:00:08 -05:00
Florian Mayer	aefb2e134d	[hwasan] work around lifetime issue with setjmp. setjmp can return twice, but PostDominatorTree is unaware of this. As such, it overestimates postdominance, leaving some cases (see attached compiler-rt) where memory does not get untagged on return. This causes false positives later in the program execution. This is a crude workaround to unblock use-after-scope for now; in the longer term, PostDominatorTree should be made aware of returns_twice functions, as this may cause problems elsewhere. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118647	2022-02-01 12:14:20 -08:00
Matt Morehouse	de4e8bc3ac	[HWASan] Properly handle musttail calls. Fixes a compile error when the `clang::musttail` attribute is used. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118712	2022-02-01 11:23:43 -08:00
Anna Thomas	4fc52db116	[InstCombine] Remove weaker fence adjacent to a stronger fence We have an InstCombine rule to remove identical consecutive fences. We can extend this to remove a weaker fence when it is adjacent to a stronger fence. As stated in the LangRef, a fence with a stronger ordering also implies any weaker ordering: "A fence which has seq_cst ordering, in addition to having both acquire and release semantics specified above, participates in the global program order of other seq_cst operations and/or fences." Reviewed-By: reames Differential Revision: https://reviews.llvm.org/D118607	2022-02-01 11:05:34 -08:00
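The fence rule can be modeled on a plain list of orderings. This is a hypothetical sketch, not InstCombine's implementation: `Ordering`, `isAtLeastAsStrong` and `removeRedundantFences` are illustrative names, and the model ignores sync scopes, which a real implementation must also compare. Acquire and release are incomparable, so neither removes the other, while `acq_rel` subsumes both and `seq_cst` subsumes everything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Orderings that are valid on an LLVM `fence` (hypothetical enum).
enum class Ordering { Acquire, Release, AcqRel, SeqCst };

// True if a fence with ordering A provides at least the guarantees of B.
bool isAtLeastAsStrong(Ordering A, Ordering B) {
  if (A == B)
    return true;
  switch (A) {
  case Ordering::SeqCst:
    return true;                   // seq_cst covers every other ordering
  case Ordering::AcqRel:
    return B != Ordering::SeqCst;  // acq_rel covers acquire and release
  default:
    return false;                  // acquire vs. release are incomparable
  }
}

// Removes every fence whose current neighbor is identical or stronger.
// Sync scopes are ignored in this model (assumption: all fences share one).
std::vector<Ordering> removeRedundantFences(std::vector<Ordering> Fences) {
  std::size_t I = 0;
  while (I < Fences.size()) {
    bool NextCovers = I + 1 < Fences.size() &&
                      isAtLeastAsStrong(Fences[I + 1], Fences[I]);
    bool PrevCovers = I > 0 && isAtLeastAsStrong(Fences[I - 1], Fences[I]);
    if (NextCovers || PrevCovers) {
      Fences.erase(Fences.begin() + static_cast<std::ptrdiff_t>(I));
      if (I > 0)
        --I;  // the previous fence has a new neighbor; re-check it
    } else {
      ++I;
    }
  }
  return Fences;
}
```

Stepping back after each removal lets a chain such as `acquire, release, seq_cst` collapse all the way to `seq_cst`, while `release, acquire` is left untouched.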
Fangrui Song	30e8f83c84	[GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible Generalize D99629 for ELF. A default visibility non-local symbol is preemptible in a -shared link. `isInterposable` is an insufficient condition. Moreover, a non-preemptible alias may be referenced in a sub constant expression which intends to lower to a PC-relative relocation. Replacing the alias with a preemptible aliasee may introduce a linker error. Respect dso_preemptable and suppress optimization to fix the above issues. With the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic` compile. ``` int aliasee; extern int alias __attribute__((alias("aliasee"), visibility("hidden"))); void foo() { alias = 345; } // intended to access the local copy ``` While here, refine the condition for the alias as well. For some binary formats like COFF, `isInterposable` is a sufficient condition. But I think canonicalization for the changed case has little advantage, so I don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or `getPICLevel/getPIELevel` complexity. For instrumentations, it's recommended not to create aliases that refer to globals that have weak linkage or are preemptible. However, the following is supported and the IR needs to handle such cases. ``` int aliasee __attribute__((weak)); extern int alias __attribute__((alias("aliasee"))); ``` There are other places where GlobalAlias isInterposable usage may need to be fixed. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107249	2022-02-01 10:41:16 -08:00
Alexey Bataev	83620bd2ad	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-01 09:54:20 -08:00
Olle Fredriksson	9d555b4a83	[DFAJumpThreading] make update order deterministic We tracked down some non-determinism in compilation output to the DFAJumpThreading pass. These changes fixed our issue: * Make the DefMap type a MapVector to make its iteration order depend on insertion order. * Sort the values to be inserted into NewDefs by instruction order to make the insertion order deterministic. Since these values come from iterating over a ValueMap, which doesn't have deterministic iteration order, I couldn't fix this at its source. Reviewed By: alexey.zhikhar Differential Revision: https://reviews.llvm.org/D118590	2022-02-01 11:02:58 -05:00
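`llvm::MapVector` keeps a map from each key to an index into a vector of entries, so iteration follows insertion order instead of hash or pointer order. The sketch below is a stripped-down, hypothetical analogue in standard C++ (the class name `InsertionOrderedMap` is illustrative, and it omits erase and most of the real interface); it shows why switching DefMap to such a container removes one source of nondeterminism.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal insertion-ordered map: lookups go through a hash map that stores
// indices, while iteration walks a vector in insertion order, so the
// iteration order never depends on hashing or pointer values.
template <typename K, typename V> class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> Index;
  std::vector<std::pair<K, V>> Entries;

public:
  V &operator[](const K &Key) {
    auto [It, Inserted] = Index.try_emplace(Key, Entries.size());
    if (Inserted)
      Entries.emplace_back(Key, V());  // new key goes to the back
    return Entries[It->second].second;
  }
  auto begin() const { return Entries.begin(); }
  auto end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }
};
```

The commit's second change (sorting the values taken from a `ValueMap` by instruction order before inserting them) is still needed, because an ordered container only preserves whatever insertion order it is given.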
Nikita Popov	1652c3b80c	[GlobalOpt] Avoid early exit before dead constant check In a similar vein to `236fbf571d`, make sure we don't early-exit before the dead constant check.	2022-02-01 15:57:19 +01:00
Nikita Popov	236fbf571d	[GlobalStatus] Skip non-pointer dead constant users Constant expressions with a non-pointer result type used an early exit that bypassed the later dead constant user check, and resulted in different optimization outcomes depending on whether dead users were present or not. This fixes the issue reported in https://reviews.llvm.org/D117223#3287039.	2022-02-01 15:51:32 +01:00
Benjamin Kramer	5281f0dab2	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit `afaaecc88c`. Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276	2022-02-01 11:40:43 +01:00
Florian Hahn	7fe4fa9a0a	[LV] Use onlyFirstLaneDemanded when widening pointer phis (NFCI). This removes another instance of recipe execution still relying on the cost model. Depends on D116554. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D116656	2022-02-01 09:50:47 +00:00
Jay Foad	d2e5d3512b	[StructurizeCFG] Clean up some boolean not instructions In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get away with modifying an existing compare instruction instead. (StructurizeCFG is generally run late in the pipeline so instcombine does not clean them up for us.) Differential Revision: https://reviews.llvm.org/D118623	2022-02-01 09:35:37 +00:00
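The clean-up relies on the fact that every integer compare predicate has an inverse, so an `xor i1 %c, true` of a compare can sometimes be replaced by rewriting the compare itself (LLVM exposes the mapping as `CmpInst::getInversePredicate`). A hypothetical sketch of that identity on a subset of signed predicates, with illustrative names `Pred`, `inverse` and `evaluate`:

```cpp
#include <cassert>

// Subset of integer compare predicates (names mirror LLVM's ICmp ones).
enum class Pred { EQ, NE, SLT, SGE, SGT, SLE };

// inverse(P) satisfies evaluate(inverse(P), a, b) == !evaluate(P, a, b).
Pred inverse(Pred P) {
  switch (P) {
  case Pred::EQ:  return Pred::NE;
  case Pred::NE:  return Pred::EQ;
  case Pred::SLT: return Pred::SGE;
  case Pred::SGE: return Pred::SLT;
  case Pred::SGT: return Pred::SLE;
  case Pred::SLE: return Pred::SGT;
  }
  return P;  // unreachable for valid enumerators
}

// Evaluates the predicate on signed integers.
bool evaluate(Pred P, int A, int B) {
  switch (P) {
  case Pred::EQ:  return A == B;
  case Pred::NE:  return A != B;
  case Pred::SLT: return A < B;
  case Pred::SGE: return A >= B;
  case Pred::SGT: return A > B;
  case Pred::SLE: return A <= B;
  }
  return false;
}
```

Whether the rewrite is actually safe depends on the compare having no other users that expect the original result, which is exactly the concern raised when the related `invertCondition` change was reverted.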
Nikita Popov	79179a378b	[ArgPromotion] Use range-based for loop (NFC)	2022-02-01 10:34:14 +01:00
Johannes Doerfert	3b8ffe668d	[Attributor][FIX] Relax assertion in IRPosition::verify A call base can be a floating value if we talk about the instruction and not the return value. This distinction was not made before but is important for liveness, e.g., a call site return value might be unused (=dead) but the call site is not.	2022-02-01 02:25:44 -06:00
Johannes Doerfert	a265cf22af	[Attributor] Introduce the `AA::isPotentiallyReachable` helper APIs To make usage easier (compared to the many reachability related AAs), this patch introduces a helper API, `AA::isPotentiallyReachable`, which performs all the necessary steps. It also does the "backwards" reachability (see D106720) as that simplifies the AA a lot (backwards queries were somewhat different from the other query resolvers), and ensures we use cached values in every stage. To test inter-procedural reachability in a reasonable way this patch includes an extension to `AAPointerInfo::forallInterferingWrites`. Basically, we can exclude writes if they cannot reach a load "during the lifetime" of the allocation. That is, we need to go up the call graph to determine reachability until we can determine the allocation would be dead in the caller. This leads to new constant propagations (through memory) in `value-simplify-pointer-info-gpu.ll`. Note: The new code contains plenty of debug output to determine how reachability queries are resolved. Parts extracted from D110078. Differential Revision: https://reviews.llvm.org/D118673	2022-02-01 01:40:45 -06:00
Johannes Doerfert	b51b83f68e	[Attributor] Introduce the concept of query AAs D106720 introduced features that did not work properly as we could add new queries after a fixpoint was reached and which could not be answered by the information gathered up to the fixpoint alone. As an alternative to D110078, which forced eager computation where we want to continue to be lazy, this patch fixes the problem. QueryAAs are AAs that allow lazy queries during their lifetime. They are never fixed if they have no outstanding dependences and always run as part of the updates in an iteration. To determine if we are done, all query AAs are asked if they received new queries, if not, we only need to consider updated AAs, as before. If new queries are present we go for another iteration. Differential Revision: https://reviews.llvm.org/D118669	2022-02-01 01:40:44 -06:00
Kuter Dinel	b2d1ae0611	[Attributor] AAFunctionReachability, Instruction reachability. This patch implements instruction reachability for the AAFunctionReachability attribute. It is used to tell if a certain instruction can reach a function transitively. NOTE: I created a new commit based on D106720 and set the author back to Kuter. Other metadata, etc. is wrong. I also addressed the remaining review comments and fixed the unit test. Differential Revision: https://reviews.llvm.org/D106720	2022-02-01 01:40:44 -06:00
Johannes Doerfert	ac3ec22df9	[Attributor] Use AAFunctionReachability to determine AANoRecurse We missed out on AANoRecurse in the module pass because we had no call graph. With AAFunctionReachability we can simply ask if the function may reach itself. Differential Revision: https://reviews.llvm.org/D110099	2022-02-01 01:40:44 -06:00
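The reasoning reduces to a reachability question on the call graph: if a function cannot reach itself, it is `norecurse`. A minimal sketch over a hypothetical adjacency-list `CallGraph` of function indices; it ignores indirect calls, declarations, and the liveness information the Attributor actually uses:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Call graph as adjacency lists: Callees[F] lists the functions F calls.
using CallGraph = std::vector<std::vector<std::size_t>>;

// Returns true if F can transitively call itself, i.e. F may recurse.
bool mayReachItself(const CallGraph &Callees, std::size_t F) {
  std::vector<bool> Visited(Callees.size(), false);
  std::vector<std::size_t> Worklist(Callees[F].begin(), Callees[F].end());
  while (!Worklist.empty()) {
    std::size_t G = Worklist.back();
    Worklist.pop_back();
    if (G == F)
      return true;  // found a call path back to F
    if (Visited[G])
      continue;
    Visited[G] = true;
    for (std::size_t H : Callees[G])
      Worklist.push_back(H);
  }
  return false;  // no path back: F could be marked norecurse
}
```

Note that a function calling into a recursive cycle is not itself recursive: in the graph `0 -> 1 -> 2 -> 1`, only functions 1 and 2 may reach themselves.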
Johannes Doerfert	d1186ce7a9	[Attributor] Make interprocedural value explicit in genericValueTraversal genericValueTraversal can look through arguments and allow value simplification across function boundaries. In fact, the latter already happened unchecked. With this change we allow the user of genericValueTraversal to opt-out of interprocedural traversal if required. We explicitly look through arguments now which helps to do various things, incl. the propagation of constants into OpenMP parallel regions (on the host).	2022-02-01 01:40:44 -06:00
Johannes Doerfert	a1db0e523d	[Attributor][FIX] Liveness handling in the isAssumedDead helpers This fixes a conceptual problem with our AAIsDead usage which conflated call site liveness with call site return value liveness. Without the fix tests would obviously miscompile as we make genericValueTraversal more powerful (in a follow up). The effects on the tests are mixed but mostly marginal. The most prominent one is the lack of `noreturn` for functions. The reason is that we make entire blocks live at the same time (for time reasons). Now that we actually look at the block liveness, which we need to do, the return instructions are live and will survive. As an example, `noreturn_async.ll` has been modified to retain the `noreturn` even with block granularity. We could address this easily but there is little need in practice.	2022-02-01 01:18:52 -06:00
Johannes Doerfert	0f471710f8	[Attributor] Use edge liveness rather than block liveness We moved to the edge API a while back, not all uses were adjusted. Edge liveness is more precise.	2022-02-01 01:18:51 -06:00
Johannes Doerfert	53b6753bdd	[Attributor][FIX] Address two oversights in AAIsDead No tests as these were found browsing the code and I'm not sure how to test them properly.	2022-02-01 01:18:51 -06:00
Johannes Doerfert	cfabffb034	[Attributor][NFCI] Improve debug diagnostic	2022-02-01 01:18:51 -06:00
Johannes Doerfert	adf0d57f15	[Attributor] Provide convenient helpers for isAssumedRead{None,Only} We have two attributes that can answer readnone queries. While there is a dependence between them, it seems best to not force the users to know what AA to ask. The helpers also allow to check for readonly nicely. Test changes show where we now deduce readnone but haven't before, mostly because we only asked AAMemoryBehavior and not AAMemoryLocation. AANoAlias has not been ported to the new API yet.	2022-02-01 01:18:51 -06:00
Johannes Doerfert	e140d51319	[Attributor] Use CFG reasoning to filter potentially interfering writes Since D104432 we can look through memory by analyzing all writes that might interfere with a load. This patch provides some logic to exclude writes that cannot interfere with a location, due to CFG reasoning. We make sure to avoid multi-thread write-read situations properly while we ignore writes that cannot reach a load or writes that will be overwritten before the load is reached. Differential Revision: https://reviews.llvm.org/D106397	2022-02-01 01:18:51 -06:00
Johannes Doerfert	191fa419a6	[Attributor][NFC] Make debug output more useful and concise	2022-02-01 01:18:51 -06:00
Johannes Doerfert	3f0e670498	[Attributor][NFCI] Expose some nosync reasoning to outside users. No-sync is a property that we need in more places as complex transformations emerge. To simplify the query we provide an `AA::isNoSyncInst` helper now and expose two existing helpers through the `AANoSync` class.	2022-02-01 01:07:50 -06:00
Johannes Doerfert	a5b6aef24e	[Attributor][NFCI] Remove anonymous namespaces The namespaces made it more complicated to implement static helpers, among other things. We should not need them at all.	2022-02-01 01:07:50 -06:00
Johannes Doerfert	3c8a4c6f47	[OpenMP] Eliminate redundant barriers in the same block Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and bugs introduced later by me. This patch allows us to remove redundant barriers if they are part of a "consecutive" pair of barriers in a basic block with no impacted memory effect (read or write) in-between them. Memory accesses to local (=thread private) or constant memory are allowed to appear. Technically we could also allow any other memory that is not used to share information between threads, e.g., the result of a malloc that is also not captured. However, it will be easier to do more reasoning once the code is put into an AA. That will also allow us to look through phis/selects reasonably. At that point we should also deal with calls, barriers in different blocks, and other complexities. Differential Revision: https://reviews.llvm.org/D118002	2022-02-01 01:07:50 -06:00
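The same-block rule can be sketched over a simplified instruction stream. This is a hypothetical model, not OpenMPOpt code: `Inst` lumps instructions into a few kinds, and anything other than thread-private or constant memory access is treated as a potential memory effect that keeps the next barrier alive, matching the conservative scope described above (no calls, no cross-block reasoning).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction kinds within one basic block.
enum class Inst { Barrier, LocalAccess, ConstantAccess, SharedAccess, Other };

// Drops a barrier when the previous kept barrier in the block is separated
// from it only by accesses to thread-private (local) or constant memory.
std::vector<Inst> removeRedundantBarriers(const std::vector<Inst> &Block) {
  std::vector<Inst> Out;
  bool OnlySafeSinceBarrier = false;  // true right after a kept barrier
  for (Inst I : Block) {
    switch (I) {
    case Inst::Barrier:
      if (OnlySafeSinceBarrier)
        continue;  // redundant: nothing shared happened in between
      OnlySafeSinceBarrier = true;
      break;
    case Inst::LocalAccess:
    case Inst::ConstantAccess:
      break;  // cannot communicate between threads
    case Inst::SharedAccess:
    case Inst::Other:
      OnlySafeSinceBarrier = false;  // conservatively a memory effect
      break;
    }
    Out.push_back(I);
  }
  return Out;
}
```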
Johannes Doerfert	989674f110	[OpenMP] Ensure to remove noinline from all runtime functions eventually We used to remove noinline from known OpenMP runtime functions (which are declared in OMPKinds.td). Now we remove noinline from all functions with the proper prefixes: __kmpc, _ZN4_OMP (= namespace omp), omp_	2022-02-01 01:07:50 -06:00
Serguei Katkov	28c5e1b760	[RS4GC] Make PointerToBase mapping be independent of call site. NFC. PointerToBase is a mapping from a potentially derived pointer to its base. Since we are in SSA form, if there is a base of a derived pointer and it is available at the def of the derived pointer, the same base will be available at any point where the derived pointer is alive. So the mapping of derived pointer to base pointer is not a property of a call site but the same on the function level. Reviewers: reames, yrouban Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D118604	2022-02-01 11:47:36 +07:00
Fangrui Song	7aaf024dac	[BitcodeWriter] Fix cases of some functions `WriteIndexToFile` is used by external projects so I do not touch it.	2022-01-31 16:46:11 -08:00
Fangrui Song	85dfe19b36	[ModuleUtils] Move EmbedBufferInModule to LLVMTransformsUtils D116542 adds EmbedBufferInModule which introduces a layer violation (https://llvm.org/docs/CodingStandards.html#library-layering). See `2d5f857a1e` for detail. EmbedBufferInModule does not use BitcodeWriter functionality and should be moved to LLVMTransformsUtils. While here, change the function case to the prevailing convention. It seems that EmbedBufferInModule just follows the steps of EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR update operations which ideally should be refactored to another library. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118666	2022-01-31 16:33:57 -08:00
Kirill Stoimenov	a5dd6c7419	[ASan] Fixed null pointer bug introduced in D112098. Also added some more tests to cover the "else if" part. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118645	2022-01-31 21:50:10 +00:00
William S. Moses	8cb9c73609	[LoopIdiom] Keep TBAA when creating memcpy/memmove When upgrading a loop of load/store to a memcpy, the existing pass does not keep existing aliasing information. This patch allows existing aliasing information to be kept. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D108221	2022-01-31 16:28:13 -05:00
Alexey Bataev	afaaecc88c	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-01-31 11:11:25 -08:00
Jay Foad	8faad29634	Revert "[Local] invertCondition: try modifying an existing ICmpInst" This reverts commit `a6b54ddaba`. Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.	2022-01-31 14:55:36 +00:00
Jay Foad	a6b54ddaba	[Local] invertCondition: try modifying an existing ICmpInst This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern. Differential Revision: https://reviews.llvm.org/D118478	2022-01-31 10:44:17 +00:00
Nikita Popov	4810051a82	[Inline][Cloning] Reliably remove unreachable blocks during cloning (PR53206) The pruning cloner already tries to remove unreachable blocks. The original cloning process will simplify instructions and constant terminators, and only clone blocks that are reachable at that point. However, phi nodes can only be simplified after everything has been cloned. For that reason, additional blocks may become unreachable after phi simplification. The code does try to handle this as well, but only removes blocks that don't have predecessors. It misses unreachable cycles. This can cause issues if SEH exception handling code is part of an unreachable cycle, as the inliner is not prepared to deal with that. This patch instead performs an explicit scan for reachable blocks, and drops everything else. Fixes https://github.com/llvm/llvm-project/issues/53206. Differential Revision: https://reviews.llvm.org/D118449	2022-01-31 09:31:34 +01:00
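The difference between predecessor-based pruning and an explicit reachability scan shows up on any unreachable cycle. A minimal sketch on a hypothetical integer-indexed CFG (block 0 is the entry; `deadByNoPredecessors` and `deadByReachability` are illustrative names, not the cloner's functions):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CFG as successor lists; block 0 is the entry block.
using CFG = std::vector<std::vector<std::size_t>>;

// Predecessor-based pruning: a non-entry block is dead only if nothing
// branches to it. Blocks in an unreachable cycle all have predecessors,
// so they survive.
std::vector<bool> deadByNoPredecessors(const CFG &Succs) {
  std::vector<bool> HasPred(Succs.size(), false);
  for (const auto &S : Succs)
    for (std::size_t T : S)
      HasPred[T] = true;
  std::vector<bool> Dead(Succs.size(), false);
  for (std::size_t B = 1; B < Succs.size(); ++B)
    Dead[B] = !HasPred[B];
  return Dead;
}

// Reachability scan from the entry: every block not visited is dead,
// including unreachable cycles.
std::vector<bool> deadByReachability(const CFG &Succs) {
  std::vector<bool> Reached(Succs.size(), false);
  std::vector<std::size_t> Worklist{0};
  Reached[0] = true;
  while (!Worklist.empty()) {
    std::size_t B = Worklist.back();
    Worklist.pop_back();
    for (std::size_t S : Succs[B])
      if (!Reached[S]) {
        Reached[S] = true;
        Worklist.push_back(S);
      }
  }
  std::vector<bool> Dead(Succs.size());
  for (std::size_t B = 0; B < Succs.size(); ++B)
    Dead[B] = !Reached[B];
  return Dead;
}
```

For the CFG `0 -> 1`, `2 -> 3 -> 2`, only the reachability scan reports blocks 2 and 3 as dead.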
Max Kazantsev	70b3beb0e2	[InstCombine] Generalize and-reduce pattern to handle `ne` case as well as `eq` Following Sanjay's proposal from discussion in D118317, this patch generalizes and-reduce handling to fold the following pattern ``` icmp ne (bitcast(icmp ne (lhs, rhs)), 0) ``` into ``` icmp ne (bitcast(lhs), bitcast(rhs)) ``` https://alive2.llvm.org/ce/z/WDcuJ_ Differential Revision: https://reviews.llvm.org/D118431 Reviewed By: lebedev.ri	2022-01-31 12:14:08 +07:00
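Both the `eq` form from D118317 and this `ne` generalization rest on the same fact: the lane-mismatch mask is zero exactly when every lane matches, which is exactly when the whole vectors are bitwise equal. A scalar C++ model for `<8 x i8>`, assuming the vector is packed into 64 bits; all helper names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Vec8 = std::array<std::uint8_t, 8>;  // models <8 x i8>

// icmp ne (lhs, rhs) on <8 x i8>, then bitcast of the <8 x i1> result
// to i8: bit I of the mask is set iff lane I differs.
std::uint8_t laneMismatchMask(const Vec8 &L, const Vec8 &R) {
  std::uint8_t Mask = 0;
  for (unsigned I = 0; I < 8; ++I)
    if (L[I] != R[I])
      Mask = static_cast<std::uint8_t>(Mask | (1u << I));
  return Mask;
}

// bitcast <8 x i8> to i64 (byte order does not matter for equality).
std::uint64_t asI64(const Vec8 &V) {
  std::uint64_t Bits;
  std::memcpy(&Bits, V.data(), sizeof(Bits));
  return Bits;
}

// eq: icmp eq (bitcast(icmp ne (lhs, rhs)), 0) vs icmp eq (bitcast(lhs), bitcast(rhs))
bool allEqualViaMask(const Vec8 &L, const Vec8 &R) { return laneMismatchMask(L, R) == 0; }
bool allEqualViaBitcast(const Vec8 &L, const Vec8 &R) { return asI64(L) == asI64(R); }

// ne: icmp ne (bitcast(icmp ne (lhs, rhs)), 0) vs icmp ne (bitcast(lhs), bitcast(rhs))
bool anyDifferentViaMask(const Vec8 &L, const Vec8 &R) { return laneMismatchMask(L, R) != 0; }
bool anyDifferentViaBitcast(const Vec8 &L, const Vec8 &R) { return asI64(L) != asI64(R); }
```

The folded forms replace a vector compare plus a mask bitcast with a single wide integer compare.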
Ricky Zhou	30ac5f9e64	[InstCombine] Do not combine atomic and non-atomic loads Before this change, InstCombine was willing to fold atomic and non-atomic loads through a PHI node as long as the first PHI argument is not an atomic load. The combined load would be non-atomic, which is incorrect. Fix this by only combining the loads in a PHI node when all of the arguments are non-atomic loads. Thanks to Eli Friedman for pointing out the bug at https://github.com/llvm/llvm-project/issues/50777#issuecomment-981045342! Fixes #50777 Differential Revision: https://reviews.llvm.org/D115113	2022-01-30 10:05:11 -05:00
Ricky Zhou	de80b53d1a	[InstCombine] Use range for loops (NFC) Preliminary clean-up for D115113 Differential Revision: https://reviews.llvm.org/D116086	2022-01-30 09:10:39 -05:00
Ricky Zhou	4aabed05a8	[InstCombine] Uppercase some variable names (NFC) Uppercase some variable names, per LLVM coding standards. This change intentionally does not rename every miscased variable, as a follow-up change ( D116086 ) intends to eliminate many of those by switching loops to range for loops. Differential Revision: https://reviews.llvm.org/D118553	2022-01-30 09:10:39 -05:00
Florian Hahn	8f12175fed	[VPlan] Use VPlan to check if only the first lane is used. This removes the remaining dependence on LoopVectorizationCostModel from buildScalarSteps and is required so it can be moved out of ILV. It also allows us to remove a few unneeded instructions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116554	2022-01-30 13:07:29 +00:00
Nuno Lopes	dd995aceda	[InstCombine] remove incorrect gep(x, undef) -> undef optimization gep(x, undef) carries the provenance of x, so we can't replace it with any pointer like undef. This leaves room for improvement for the poison case, but that's currently not possible as the demanded bits API doesn't distinguish between undef & poison bits. Fixes #44790	2022-01-30 11:34:32 +00:00
Nuno Lopes	f1c18acb07	[NewGVN] do phi(undef, x) -> x only if x is not poison phi([undef, A], [x, B]) -> x is only correct if x is guaranteed to be a non-poison value. Otherwise we would be changing an undef to poison in the branch A. Differential Revision: https://reviews.llvm.org/D117907	2022-01-29 21:43:57 +00:00
Florian Hahn	efd4938723	[VPlan] Handle IV vector splat using VPWidenCanonicalIV. This patch tries to use an existing VPWidenCanonicalIVRecipe instead of creating another step-vector for canonical induction recipes in widenIntOrFpInduction. This has the following benefits: 1. First step to avoid setting both vector and scalar values for the same induction def. 2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan 3. Only need to splat the vector IV for block in masks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116123	2022-01-29 16:25:27 +00:00
Max Kazantsev	3b194ca7ab	Recommit "[InstCombine] Fold and-reduce idiom" Checks of original vector types made more thorough. Differential Revision: https://reviews.llvm.org/D118317	2022-01-29 11:27:48 +07:00
Philip Reames	6888081e32	[SLP] Use moveBefore to simplify code [NFC]	2022-01-28 12:44:07 -08:00
Ahmed Bougacha	634ca7349d	[ObjCARC] Require the function argument in the clang.arc.attachedcall bundle. Currently, the clang.arc.attachedcall bundle takes an optional function argument. Depending on whether the argument is present, calls with this bundle have the following semantics: - on x86, with the argument present, the call is lowered to: call _target mov rax, rdi call _objc_retainAutoreleasedReturnValue - on AArch64, without the argument, the call is lowered to: bl _target mov x29, x29 and the objc runtime call is expected to be emitted separately. That's because, on x86, the objc runtime checks for both the mov and the call on x86, and treats the combination as the ARC autorelease elision marker. But on AArch64, it only checks for the dedicated NOP marker, as that's historically been sufficiently unique. Thanks to that, the runtime call wasn't required to be adjacent to the NOP marker, so it wasn't emitted as part of the bundle sequence. This patch unifies both architectures: on AArch64, we now emit all 3 instructions for the bundle. This guarantees that the runtime call is adjacent to the marker in the sequence, and that's information the runtime can use to further optimize this. This helps simplify some of the handling, in particular BundledRetainClaimRVs, which no longer needs to know whether the bundle is sufficient or not: it now always should be. Note that this does not include an AutoUpgrade for the nullary bundles, as they are only produced in ObjCContract as part of the obj/asm emission pipeline, and are not expected to be in bitcode. Differential Revision: https://reviews.llvm.org/D118214	2022-01-28 12:41:45 -08:00
Philip Reames	746e435ff7	Revert "[SLP] Add a clarifying assert in block scheduling [NFC]" This reverts commit `db49a78900`. The reasoning in the patch applied to a downstream branch, and I got myself confused when trying to split apart pieces. Thankfully, the assert was simply weaker than the actual invariant currently upstream which is that ReadyInsts is not empty.	2022-01-28 12:10:31 -08:00
Andrew Litteken	3785c1d055	[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match. This also adds an additional command line flag debug option to disable outlining intrinsics. Recommit of: `8de76bd569` Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109450	2022-01-28 13:52:21 -06:00
Philip Reames	db49a78900	[SLP] Add a clarifying assert in block scheduling [NFC] The fact we could have a block with a valid scheduling window, but nothing to schedule was surprising to me. After digging through the code, this can only happen if we don't find anything to directly vectorize. However, the reduction handling code relies on this mode, so we can't simply consider such trees unvectorizable. The assert conveys both that this situation can happen, but also that it can only happen for an immediate gather. Context: We built the bundle before deciding that vectorization of a bundle is possible. A side effect of bundle construction is manipulating the scheduling window, so a bundle which isn't vectorizable can cause the creation or expansion of a scheduling window.	2022-01-28 11:08:59 -08:00
Ellis Hoag	eea002a9c4	[InstrProf][NFC] Move function out of InstrProf.h `createIRLevelProfileFlagVar()` seems to be only used in `PGOInstrumentation.cpp` so we move it to that file. Then it can also take advantage of directly using options rather than passing them as arguments. Reviewed By: kyulee, phosek Differential Revision: https://reviews.llvm.org/D118097	2022-01-28 09:24:26 -08:00
Alexey Bataev	cec8b614f3	[SLP]Do not reorder top nodes if they do not require reordering. No need to reorder the top nodes, if they are not stores or insertelement instructions and each node should be analyzed only once, when the bottom-to-top analysis is performed. We still end up with extractelements for the top node scalars and the final shuffle just adds an extra cost and currently crashes the compiler for PHI nodes. Differential Revision: https://reviews.llvm.org/D116760	2022-01-28 09:16:18 -08:00
Nikita Popov	7d176844d0	[CodeExtractor] Fix warning in assert (NFC)	2022-01-28 16:33:34 +01:00
Nikita Popov	cf0357a545	[BasicBlockUtils] Fix typo in API name (NFC) detatch -> detach. As this requires touching all uses, also lower-case it in accordance with the style guide.	2022-01-28 16:32:13 +01:00
Nikita Popov	0ebbf3435f	[ArgPromotion] Don't assume all entry block instrs are executed We should abort this walk if we hit any instruction that is not guaranteed to transfer.	2022-01-28 16:08:42 +01:00
Nikita Popov	8b36c437df	[ArgPromotion] Make areFunctionArgsABICompatible() static (NFC) This function used to be shared with the Attributor, but can now be made private.	2022-01-28 15:26:36 +01:00
Hans Wennborg	fabaca10b8	Revert "[InstCombine] Fold and-reduce idiom" It causes builds to fail with llvm/include/llvm/Support/Casting.h:269: typename llvm::cast_retty<X, Y>::ret_type llvm::cast(Y) [with X = llvm::IntegerType; Y = const llvm::Type; typename llvm::cast_retty<X, Y>::ret_type = const llvm::IntegerType]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. See the code review for a link to a reproducer. > This patch introduces folding of and-reduce idiom and generates code > that is easier to read and which is less costly in terms of icmp operations. > The folding is > ``` > icmp eq (bitcast(icmp ne (lhs, rhs)), 0) > ``` > into > ``` > icmp eq(bitcast(lhs), bitcast(rhs)) > ``` > > See PR53419. > > Differential Revision: https://reviews.llvm.org/D118317 > Reviewed By: lebedev.ri, spatel This reverts commit `8599bb0f26`. This also reverts the dependent change: "[Test] Add 'ne' tests for and-reduce pattern folding" This reverts commit `a4aaa59953`.	2022-01-28 12:16:03 +01:00
Florian Hahn	b339bbdb19	[Matrix] Use ArrayType for allocas instead of VectorType. When creating an alloca to copy a matrix due to memory conflicts, those allocas used to use VectorTypes, which forced them to have huge alignments for large vectors. This patch updates LowerMatrixIntrinsics to use a corresponding array type, like Clang already does, to get more manageable alignments. Reviewed By: anemet, thegameg Differential Revision: https://reviews.llvm.org/D118239	2022-01-28 10:47:52 +00:00
Nikita Popov	91e5096d82	[InlineFunction] Use phis() iterator (NFC)	2022-01-28 10:36:28 +01:00
Florian Hahn	96400f179f	[VPlan] Record whether scalar IVs are needed in induction recipe. (NFC) This explicitly records whether a scalar IV is needed in the VPWidenIntOrFpInductionRecipe, to remove a dependence on the cost-model during its ::execute. It will also be used in D116123 to determine if a vector phi will be generated. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118167	2022-01-28 09:34:03 +00:00
Max Kazantsev	8599bb0f26	[InstCombine] Fold and-reduce idiom This patch introduces folding of and-reduce idiom and generates code that is easier to read and which is less costly in terms of icmp operations. The folding is ``` icmp eq (bitcast(icmp ne (lhs, rhs)), 0) ``` into ``` icmp eq(bitcast(lhs), bitcast(rhs)) ``` See PR53419. Differential Revision: https://reviews.llvm.org/D118317 Reviewed By: lebedev.ri, spatel	2022-01-28 11:20:08 +07:00
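The and-reduce fold can be checked exhaustively on a tiny instance. The sketch below is an illustration only and is not LLVM code. It models a `<4 x i2>` vector as a tuple of four 2-bit lanes and models `bitcast` by packing the lanes with lane 0 in the low bits. It then confirms that the `icmp ne` / `bitcast` / `icmp eq 0` form agrees with the direct wide compare for every input. The lane count and width are arbitrary choices that keep the search small.

```python
from itertools import product

LANES = 4  # assumed vector length for this sketch
WIDTH = 2  # assumed lane width in bits for this sketch


def bitcast(lanes, width):
    """Pack `width`-bit lanes into one integer, lane 0 in the low bits."""
    packed = 0
    for i, lane in enumerate(lanes):
        packed |= lane << (i * width)
    return packed


def src(lhs, rhs):
    # icmp ne yields a <LANES x i1> mask; bitcast it to iLANES and compare with 0.
    mask = tuple(int(a != b) for a, b in zip(lhs, rhs))
    return bitcast(mask, 1) == 0


def tgt(lhs, rhs):
    # Compare both vectors as single wide integers.
    return bitcast(lhs, WIDTH) == bitcast(rhs, WIDTH)


def counterexamples():
    vectors = list(product(range(1 << WIDTH), repeat=LANES))
    return [(l, r) for l in vectors for r in vectors if src(l, r) != tgt(l, r)]


print(len(counterexamples()))  # 0
```

Packing is injective, so equal wide integers imply equal lanes. That is why the single wide compare can replace the per-lane compares.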
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track function coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Vitaly Buka	bddc814b44	[msan] Copy origin of byval arguments Depends on D117278 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117285	2022-01-27 16:24:07 -08:00
Florian Hahn	9fd7a2e379	[ConstraintElimination] Use constraints with 0 or 1 coefficients. isConditionImplied is able to correctly handle 0 or 1 coefficients, so let it handle those cases, rather than skipping them.	2022-01-27 18:41:33 +00:00
Florian Hahn	258a0a3a55	[ConstraintElimination] Use simplified constraint for == 0. When checking x == 0, checking x u<= 0 is sufficient and simpler than x u>= 0 && x u<= 0. https://alive2.llvm.org/ce/z/btM7d3 ---------------------------------------- define i1 @src(i4 %a) { %0: %c = icmp eq i4 %a, 0 ret i1 %c } => define i1 @tgt(i4 %a) { %0: %c = icmp ule i4 %a, 0 ret i1 %c } Transformation seems to be correct!	2022-01-27 13:31:23 +00:00
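The alive2 check above covers `i4`, and it is small enough to replay by brute force. This sketch is illustrative and not part of the patch. It treats every `i4` value as an unsigned integer in 0..15 and confirms that `x == 0`, `x u<= 0`, and the two-sided `x u>= 0 && x u<= 0` all agree.

```python
BITS = 4


def icmp_eq_zero(a):
    return a == 0


def icmp_ule_zero(a):
    # Unsigned i4 values are 0..15, so "a u<= 0" can only hold for a == 0.
    return a <= 0


def icmp_uge_and_ule_zero(a):
    # The two-sided form; "a u>= 0" is always true for unsigned values.
    return a >= 0 and a <= 0


def all_agree():
    return all(
        icmp_eq_zero(a) == icmp_ule_zero(a) == icmp_uge_and_ule_zero(a)
        for a in range(1 << BITS)
    )


print(all_agree())  # True
```

The `u>= 0` half is always true for unsigned values, so the constraint system loses nothing when it is dropped.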
Florian Hahn	a78ce48c37	[ConstraintElimination] Introduce struct to manage constraints. (NFC) This patch adds a struct to manage a list of constraints. It simplifies a follow-up change, that adds pre-conditions that must hold before a list of constraints can be used.	2022-01-27 12:40:09 +00:00
Nikita Popov	d839afe3f9	[InstCombine] Avoid pointer element type access in PointerReplacer This code replaces the address space of the pointers while keeping the element type. Use the appropriate helpers to make this work with opaque pointers.	2022-01-27 12:28:32 +01:00
Nikita Popov	648faa3b5d	[InstCombine] Mark element type access as non-opaque (NFC) Also make the function static to make it more obvious that it is only used in the one place.	2022-01-27 11:40:29 +01:00
Florian Hahn	bb5c1b0691	[LoopVersioning] Use IRBuilder for OR simplification.	2022-01-27 09:55:51 +00:00
Nikita Popov	2c736f666b	[InstCombine] Skip GEP of bitcast transform with opaque pointers This transform is fundamentally incompatible with opaque pointers. Usually we would not hit it anyway because the bitcast is folded away earlier, but due to worklist order it might survive until here, so make sure we bail out explicitly.	2022-01-27 10:51:45 +01:00
Nikita Popov	b7179d9279	[InstCombine] Extract GEP of bitcast folds into separate function (NFC)	2022-01-27 10:48:00 +01:00
Nikita Popov	73cd8e29ad	[InstCombine] Skip PromoteCastOfAllocation() transform under opaque pointers I think this can't be hit anyway (because a ptr-to-ptr bitcast would get folded earlier), but in the interest of being explicit skip this transform for opaque pointers entirely.	2022-01-27 10:25:45 +01:00
Nikita Popov	8d992862a0	[InstCombine] Remove some pointer element type accesses One of these is guarded against opaque pointers, and the others were accessing the call function type in a rather convoluted way.	2022-01-27 10:15:35 +01:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence, move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Nikita Popov	de8867a0b6	[AMDGPUEmitPrintf] Don't require specific pointer element type Rather than checking for i8, simply add a bitcast to i8, so the appendString() code sees the expected type.	2022-01-26 16:16:32 +01:00
Nikita Popov	903c3d2863	[SCEVExpander] Always use i8 GEP for reused value offset We could keep the non-i8 GEP code for non-opaque pointers, but there are two reasons I'm dropping it: First, this actually appears to be dead code; at least, it isn't hit in any of our tests. I expect that this is because we usually expand trip counts, and those are never pointers (anymore). Second, the non-i8 GEP was actually incorrect in multiple ways, because it used SCEV type sizes, which don't match DL type sizes (for pointers) and certainly don't match type alloc sizes (which is what GEPs actually use). As such, I'm simplifying the code to always use the i8 GEP code path if it does get hit.	2022-01-26 15:38:58 +01:00
Nikita Popov	03d0acc545	[DSE] Use helper for unwind check (NFCI) This should be no functional change, as the cases supported by the helper and the cases supported by DSE are currently the same, the code structure is just slightly different.	2022-01-26 14:08:08 +01:00
Nikita Popov	6b69985da4	[MemCpyOpt] Use helper for unwind check This extends support to byval arguments. It would be further extended to handle the case of non-captured noalias returns.	2022-01-26 12:43:31 +01:00
Benjamin Kramer	0776f6e04d	[LSV] Vectorize loads of vectors by turning it into a larger vector Use shufflevector to do the subvector extracts. This allows a lot more load merging on AMDGPU and also on NVPTX when <2 x half> is involved. Differential Revision: https://reviews.llvm.org/D117219	2022-01-26 11:38:41 +01:00
Nuno Lopes	24a49e99f3	[NewGVN] Fix phi-of-ops in the presence of memory read operations The phi-of-ops functionality has a function OpIsSafeForPHIOfOps to determine when it's safe to create the new phi. But this function only checks for the obvious dominator conditions and ignores memory. This patch takes the conservative approach and disables phi-of-ops whenever there's a load that doesn't dominate the phi, as its value may be affected by a store inside the loop. This can be improved later to check aliasing between the load/stores. Fixes https://llvm.org/PR53277 Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D117999	2022-01-26 10:19:18 +00:00
Nikita Popov	44cfc3a816	[LICM] Generalize unwinding check during scalar promotion This extracts a common isNotVisibleOnUnwind() helper into AliasAnalysis, which handles allocas, byval arguments and noalias calls. After D116998 this could also handle sret arguments. We have similar logic in DSE and MemCpyOpt, which will be switched to use this helper as well. The noalias call case is a bit different from the others, because it also requires that the object is not captured. The caller is responsible for doing the appropriate check. Differential Revision: https://reviews.llvm.org/D117000	2022-01-26 11:15:03 +01:00
Nikita Popov	bec4e865de	[SCEVExpander] Remove pointer element type access in assertion Assert directly on i8 rather than the element type of i8*.	2022-01-26 10:35:57 +01:00
Nikita Popov	9e7a2bfcf7	[OpenMPOpt] Add const qualifier (NFC) Make it clear that this large lambda does not modify the vector.	2022-01-26 10:35:57 +01:00
Nikita Popov	c82cb5d000	[AddressSanitizer] Avoid pointer element type accesses Determine masked load/store type based on the value operand and result types, rather than pointer element type.	2022-01-26 10:16:15 +01:00
Giorgis Georgakoudis	7cb4c26173	[OMPIRBuilder] Generate aggregate argument for parallel region outlined functions Summary: This patch modifies code generation in OpenMPIRBuilder to pass arguments to the parallel region outlined function in an aggregate (struct), besides the global_tid and bound_tid arguments. It depends on the updated CodeExtractor (see D96854) for support. It mirrors functionality of Clang codegen (see D102107). Differential Revision: https://reviews.llvm.org/D110114	2022-01-25 20:53:45 -05:00
Giorgis Georgakoudis	95b981ca2a	[CodeExtractor] Enable partial aggregate arguments Summary: Enable CodeExtractor to construct output functions that partially aggregate inputs/outputs in their argument list. A use case is the OMPIRBuilder to create outlined functions for parallel regions that aggregate in a struct the payload variables for the region while passing as scalars thread and bound identifiers. Differential Revision: https://reviews.llvm.org/D96854	2022-01-25 20:50:34 -05:00
Andrew Litteken	ba79295c48	[NFC][IROutliner] fix namespace and unused variable	2022-01-25 18:41:30 -06:00
Andrew Litteken	e8f4e41b6b	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region. We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are no conflicts to handle with respect to predecessors no longer contained in the function. Recommit of `515eec3553` Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106997	2022-01-25 18:25:50 -06:00
Andrew Litteken	e50b217b4e	Revert "[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region." This reverts commit `515eec3553`. By mistake, commit message was not complete.	2022-01-25 18:24:19 -06:00
Andrew Litteken	515eec3553	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region.	2022-01-25 18:20:10 -06:00
Andrew Litteken	9c2daf648c	Revert "[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions" This reverts commit `8de76bd569`. Reverting due to failure of different-intrinsics.ll on lld-x86_64-win buildbot.	2022-01-25 18:19:33 -06:00
Dávid Bolvanský	fe30370b00	Reland "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 )"	2022-01-26 01:11:06 +01:00
Andrew Litteken	8de76bd569	[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match. This also adds an additional command line flag debug option to disable outlining intrinsics. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109450	2022-01-25 17:06:09 -06:00
Dávid Bolvanský	90f185c964	Revert "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 )" This reverts commit `ceec438368`. Clang tests fail.	2022-01-25 23:13:46 +01:00
Dávid Bolvanský	ceec438368	[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 ) Problem: Migration to new PM broke flatten attribute. This is one use case why LLVM should support inlining call-site with alwaysinline. The flatten attribute is nowadays broken, so we should either land patch like this one or remove everything related to flatten attribute from Clang. Second use case is something like "per call site inlining intrinsics" to control inlining even more; mentioned in https://lists.llvm.org/pipermail/cfe-dev/2018-September/059232.html Fixes https://github.com/llvm/llvm-project/issues/53360 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D117965	2022-01-25 22:55:30 +01:00
Andrew Litteken	f5f377d1fc	[IRSim][IROutliner] Adding support for recognizing and outlining indirect function calls, and function calls with different names, but the same type The outliner currently requires that function calls not be indirect calls, and that the function name and function type match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions. There are also debugging flags added to enforce the old behaviors where indirect calls not be allowed, and to enforce the old rule that function call names must also match. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109448	2022-01-25 15:19:28 -06:00
Andrew Litteken	dcc3e728ca	[IROutliner] Allowing Phi Nodes in exit blocks In addition to having multiple exit locations, there can be multiple blocks leading to the same exit location, which results in a potential phi node. If we find that multiple blocks within the region branch to the same block outside the region, resulting in a phi node, the code extractor pulls this phi node into the function and uses it as an output. We make sure that this phi node is given an output slot, and that the two values are removed from the outputs if they are not used anywhere else outside of the region. Across the extracted regions, the phi nodes are combined into a single block for each potential output block, similar to the previous patch. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106995	2022-01-25 11:33:53 -06:00
Nikita Popov	2f02c7e1f2	[SanitizerCoverage] Avoid pointer element type access Use the load/store type instead.	2022-01-25 17:22:20 +01:00
Nikita Popov	98db33349b	[SLC] Fix pointer diff type in sprintf() optimization We should always be calculating a byte-wise difference here. Previously this calculated the pointer difference while taking the pointer element type into account, which is incorrect.	2022-01-25 15:22:56 +01:00
Nikita Popov	7cc3e141d7	[MemProf] Avoid pointer element type access Determine the masked load/store access type from the value type of the intrinsics, rather than the pointer element type. For cleanliness, include the access type in InterestingMemoryAccess.	2022-01-25 14:52:54 +01:00
Nikita Popov	6a008de82a	[Evaluator] Simplify handling of bitcasted calls When fetching the function, strip casts. When casting the result, use the call result type. Don't actually inspect the bitcast.	2022-01-25 14:19:04 +01:00
Nikita Popov	78e1f70220	[ObjCARCOpts] Use standard non-terminator unreachable pattern This is what CreateNonTerminatorUnreachable() in InstCombine uses. Specific choice here doesn't really matter, but we should pick one that is pointer element type independent.	2022-01-25 13:08:03 +01:00
Nikita Popov	30d4a7e295	[IRBuilder] Require explicit element type in CreatePtrDiff() For opaque pointer compatibility, we cannot derive the element type from the pointer type.	2022-01-25 12:43:57 +01:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Ahmed Bougacha	03e9ba2740	[ObjCARC] Remove unused RetainRVDep dependency kind. NFC.	2022-01-24 19:37:01 -08:00
Joseph Huber	5eb49009eb	[OpenMP] Add more identifier to created shared globals Currently we push some variables to a global constant containing shared memory as an optimization. This generated constant had internal linkage and should not have collided with any known identifiers in the translation unit. However, there have been observed cases of this optimization unintentionally colliding with undocumented PTX identifiers. This patch adds a suffix to the created globals to hopefully bypass this. Depends on D118059 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D118068	2022-01-24 20:37:54 -05:00
Joseph Huber	06cfdd5224	[OpenMP][Fix] Properly inherit calling convention Previously in OpenMPOpt we did not correctly inherit the calling convention of the callee when creating new OpenMP runtime calls. This created issues when the calling convention was changed during `GlobalOpt` but a new call was created without the correct calling convention. This led to the call being replaced with a poison value in `InstCombine` due to undefined behaviour and causing large portions of the program to be incorrectly eliminated. This patch correctly inherits the existing calling convention from the callee. Reviewed By: tianshilei1992, jdoerfert Differential Revision: https://reviews.llvm.org/D118059	2022-01-24 20:37:52 -05:00
Florian Hahn	8a15caaae5	[ConstraintElimination] Fix sign of sub decomposition. Update the decomposition code to make sure the right coefficient (-1) is used for the second operand of the subtract. Fixes PR53123.	2022-01-24 18:32:32 +00:00
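The following toy sketch makes the sign fix concrete. It decomposes linear expressions into a constant plus per-variable coefficients. The tuple-based expression format and the `decompose` function are hypothetical and only loosely modeled on ConstraintElimination. The point is that every term of the second operand of a `sub` picks up a factor of -1.

```python
from collections import defaultdict


def decompose(expr):
    """Return (constant, {variable: coefficient}) for a linear expression tree.

    Expressions are hypothetical tuples: ("const", k), ("var", name),
    ("add", a, b) or ("sub", a, b).
    """
    kind = expr[0]
    if kind == "const":
        return expr[1], {}
    if kind == "var":
        return 0, {expr[1]: 1}
    if kind not in ("add", "sub"):
        raise ValueError(f"unsupported expression kind: {kind}")
    lhs_const, lhs_coeffs = decompose(expr[1])
    rhs_const, rhs_coeffs = decompose(expr[2])
    # For a subtraction every term of the second operand is scaled by -1.
    sign = 1 if kind == "add" else -1
    coeffs = defaultdict(int, lhs_coeffs)
    for name, coeff in rhs_coeffs.items():
        coeffs[name] += sign * coeff
    return lhs_const + sign * rhs_const, dict(coeffs)


print(decompose(("sub", ("var", "a"), ("var", "b"))))  # (0, {'a': 1, 'b': -1})
```

Using +1 for `b` here would turn `a - b` into `a + b`, which is the kind of wrong fact the fix prevents from entering the constraint system.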
eopXD	6be77561f8	[SLP][NFC] Add debug logs for entry. Tell the users they are specifying something without vector register. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D117980	2022-01-24 09:05:21 -08:00
Sjoerd Meijer	ada6d78a78	[LoopFlatten] Address FIXME about getTripCountFromExitCount. NFC. Together with the previous commit which mainly documents better LoopFlatten's overall strategy, this addresses a concern added as a FIXME comment in D110587; the code refactoring (NFC) introduces functions (also for the SCEV usage) to make this clearer.	2022-01-24 13:46:19 +00:00
Sjoerd Meijer	f6ac8088b0	[LoopFlatten] Added comments about usage of various Loop APIs. NFC.	2022-01-24 13:46:19 +00:00
Kerry McLaughlin	8082ab2fc3	[LoopVectorize] Support epilogue vectorisation of loops with reductions isCandidateForEpilogueVectorization will currently return false for loops which contain reductions. This patch removes this restriction and makes the following changes to support epilogue vectorisation with reductions: - `fixReduction`: If fixReduction is being called during vectorisation of the epilogue, the phi node it creates will need to additionally carry incoming values from the middle block of the main loop. - `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi created by fixReduction are updated after the vec.epilog.iter.check block is added. The phi is also moved to the preheader of the epilogue. - `processLoop`: The start values of any VPReductionPHIRecipes are updated before vectorising the epilogue loop. The getResumeInstr function added to the ILV will return the resume instruction associated with the recurrence descriptor. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D116928	2022-01-24 12:03:31 +00:00
Nikita Popov	67346b43e0	[Attributor] Use MemoryLocation to get pointer operand and accessed type (NFCI) This relies on existing APIs and avoids accessing the pointer element type. The alternative would be to extend getPointerOperand() to also return the accessed type, but I figured going through MemoryLocation would be cleaner. Differential Revision: https://reviews.llvm.org/D117868	2022-01-24 10:10:13 +01:00
Nikita Popov	d29e319263	[OpaquePtrs] Add getNonOpaquePointerElementType() method (NFC) This method is intended for use in places that cannot be reached with opaque pointers, or part of deprecated methods. This makes it easier to see that some uses of getPointerElementType() don't need further action. Differential Revision: https://reviews.llvm.org/D117870	2022-01-24 10:03:49 +01:00
Kazu Hirata	f63a9cd99d	[Vectorize] Remove unused variables (NFC)	2022-01-23 20:32:54 -08:00
Sanjay Patel	2e26633af0	[IR] document and update ctlz/cttz intrinsics to optionally return poison rather than undef The behavior in Analysis (knownbits) implements poison semantics already, and we expect the transforms (for example, in instcombine) derived from those semantics, so this patch changes the LangRef and remaining code to be consistent. This is one more step in removing "undef" from LLVM. Without this, I think https://github.com/llvm/llvm-project/issues/53330 has a legitimate complaint because that report wants to allow subsequent code to mask off bits, and that is allowed with undef values. The clang builtins are not actually documented anywhere AFAICT, but we might want to add that to remove more uncertainty. Differential Revision: https://reviews.llvm.org/D117912	2022-01-23 11:22:48 -05:00
Sanjay Patel	39e602b6c4	[InstCombine] try to fold binop with phi operands This is an alternate version of D115914 that handles/tests all binary opcodes. I suspect that we don't see these patterns too often because -simplifycfg would convert the minimal cases into selects rather than leave them in phi form (note: instcombine has logic holes for combining the select patterns too though, so that's another potential patch). We only create a new binop in a predecessor that unconditionally branches to the final block. https://alive2.llvm.org/ce/z/C57M2F https://alive2.llvm.org/ce/z/WHwAoU (not safe to speculate an sdiv for example) https://alive2.llvm.org/ce/z/rdVUvW (but it is ok on this path) Differential Revision: https://reviews.llvm.org/D117110	2022-01-22 15:00:06 -05:00
Florian Hahn	5f2854f1da	[LV] Always create VPWidenCanonicalIVRecipe, optimize away later. This patch updates createBlockInMask to always generate VPWidenCanonicalIVRecipe and adds a transform to optimize it away later, if it is not needed. This is a step towards breaking up VPWidenIntOrFpInductionRecipe and explicitly distinguishing between vector phis and scalarizing. Split off from D116123. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117140	2022-01-22 15:34:20 +00:00
Florian Mayer	754d6af7c3	[NFC] Improve code reuse. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D116711	2022-01-21 10:50:54 -08:00
Caroline Concatto	ad43217a04	[InstCombine] Fold for masked gather when loading the same value each time. This patch checks in the masked gather when the first operand value is a splat and the mask is all one, because the masked gather is reloading the same value each time. This patch replaces this pattern of masked gather by a scalar load of the value and splats it in a vector. Differential Revision: https://reviews.llvm.org/D115726	2022-01-21 14:19:51 +00:00
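The masked-gather fold can be illustrated at the level of semantics. This is a sketch, not the InstCombine code. A Python list stands in for memory, and the helper names `masked_gather` and `fold_splat_gather` are hypothetical. When every lane's pointer is the same and the mask is all ones, each lane loads the same element. A single scalar load splatted across the vector therefore gives the same result.

```python
def masked_gather(memory, ptrs, mask, passthru):
    """Reference semantics: lane i loads memory[ptrs[i]] if mask[i], else passthru[i]."""
    return [memory[p] if m else pt for p, m, pt in zip(ptrs, mask, passthru)]


def fold_splat_gather(memory, ptrs, mask, passthru):
    """Hypothetical rewrite: splat pointer + all-ones mask -> one scalar load, splatted."""
    if len(set(ptrs)) == 1 and all(mask):
        scalar = memory[ptrs[0]]     # the single scalar load
        return [scalar] * len(ptrs)  # splat it into a vector
    # Pattern not matched: keep the original gather.
    return masked_gather(memory, ptrs, mask, passthru)


memory = [10, 20, 30, 40]
print(fold_splat_gather(memory, [2, 2, 2, 2], [True] * 4, [0] * 4))  # [30, 30, 30, 30]
```

The fold does not apply if any mask lane is false. In that case the pass-through values must survive, which a plain splat would lose.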
Nikita Popov	bfbdb5e43e	[Coroutines] Avoid some pointer element type accesses These are just verifying that pointer types are correct, which is no longer relevant under opaque pointers.	2022-01-21 12:36:19 +01:00
Nikita Popov	9c5b856dac	[CoroSplit] Avoid pointer element type accesses Use isOpaqueOrPointeeTypeMatches() for the assertions instead.	2022-01-21 12:22:09 +01:00
Nikita Popov	e7762653d3	[Attributor] Avoid some pointer element type accesses	2022-01-21 11:20:10 +01:00
Florian Hahn	55689904d2	[VPlan] Move ::isCanonical outside ifdef. This fixes a build failure with assertions disabled.	2022-01-21 09:44:31 +00:00
Florian Hahn	c0cf209076	[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI). This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if an induction recipe is canonical. The code is also updated to use it instead of isCanonicalID. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117551	2022-01-21 09:35:06 +00:00
Paweł Bylica	1d7604fdce	[InstCombine] Simplify bswap -> shift Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one "active byte", i.e. all active bits are contained in boundaries of a single byte of x. https://alive2.llvm.org/ce/z/nvbbU5 https://alive2.llvm.org/ce/z/KiiL3J Reviewed By: spatel, craig.topper, lebedev.ri Differential Revision: https://reviews.llvm.org/D117680	2022-01-21 01:25:30 +01:00
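The bswap-to-shift idea can be checked exhaustively for 32-bit values that have one nonzero byte. This sketch uses concrete values rather than the known-bits reasoning the patch relies on, and `bswap_as_shift` is a hypothetical helper. Byte `k` (0 = least significant) moves to byte `3 - k` under bswap, so the value is shifted by `8 * (3 - 2k)` bits. The shift is a `shl` when that amount is positive and an `lshr` when it is negative.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def bswap32(x):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


def bswap_as_shift(x, byte_index):
    """Shift that moves byte `byte_index` (0 = least significant) to where bswap puts it."""
    dest_index = WIDTH // 8 - 1 - byte_index
    shift = 8 * (dest_index - byte_index)
    if shift >= 0:
        return (x << shift) & MASK  # shl
    return x >> -shift              # lshr


def mismatches():
    bad = 0
    for byte_index in range(WIDTH // 8):
        for value in range(1, 256):  # nonzero byte: exactly one active byte
            x = value << (8 * byte_index)
            if bswap32(x) != bswap_as_shift(x, byte_index):
                bad += 1
    return bad


print(mismatches())  # 0
```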
Johannes Doerfert	37e0c58559	[Attributor][FIX] AAValueConstantRange should not loop unconstrained The old method to avoid unconstrained expansion of the constant range in a loop did not work as soon as there were multiple instructions in between the phi and its input. We now take a generic approach and limit the number of updates as a fallback. The old method is kept as it catches "the common case" early.	2022-01-20 18:07:04 -06:00
Johannes Doerfert	7bf9065ad7	[Attributor][NFC] Clang format	2022-01-20 18:06:53 -06:00
Philip Reames	c0906f6b21	[SLP] Remove stray semicolon to make bots happy Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict c++98 flags which disallow ; at global scope.	2022-01-20 14:09:28 -08:00
Philip Reames	5a670f1378	[SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC]	2022-01-20 13:58:20 -08:00
Philip Reames	60f6191879	[SLP] Extract formBundle helper for readability [NFC]	2022-01-20 13:08:37 -08:00
Sanjay Patel	a7a2860d0e	[InstCombine] convert mul with sexted bool and constant to select We already have the related folds for zext-of-bool, so it should make things more consistent to have this transform to select for sext-of-bool too: https://alive2.llvm.org/ce/z/YikdfA Fixes #53319	2022-01-20 15:57:01 -05:00
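Here is a brute-force `i8` check of why a multiply by a sign-extended bool can become a select. It assumes the rewrite targets `select b, -C, 0`, which is one natural form; the exact output pattern the patch emits is not confirmed here. `sext i1` produces either all-ones (-1) or 0, so the wrapping product is either `-C` or 0.

```python
BITS = 8
MASK = (1 << BITS) - 1


def sext_i1(b):
    """sext i1 to i8: true becomes all-ones (-1), false becomes 0."""
    return MASK if b else 0


def mul_form(b, c):
    # mul (sext i1 b), C  with wrapping i8 arithmetic
    return (sext_i1(b) * c) & MASK


def select_form(b, c):
    # select i1 b, -C, 0  (assumed target form)
    return (-c) & MASK if b else 0


def all_agree():
    return all(mul_form(b, c) == select_form(b, c)
               for b in (False, True) for c in range(1 << BITS))


print(all_agree())  # True
```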
Philip Reames	118babe67a	[SLP] Use for loops for walking bundle elements	2022-01-20 12:44:33 -08:00
Philip Reames	860038e0d7	[SLP] Rename a couple lambdas to be more clearly separate from method names	2022-01-20 12:13:30 -08:00
Roman Lebedev	ba8eb31bd9	[InstCombine] Instruction sinking: fix check for function terminating block Checking for specific function terminating opcodes means we don't handle other non-hardcoded ones :) This should probably be generalized to something similar to the `IsBlockFollowedByDeoptOrUnreachable()`. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117810	2022-01-20 22:41:31 +03:00
Sanjay Patel	2d031ec5e5	[InstCombine] add one-use check to opposite shift folds Test comments say this might be intentional, but I don't see any hard evidence to support it. The extra instruction shows up as a potential regression in D117680. One test does show a missed fold that might be recovered with better demanded bits analysis.	2022-01-20 13:49:23 -05:00
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Nadav Rotem	191a6e9dfa	optimize icmp-ugt-ashr This diff optimizes the sequence icmp-ugt(ashr,C_1) C_2. InstCombine already implements this optimization for sgt, and this patch adds support for ugt. This patch adds the check for UGT. @craig.topper came up with the idea and proof: define i1 @src(i8 %x, i8 %y, i8 %c) { %cp1 = add i8 %c, 1 %i = shl i8 %cp1, %y %i.2 = ashr i8 %i, %y %cmp = icmp eq i8 %cp1, %i.2 ;Assume: C + 1 == (((C + 1) << y) >> y) call void @llvm.assume(i1 %cmp) ; uncomment for the sgt case %j = shl i8 %cp1, %y %j.2 = sub i8 %j, 1 %cmp2 = icmp ne i8 %j.2, 127 ;Assume (((c + 1 ) << y) - 1) != 127 call void @llvm.assume(i1 %cmp2) %s = ashr i8 %x, %y %r = icmp sgt i8 %s, %c ret i1 %r } define i1 @tgt(i8 %x, i8 %y, i8 %c) { %cp1 = add i8 %c, 1 %j = shl i8 %cp1, %y %j.2 = sub i8 %j, 1 %r = icmp sgt i8 %x, %j.2 ret i1 %r } declare void @llvm.assume(i1) This change is related to the optimizations in D117252. Differential Revision: https://reviews.llvm.org/D117365	2022-01-20 09:31:46 -08:00
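The proof above is written for `sgt`. The unsigned variant can be checked exhaustively over `i8` with a small script. It assumes the `ugt` rewrite is `icmp ugt (ashr x, y), c` to `icmp ugt x, ((c + 1) << y) - 1`, guarded only by the round-trip condition `C + 1 == (((C + 1) << y) >> y)` taken from the first `@llvm.assume`. That guard is modeled on the proof and may not be exactly how the patch phrases it. `fold_ugt` is a hypothetical helper that returns the new constant, or `None` when the fold does not apply.

```python
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(v):
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def ashr(v, y):
    # Python's >> on negative ints rounds toward -inf, matching ashr.
    return (to_signed(v) >> y) & MASK


def fold_ugt(c, y):
    """New constant for `icmp ugt (ashr x, y), c` -> `icmp ugt x, C2`, or None."""
    cp1 = (c + 1) & MASK
    shifted = (cp1 << y) & MASK
    if ashr(shifted, y) != cp1:  # C + 1 must survive the shl/ashr round trip
        return None
    return (shifted - 1) & MASK


def counterexamples():
    bad = []
    for y in range(BITS):
        for c in range(1 << BITS):
            c2 = fold_ugt(c, y)
            if c2 is None:
                continue
            for x in range(1 << BITS):
                # Both comparisons are unsigned on the i8 bit patterns.
                if (ashr(x, y) > c) != (x > c2):
                    bad.append((x, y, c))
    return bad


print(len(counterexamples()))  # 0
```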
Sjoerd Meijer	fabf1de132	[FuncSpec] Add a reference, and some other clarifying comments. NFC.	2022-01-20 17:01:08 +00:00
Philip Reames	c104fca36b	[SLP] Delete dead code in favor of proper assert [NFC]	2022-01-20 08:54:12 -08:00
Philip Reames	c43ebae838	[SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC]	2022-01-20 08:46:44 -08:00
Philip Reames	3c422cbe6b	[SLP] Add an assert to make a non-obvious precondition clear [NFC]	2022-01-20 08:24:10 -08:00
Nikita Popov	0d20407d1a	Reapply [MemCpyOpt] Look through pointer casts when checking capture This is a recommit of the patch without changes. The reason for the revert has been addressed in D117679. ----- The user scanning loop above looks through pointer casts, so we also need to strip pointer casts in the capture check. Previously the source was incorrectly considered not captured if a bitcast was passed to the call.	2022-01-20 09:30:21 +01:00
Nikita Popov	655a7024db	Reapply [MemCpyOpt] Make capture check during call slot optimization more precise This is a recommit of the patch without changes. The reason for the revert has been addressed in D117679. ----- Call slot optimization is currently supposed to be prevented if the call can capture the source pointer. Due to an implementation bug, this check currently doesn't trigger if a bitcast of the source pointer is passed instead. I'm somewhat afraid of the fallout of fixing this bug (due to heavy reliance on call slot optimization in rust), so I'd like to strengthen the capture reasoning a bit first. In particular, I believe that the capture is fine as long as a) the call itself cannot depend on the pointer identity, because neither dest has been captured before/at nor src before the call and b) there is no potential use of the captured pointer before the lifetime of the source alloca ends, either due to lifetime.end or a return from a function. At that point the potentially captured pointer becomes dangling. Differential Revision: https://reviews.llvm.org/D115615	2022-01-20 09:30:20 +01:00
Nikita Popov	d7bff2e9d2	[MemCpyOpt] Fix metadata merging during call slot optimization Call slot optimization currently merges the metadata between the call and the load. However, we also need to merge in the metadata of the store. Part of the reason why we might have gotten away with this previously is that usually the load and the store are the same instruction (a memcpy); this can only happen if call slot optimization occurs on an actual load/store pair. This addresses the issue reported in https://reviews.llvm.org/D115615#3251386. Differential Revision: https://reviews.llvm.org/D117679	2022-01-20 09:25:13 +01:00
Heejin Ahn	eb675e972d	[WebAssembly] Support Wasm EH + Wasm SjLj D108960 added support for SjLj using Wasm EH instructions, which we call Wasm SjLj going forward. (We call the old SjLj Emscripten SjLj) But it did not support using Wasm EH and Wasm SjLj together. So far users of Wasm EH had to use Wasm EH with Emscripten SjLj, which had a certain limitation and it suffered from bigger code size increases as well. This enables using Wasm EH and Wasm SjLj together. 1. This redirects `catchswitch` and `cleanupret` that unwind to caller to `catch.dispatch.longjmp` BB, which is a `catchswitch` BB that handles longjmps. 2. D108960 converted all longjmpable `call`s to `invokes` that unwind to `catch.dispatch.longjmp`. This CL checks if the `call` is embedded within another `catchpad`, and if so, makes it unwind to its nearest parent's unwind destination, rather than `catch.dispatch.longjmp`. This is necessary to preserve the scoping structure. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D117610	2022-01-19 20:13:54 -08:00
Craig Topper	02d9a4d56d	[LoopPeel] Pass TripCount to computePeelCount by value instead of by reference. NFC The TripCount is not modified by the function so it doesn't need to be passed by reference. Verified by passing it as const reference before changing to value. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D117735	2022-01-19 17:54:45 -08:00
Craig Topper	1507786c22	[LoopPeeling] Fix stale comments. NFC These comments were not updated when PeelingPreferences split from UnrollingPreferences.	2022-01-19 17:00:12 -08:00
Johannes Doerfert	b4a7559844	[OpenMP][FIX] Replace ICVs only with values valid at the getter position While we might know the value of an ICV at a getter position, it is not always clear that we can simply use it. Verify the value is valid first to avoid invalid IR. Fixes #53300.	2022-01-19 18:40:13 -06:00
Eli Friedman	86cdff0e21	[OpenMPOpt] Use SetVector to store list of kernels. Fixes test failures on reverse-iteration buildbot.	2022-01-19 13:55:32 -08:00
Wenlei He	7cca13bc3a	[PartialInline] Bail out on asm-goto/callbr Fixing ICE when partial inline tries to deal with blockaddress uses of function which is typical for asm-goto/callbr. We ran into this with PGO multi-region partial inline. Differential Revision: https://reviews.llvm.org/D117509	2022-01-19 10:57:57 -08:00
Nikita Popov	4dc4815f56	[MemCpyOpt] Add some debug output to call slot optimization (NFC)	2022-01-19 15:51:10 +01:00
Nikita Popov	5ba73c924d	[BuildLibCalls] Mark calloc as inaccessiblememonly Now that DSE handles inaccessiblememonly calloc, mark it as such, as we do with other memory allocation functions.	2022-01-19 12:55:09 +01:00
Nikita Popov	26f81984e7	[DSE] Handle inaccessiblememonly calloc Change the DSE calloc handling to assume that it is inaccessiblememonly, i.e. the defining access is liveOnEntry. Differential Revision: https://reviews.llvm.org/D117543	2022-01-19 12:55:09 +01:00
Sjoerd Meijer	d544a89a37	[LoopFlatten] Update MemorySSA state I would like to move LoopFlatten from LoopPass Manager LPM2 to LPM1 (D116612), but that is a LPM that is using MemorySSA and so LoopFlatten needs to preserve MemorySSA and this adds that. More specifically, LoopFlatten restructures the CFG and with this change the MSSA state is updated accordingly, where we also update the DomTree. LoopFlatten doesn't rewrite/optimise/delete load or store instructions, so I have not added any MSSA updates for that. Differential Revision: https://reviews.llvm.org/D116660	2022-01-19 10:57:33 +00:00
Nikita Popov	d56b0ad441	[ConstantHoist] Remove check for notional overindexing ConstantHoist currently only hoists GEPs if there is no notional overindexing. As this transform only hoists address arithmetic, it shouldn't care about whether any overindexing occurs or not. There is one caveat: If the hoisted base GEP is inbounds, and a later non-inbounds GEP is rewritten in terms of it, the value may be incorrectly poisoned. To avoid this, restrict the transform to inbounds GEPs for now, as the notional overindexing check effectively did that as well. The inbounds restriction could be dropped by dropping inbounds from the base GEP expression. Differential Revision: https://reviews.llvm.org/D117201	2022-01-19 11:32:10 +01:00
Nikita Popov	a115bbea9b	[Attributor] Remove notional overindexing check AAPointerInfo currently bails on constant expression GEPs with notional overindexing. I don't think this is necessary, as the following code handling GEPOperator will deal with arbitrary indices appropriately. Differential Revision: https://reviews.llvm.org/D117203	2022-01-19 11:30:04 +01:00
Florian Hahn	165e36bf18	[VPlan] Assert can IV is only used by increments during epilogue vec. After resetting the start value of the canonical IV, it might not be canonical any more. Add an assertion to make sure it is only used by its increment, to avoid potential mis-use. Suggested in D117140.	2022-01-19 10:10:05 +00:00
Chuanqi Xu	c8ecf12bc3	[Coroutines] Offering llvm.coro.align intrinsic It is a known problem that we can't align the switch-based coroutine frame if the alignment exceeds std::max_align_t (which is usually 16). We could solve the problem in the middle end by dynamically transforming, or in the frontend by emitting an aligned allocation function. If we need to solve it in the frontend, the middle end needs to offer an intrinsic to tell the alignment at least. This patch tries to offer such an intrinsic called llvm.coro.align. Reviewed By: https://reviews.llvm.org/D117542 Differential revision: https://reviews.llvm.org/D117542	2022-01-19 09:52:45 +08:00
spupyrev	13d1364a34	A better profi rebalancer This is an extension of profi post-processing step that rebalances counts in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically, the new version finds many more "unknown" subgraphs and marks more "unknown" basic blocks as hot (which prevents unwanted optimization passes). I see up to 0.5% perf on some (large) binaries, e.g., clang-10 and gcc-8. The algorithm is still linear and yields no build time overhead.	2022-01-18 12:14:24 -08:00
Ellis Hoag	5b9358d774	[InstrProf][NFC] Add InstrProfInstBase base The `InstrProfInstBase` class is for all `llvm.instrprof.*` intrinsics. In a later diff we will add a new intrinsic of this type. Also refactor some logic in `InstrProfiling.cpp`. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D117261	2022-01-18 11:12:00 -08:00
Adrian Tong	ea27adb45b	[NFC] Test commit. This is just a test commit to check whether I got commit permission.	2022-01-18 19:01:04 +00:00
Mircea Trofin	3e8553aab4	[mlgo][inline] Improve global state tracking The global state refers to the number of the nodes currently in the module, and the number of direct calls between nodes, across the module. Node counts are not a problem; edge counts are because we want strictly the kind of edges that affect inlining (direct calls), and that is not easily obtainable without iteration over the whole module. This patch avoids relying on analysis invalidation because it turned out to be too aggressive in some cases. It leverages the fact that Node objects are stable - they do not get deleted while cgscc passes are run over the module; and cgscc pass manager invariants. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115847	2022-01-18 17:45:34 +00:00
Jan Svoboda	5f4ae56457	[llvm] Remove uses of `std::vector<bool>` LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement. This patch does just that for llvm. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D117121	2022-01-18 18:20:45 +01:00
Sanjay Patel	2d50630efb	[InstCombine] reduce code duplication; NFC	2022-01-18 12:13:45 -05:00
Hans Wennborg	53a51acc36	Revert "[MemCpyOpt] Make capture check during call slot optimization more precise" This caused a miscompile due to call slot optimization replacing a call argument without considering the call's !noalias metadata; see discussion on the code review. > Call slot optimization is currently supposed to be prevented if > the call can capture the source pointer. Due to an implementation > bug, this check currently doesn't trigger if a bitcast of the source > pointer is passed instead. I'm somewhat afraid of the fallout of > fixing this bug (due to heavy reliance on call slot optimization > in rust), so I'd like to strengthen the capture reasoning a bit first. > > In particular, I believe that the capture is fine as long as a) > the call itself cannot depend on the pointer identity, because > neither dest has been captured before/at nor src before the > call and b) there is no potential use of the captured pointer > before the lifetime of the source alloca ends, either due to > lifetime.end or a return from a function. At that point the > potentially captured pointer becomes dangling. > > Differential Revision: https://reviews.llvm.org/D115615 Also reverting the dependent commit: > [MemCpyOpt] Look through pointer casts when checking capture > > The user scanning loop above looks through pointer casts, so we > also need to strip pointer casts in the capture check. Previously > the source was incorrectly considered not captured if a bitcast > was passed to the call. This reverts commit `487a34ed9d` and `00e6869463`.	2022-01-18 17:41:49 +01:00
Daniil Kovalev	d8e0e125a2	[InstCombine] Simplify addends reordering logic Previously some constants were not pushed to the top of the resulting expression tree as intended by the algorithm. We can remove the logic from simplifyFAdd and rely on SimplifyAssociativeOrCommutative to do that. Differential Revision: https://reviews.llvm.org/D117302	2022-01-18 16:00:47 +03:00
David Sherwood	e781620dee	[LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled When SVE is enabled for AArch64 targets it makes more sense to use the get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping from the intrinsic to the 'whilelo' instruction for legal vector types. This instruction neatly takes overflow into account as well. This patch fixes an issue in VPInstruction::generateInstruction that assumed we are only dealing with fixed-width vectors. Differential Revision: https://reviews.llvm.org/D117109	2022-01-18 11:59:30 +00:00
pvellien	4e1c207726	[SimplifyCFG] Fix assertion failure when reusing table switch comparison After D116332, some icmps no longer fold with the target-independent constant folder. The SimplifyCFG code assumed that the comparison would always fold, which is not guaranteed. Explicitly check that the result is either true or false. Differential Revision: https://reviews.llvm.org/D117184	2022-01-18 09:30:54 +01:00
Philip Reames	26049b8ce3	[GlobalOpt] Generalize malloc-to-global for any allocation function We can generalize the malloc-to-global transform for other allocation functions which are both a) removable, and b) have a known initialization value. One subtlety that I want to point out - mostly because I hadn't realized it was true until I took a closer look - is that the existing code doesn't prove that initialization/malloc happens only once. The initialization function can be called multiple times. This is correct without special handling for malloc as undef can map to any value previously written, but for a non-undef initializing allocation it means we may end up memsetting the new global repeatedly. In particular, this means it's not legal to fold the memset into the initializer of the global. Differential Revision: https://reviews.llvm.org/D117503	2022-01-17 15:06:23 -08:00
Philip Reames	6ca192de58	[LoopDeletion] Add back statistic update lost in `523573e` Caught by a couple of builders as an unused variable warning (e.g. https://lab.llvm.org/buildbot#builders/57/builds/13973).	2022-01-17 12:20:51 -08:00
Philip Reames	523573e90d	[LoopDeletion] Revert `3af8a11` and add test coverage for breakage This reverts `3af8a11` because I'd used an upper bound where a lower bound was required. The included reduced test case demonstrates the issue.	2022-01-17 11:44:03 -08:00
Stephen Tozer	32417b3203	[DebugInfo] ValueMapper impl for DIArgList respects IgnoreMissingLocals This patch fixes an issue in which SSA value references within a DIArgList would be unnecessarily dropped by llvm-link, even when invoking on a single file (which should be a no-op). The reason for the difference is that the ValueMapper does not refer to the RF_IgnoreMissingLocals flag for LocalAsMetadata contained within a DIArgList; this flag is used for direct LocalAsMetadata uses to preserve SSA references even when the ValueMapper does not have an explicit mapping for the referenced SSA value, which appears to always be the case when using llvm-link in this manner. Differential Revision: https://reviews.llvm.org/D114355	2022-01-17 17:17:32 +00:00
Sanjay Patel	4cdf30d9d3	[InstCombine] FP with reassoc FMF: (X * C) + X --> X * (MulC + 1.0) This fold already exists for scalars via FAddCombine (and that's why 2 of the tests are only changed cosmetically), but that code misses vectors and has largely been replaced by simpler folds over time, so this is another step towards removing it.	2022-01-17 10:38:05 -05:00
Florian Hahn	aa7f0e6a55	[DSE] Remove commented-out InvisibleToCallerBeforeRet. (NFC) This code is a leftover from earlier changes and should be removed.	2022-01-17 13:59:13 +00:00
Sanjay Patel	7037d110fa	[InstCombine] propagate IR flags from binop through select The tests with constant folding that produces poison could potentially remove the select entirely: https://alive2.llvm.org/ce/z/e-WUqF ...but this patch just removes the FMF-only limitation on propagation.	2022-01-17 08:42:48 -05:00
Florian Hahn	500fe60957	[VPlan] Drop unnecessary uses of getVPSingleValue (NFC).	2022-01-17 13:27:33 +00:00
Nikita Popov	12bee2c054	[GlobalOpt] Drop an incorrect check This was a last-minute addition to D117249, and of course I ended up inverting the condition in a way that caused an uninitialized memory read. I've dropped it entirely, as I don't think we actually care whether the size is zero or not here. The previous code wasn't checking this either.	2022-01-17 10:10:56 +01:00
Nikita Popov	499f1ca79f	[GlobalOpt] Use generic type when converting malloc to global The malloc to global transform currently determines the type of the global by looking at bitcasts of the malloc. This is limited (the transform fails if there are multiple different types) and incompatible with opaque pointers. My initial approach was to construct an appropriate struct type based on usage in loads/stores. What this patch does instead is to always create an [i8 x AllocSize] global, without trying to guess types at all. This does mean that other transforms that require a certain global type may break. I fixed two of these in D117034 and D117223, which I believe should be sufficient to avoid regressions. In particular, the global SRA change should end up splitting the global into naturally-typed sub-globals, at which point all other optimizations should work. Differential Revision: https://reviews.llvm.org/D117092	2022-01-17 09:55:33 +01:00
Nikita Popov	4796b4ae7b	[GlobalOpt] Make global SRA offset based Currently global SRA uses the GEP structure to determine how to split the global. This patch instead analyses the loads and stores that are performed on the global, and collects which types are used at which offset, and then splits the global according to those. This is both more general, and works fine with opaque pointers. This is also closer to how ordinary SROA is performed. Differential Revision: https://reviews.llvm.org/D117223	2022-01-17 09:28:36 +01:00
Nikita Popov	00b77d917c	[DSE] Remove alloc function check in canSkipDef() canSkipDef() currently skips inaccessiblememonly calls, but not if they are allocation functions. This check was added in D103009, but actually seems to be a leftover from a previous implementation in D101440. canSkipDef() is not used on the storeIsNoop() path, where the relevant transform ended up being implemented. Differential Revision: https://reviews.llvm.org/D117005	2022-01-17 09:23:51 +01:00
Florian Hahn	070d1034da	[LV] Restore metadata to disable runtime unrolling for epilogue loop. After `d4a8fc3a87` LV stopped adding metadata to disable runtime unrolling to the vectorized epilogue loop. This was missed because `278aa65cc4` removed the relevant test coverage. This patch fixes that by adding the relevant metadata after vector loop generation.	2022-01-16 13:14:16 +00:00
Florian Hahn	62739204d4	[LV] Move AddRuntimeUnrollDisableMetaData so it can be used earlier (NFC) Move up the definition of AddRuntimeUnrollDisableMetaData, so it can be re-used earlier in the file in a follow-up patch.	2022-01-16 10:30:24 +00:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Nikita Popov	d1675e4944	[AttrBuilder] Remove empty() / td_empty() methods The empty() method is a footgun: It only checks whether there are non-string attributes, which is not at all obvious from its name, and of dubious usefulness. td_empty() is entirely unused. Drop these methods in favor of hasAttributes(), which checks whether there are any attributes, regardless of whether these are string or enum attributes.	2022-01-15 17:57:18 +01:00
Florian Hahn	e00158ed5c	[LoopUtils] Use InstSimplifyFolder in addRuntimeChecks. Use the InstSimplifyFolder introduced earlier to perform initial simplification during runtime check construction.	2022-01-15 15:21:16 +00:00
Vitaly Buka	35d00fdc10	[msan] Reset shadow of byval before call If function is not sanitized we must reset shadow, not copy. Depends on D117285 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117286	2022-01-14 22:35:43 -08:00
Quentin Colombet	a8ca4046e2	[LSR] Fix crash in Phi node with EHPad block This fixes a crash I observed in issue #48708 where the LSR pass tries to insert an instruction in a basic block with only a catchswitch statement in there. This happens because the Phi node being evaluated assumes the same value for different basic blocks. If the basic block associated with the incoming value of the operand being evaluated has an EHPad terminator LSR skips optimizing it. But if that incoming value can come from multiple different blocks there can be some incoming basic blocks which are terminated in an EHPad. If these are then rewritten in RewriteForPhi the ones containing an EHPad terminator will hit the "Insertion point must be a normal instruction" assert in AdjustInsertPositionForExpand. This fix makes CollectLoopInvariantFixupsAndFormulae also ignore cases where the same value has another incoming basic block with an EHPad, same as it already does in case the primary value has one. Patch by Lorenz Brun <lorenz@brun.one> Differential Revision: https://reviews.llvm.org/D98378	2022-01-14 18:53:18 -08:00
Vitaly Buka	0a46b6ec4e	[msan] Clear byval shadow in ignored functions If function has no sanitize_memory we still reset shadow for nested calls. The first return from getShadow() correctly returned shadow for argument, but it didn't reset shadow of byval pointee. Depends on D117277 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D117278	2022-01-14 17:32:07 -08:00
Vitaly Buka	4959708502	[NFC][msan] Consolidate clean shadow handling Depends on D117276 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117277	2022-01-14 17:06:39 -08:00
Vitaly Buka	18e4369e19	[NFC][msan] Don't setOrigin for byval pointer It's NFC because shadow of pointer is clean so origins will not be propagated anyway. Depends on D117275 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117276	2022-01-14 16:42:26 -08:00
Heejin Ahn	c3a68c5d63	[SROA] Bail out on PHIs in catchswitch BBs In the process of rewriting `alloca`s and `phi`s that use them, the SROA pass can try to insert a non-PHI instruction by calling `getFirstInsertionPt()`, which is not possible in a catchswitch BB. This CL makes us bail out in these cases. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D117168	2022-01-14 14:55:07 -08:00
Congzhe Cao	fa6a2876c7	[LoopInterchange] Enable interchange with multiple inner loop indvars Currently loop interchange only supports loops with one inner loop induction variable. This patch adds support for transformation with more than one inner loop induction variables. The induction PHIs and induction increment instructions are moved/duplicated properly to the new outer header and the new outer latch, respectively. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D114917	2022-01-14 16:28:41 -05:00
Vitaly Buka	3552177229	[NFC][msan] Reorder branches in complex if Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D117274	2022-01-14 13:22:43 -08:00
Nadav Rotem	9551fc57b7	Fold ashr-exact into an icmp-ugt. This commit optimizes the code sequence: icmp-XXX (ashr-exact (X, C_1), C_2). Instcombine already implements this optimization for sgt, and this patch adds support for additional predicates. The transformation is legal for all predicates if the 'exact' flag is set, and for SGE, UGE, SLT, ULT when the exact flag is not present. This pattern is found in the std::vector bounds checks code of the at() method. Alive2 proof: https://alive2.llvm.org/ce/z/JT_WL8 Differential Revision: https://reviews.llvm.org/D117252	2022-01-14 12:58:44 -08:00
Jessica Paquette	acb8de565e	[JumpThreading] Change asserts for WantInteger into actual checks After `e734e8286b`, it is possible to end up in a situation where an `indirectbr` is fed by a cast, which is in turn fed by an operation which only produces integers. `indirectbr` expects a block address, however these operations can't produce that. There were several asserts in `computeValueKnownInPredecessorsImpl` which check that we're not looking for a block address if we're walking through something which can never produce one. Since it's now possible to hit these asserts, this changes them into actual checks which return false if `Preference` is not `WantInteger`. This adds a testcase which verifies that we don't crash anymore in these situations. Differential Revision: https://reviews.llvm.org/D99814	2022-01-14 11:15:14 -08:00
Florian Hahn	42b34facfd	Recommit "[LV] Inline CreateSplatIV call for scalar VFs." This reverts the revert commit `073c27b5e5`. A reduced test case has been added in `5e4966cbae` and the code has been updated to handle the case where getInductionOpcode returns BinaryOpsEnd. In this case, the original code was always using Instruction::Add. Do the same in the patch. Note this commit may slightly change the value naming, because it now also assigns the 'induction' name in the floating point case.	2022-01-14 19:03:49 +00:00
Sanjay Patel	02455bea6b	[InstCombine] remove unnecessary use check on X >>exact == 0 fold The transform replaces one icmp with another, so we should not care if the shift has another use.	2022-01-14 12:52:16 -05:00
Florian Hahn	1ef9bfa013	[InstSimplify] Pass pointer and indices separately to SimplifyGEPInst. This doesn't require callers to put the pointer operand and the indices in a container like a vector when calling the function. This is not really an issue with the existing callers. But when using it from IRBuilder the inputs are available as separate pointer value and indices ArrayRef. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D117038	2022-01-14 09:59:52 +00:00
Caroline Concatto	8e5a5b619d	[InstCombine] Fold for masked scatters to a uniform address When masked scatter intrinsic does a uniform store to a destination address from a source vector, and in this case, the mask is all one value. This patch replaces the masked scatter with an extracted element of the last lane of the source vector and stores it in the destination vector. This patch also folds when the value in the masked scatter is a splat. In this case, the mask cannot be all zero, and it folds to a scalar store of the value in the destination pointer. Differential Revision: https://reviews.llvm.org/D115724	2022-01-14 09:44:34 +00:00
Bryce Wilson	28b6e2cb3d	[Attributor] [NFC] Use canonical variable name Differential Revision: https://reviews.llvm.org/D117241	2022-01-13 23:06:00 -08:00
Vitaly Buka	71a4fde397	[NFC][msan] Init few vars later	2022-01-13 22:00:37 -08:00
Vitaly Buka	36138d8252	[NFC][msan] Declare some getShadow vars later	2022-01-13 21:36:37 -08:00
James Y Knight	073c27b5e5	Revert "[LV] Inline CreateSplatIV call for scalar VFs (NFC)." Causes a crash with the following (creduce'd) test-case: clang -O3 '--target=aarch64-grtev4-linux-gnu' -xc - -c -o /dev/null <<EOF int e; int f; int g() { int h; int j = 0; while (&f - j > 0) { int k; k = j; if (e == j && *e) k = 5; h = k; j++; } return h; } EOF This reverts commit `7ce48be0fd`.	2022-01-14 00:00:02 +00:00
Philip Reames	5d5d4d94f0	[Attributor] Generalize heap to stack to any allocator with relevant properties This completes removal of the isXLike queries, and depends on a whole series of earlier patches which have already landed. Differential Revision: https://reviews.llvm.org/D117242	2022-01-13 15:33:24 -08:00
Philip Reames	cf66f01ec1	[Attributor] Share code for abstract interpretation of allocation sizes with getObjectSize [NFC-ish] The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during its abstract interpretation, and allows us to have one copy of the code. Note this is not NFC for two reasons: * The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space. * I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that. Differential Revision: https://reviews.llvm.org/D117241	2022-01-13 15:33:24 -08:00
Arthur Eubanks	9a0fe1b0fc	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Relanding with fix for compile times D117236. Reviewed By: nikic, mtrofin Differential Revision: https://reviews.llvm.org/D115545	2022-01-13 14:48:38 -08:00
Arthur Eubanks	757e044dce	[Inliner] Don't removeDeadConstantUsers() when checking if a function is dead If a function has many uses, this can take a good chunk of compile times. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117236	2022-01-13 14:29:45 -08:00
Congzhe Cao	37e34b74e9	[LoopInterchange] Enable interchange with multiple outer loop indvars This patch enables loop interchange with multiple outer loop induction variables, and hence removes the limitation that only a single outer loop induction variable is supported. In fact, it turns out that the current pass already trivially supports multiple outer indvars, which is the result of a previous patch `https://reviews.llvm.org/D102743`. Therefore, this patch removed that limitation and provides test cases for multiple outer indvars. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D114916	2022-01-13 16:51:32 -05:00
Roman Lebedev	82c8aca934	[SimplifyCFG] Be more aggressive when sinking into block followed by unreachable I strongly believe we need some variant of this. The main problem is e.g. that the glibc's assert has 4 parameters, but the profitability check is only okay with one extra phi node, so D116692 doesn't even trigger on most of the expected cases. While that restriction probably makes sense in normal code, if we are about to run off of a cliff (into an `unreachable`), this successor block is unlikely so the cost to setup these PHI nodes should not be on the hotpath, and shouldn't matter performance-wise. Likewise, we don't sink if there are unconditional predecessors UNLESS we'd sink at least one non-speculatable instruction, which is a performance workaround, but if we are about to run into `unreachable`, it shouldn't matter. Note that we only allow the case where there are at most unconditional branches on the way to the unreachable block. Differential Revision: https://reviews.llvm.org/D117045	2022-01-13 23:30:31 +03:00
Florian Hahn	3f2fb767e3	[VPlan] Make IV operand explicit for VPWidenCanonicalIVRecipe (NFC). This makes the def-use relationship between VPCanonicalIVPHIRecipe and VPWidenCanonicalIVRecipe explicit. Needed for D117140.	2022-01-13 11:13:05 +00:00
Nikita Popov	1cbb456123	[GlobalOpt] Fix global to select transform under opaque pointers We need to check that the load/store type is also the same, as this is no longer implicitly checked through the pointer type.	2022-01-13 11:13:06 +01:00
Florian Hahn	7ce48be0fd	[LV] Inline CreateSplatIV call for scalar VFs (NFC). This is a NFC change split off from D116123, as suggested there. D116123 will remove the last user of CreateSplatIV.	2022-01-13 09:34:31 +00:00
James Y Knight	55fcbf0a84	Revert "[Inline] Attempt to delete any discardable if unused functions" Somehow this ends up causing an infinite loop in the inliner. This reverts commit `d5be48c66d`.	2022-01-13 03:06:47 +00:00
Philip Reames	9979299705	[Attributor] Simplify how we handle required alignment during heap-to-stack [NFC] The existing code duplicated the same concern in two places, and (weirdly) changed the inference of the allocation size based on whether we could meet the alignment requirement. Instead, just directly check the allocation requirement.	2022-01-12 17:34:17 -08:00
Philip Reames	d1f4c6a611	[Attributor] Generalize calloc handling in heap-to-stack for any init value [NFC] Rewrite the calloc specific handling in heap-to-stack to allow arbitrary init values. The basic problem being solved is that if an allocation is initialized to anything other than zero, this must be explicitly done for the formed alloca as well. This covers the calloc case today, but once a couple of earlier guards are removed in this code, downstream allocators with other init values could also be handled. Inspired by discussion on D116971	2022-01-12 16:58:39 -08:00
Philip Reames	8e76720cf2	[Attributor] Reuse object size evaluation code [NFC]	2022-01-12 16:58:39 -08:00
Philip Reames	db57065b36	[Attributor] Use getAllocAlignment where possible [NFC] Inspired by D116971.	2022-01-12 16:58:39 -08:00
Arthur Eubanks	fe827a93f6	[ModuleInliner] Properly delete dead functions Followup to D116964 where we only did this in the CGSCC inliner. Fixes leaks reported in D116964.	2022-01-12 09:57:43 -08:00
Arthur Eubanks	d5be48c66d	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Reviewed By: nikic, mtrofin Differential Revision: https://reviews.llvm.org/D115545	2022-01-12 08:36:04 -08:00
Florian Hahn	d4a8fc3a87	[VPlan] Introduce and use BranchOnCount VPInstruction. This patch adds a new BranchOnCount VPInstruction opcode with 2 operands. It first compares its 2 operands (increment of canonical induction and vector trip count), followed by a branch to either the exit block or back to the vector header. It must be the last recipe in the exit block of the topmost vector loop region. This extracts parts from D113224 and was discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116479	2022-01-12 13:42:13 +00:00
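The control flow that BranchOnCount encodes can be sketched as a small Python simulation. This is illustrative only: the function name, the `vf`/`uf` parameters, and the early return when the vector trip count is zero (standing in for the guard that skips the vector loop) are assumptions, not LLVM code.

```python
def run_vector_loop(trip_count, vf, uf=1):
    """Return (vector_iterations, scalar_remainder) for a hypothetical loop.

    Models a vector loop whose latch ends in a BranchOnCount-style
    terminator: compare the incremented canonical induction variable with
    the vector trip count, then either exit or branch back to the header.
    """
    step = vf * uf
    # Vector trip count: largest multiple of the step not exceeding trip_count.
    vector_trip_count = trip_count - trip_count % step
    if vector_trip_count == 0:
        # Assumption: a guard before the loop skips the vector loop entirely.
        return 0, trip_count
    iv = 0  # canonical induction variable
    iterations = 0
    while True:
        # The widened body would process elements iv .. iv + step - 1 here.
        iterations += 1
        iv_next = iv + step  # increment of the canonical induction
        if iv_next == vector_trip_count:  # BranchOnCount: compare, then branch
            break  # to the exit block
        iv = iv_next  # back to the vector header
    return iterations, trip_count - vector_trip_count
```

With a trip count of 10 and VF 4, the vector loop runs twice and 2 iterations are left for the scalar remainder.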
Rosie Sumpter	552eb372cb	[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter This is required to query the legality more precisely in the LoopVectorizer. This adds another TTI function named 'forceScalarizeMaskedGather/Scatter' function to work around the hack introduced for MVE, where isLegalMaskedGather/Scatter would return an answer by second-guessing where the function was called from, based on the Type passed in (vector vs scalar). The new interface makes this explicit. It is also used by X86 to check for vector widths where gather/scatters aren't profitable (or don't exist) for certain subtargets. Differential Revision: https://reviews.llvm.org/D115329	2022-01-12 13:34:12 +00:00
Florian Hahn	e3275cfa94	[BuildLibCalls] Add nounwind,willreturn to memset_pattern{4,8,16}. Similar to memset, memset_pattern{4,8,16} all will return and do not unwind. Use fallthrough to include all attributes also set for memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114904	2022-01-12 10:32:53 +00:00
Nikita Popov	5642ce5ac2	[GlobalOpt] Drop redundant setExternallyInitialized() call (NFC) This is part of copyAttributesFrom().	2022-01-12 09:42:58 +01:00
Nikita Popov	47a47733f0	[GlobalStatus] Remove unused HasNonInstructionUser member (NFC) This hasn't been used in a long time.	2022-01-12 09:40:54 +01:00
Nikita Popov	f3e87176e1	[GlobalOpt] Support "stored once" optimization for different types GlobalOpt can optimize a global with undef initializer and a single store to put the stored value into the initializer instead. Currently, this requires the type of the global and the store to match. This patch extends support to cases with different types (but same size), in which case we create a new global to replace the old one. Differential Revision: https://reviews.llvm.org/D117034	2022-01-12 09:39:31 +01:00
Chuanqi Xu	22225cc5e6	[Coroutines] Handle lifetime markers, bitcast and unused instructions for symmetric transfer This fixes bug49888. The root cause is that simplifyTerminatorLeadingToRet didn't handle lifetime markers well. Another issue, also noted in D116327, is that we deleted some inlined optimization passes in CoroSplit, so simplifyTerminatorLeadingToRet needs to remove dead instructions by hand. This patch fixes bug49888 by skipping lifetime markers and bitcast instructions and removing dead instructions by hand in simplifyTerminatorLeadingToRet. Reviewed By: junparser Differential Revision: https://reviews.llvm.org/D116330	2022-01-12 15:58:38 +08:00
Mircea Trofin	248d55af3e	[NFC][MLGO] Use LazyCallGraph::Node to track functions. This avoids the InlineAdvisor carrying the responsibility of deleting Function objects. We use LazyCallGraph::Node objects instead, which are stable in memory for the duration of the Module-wide performance of CGSCC passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which is the case here) Differential Revision: https://reviews.llvm.org/D116964	2022-01-11 19:23:47 -08:00
Chuanqi Xu	403772ff1c	[Coroutines] Enhance symmetric transfer for constant CmpInst This fixes bug52896. In short, some symmetric transfer optimization opportunities were lost because we deleted some inlined optimization passes in `822b92a`. This could cause a stack overflow in some situations, which the coroutine design is meant to avoid. This patch tries to fix this by transforming the constant CmpInst instruction, as the deleted passes used to do. Reviewed By: rjmccall, junparser Differential Revision: https://reviews.llvm.org/D116327	2022-01-12 10:14:37 +08:00
Kevin Athey	7ea175d1c6	Add 'eager-checks' as a module parameter to MSAN. This creates a way to configure MSAN for eager checks that will be leveraged by the introduction of a clang flag (-fsanitize-memory-param-retval). This is redundant with the existing flag: -mllvm -msan-eager-checks. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D116855	2022-01-11 14:30:49 -08:00
Philip Reames	6bf590d6e8	[InstCombine] Pull out a helper function to simplify upcoming patch [NFC]	2022-01-11 13:05:25 -08:00
Philip Reames	75de92d3e2	[DSE] Separate malloc+memset -> calloc transform from noop store detection [NFC] This transformation has nothing to do with whether the store is a noop. The memset becomes a noop, but only after we replace the malloc with a calloc.	2022-01-11 12:55:59 -08:00
Philip Reames	e2e7ecf25d	[DSE] Minor style improvements to calloc formation code [NFC]	2022-01-11 12:18:23 -08:00
Philip Reames	a1bf4ddac6	[DSE] Generalize store null to calloc allocated memory [NFC-ish] This change removes a direct check for calloc-like allocation functions, and instead handles the generic case where we're storing a constant to constant initialized memory. This is mostly to remove the call to isCallocLike, but if someone downstream happens to have an initialized alloc which initializes to e.g. -1, this will also kick in for them. (I don't know of such an example ftr.)	2022-01-11 12:02:51 -08:00
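The generalized check can be sketched with a toy byte-level model. Assumptions: the allocation's initial byte value is known (0 for calloc; the 0xFF allocator in the usage below is hypothetical, matching the commit's "-1" example), and a store is only treated as a no-op if nothing has written the stored range since allocation. The `Allocation` class is illustrative and is not DSE's data structure; real DSE has to prove the absence of intervening writes with its own memory analysis.

```python
class Allocation:
    """Toy allocation whose initial byte value may be known (hypothetical)."""

    def __init__(self, size, init_byte=None):
        self.size = size
        self.init_byte = init_byte      # None: contents unknown (e.g. malloc)
        self.written = [False] * size   # bytes overwritten since allocation

    def store(self, offset, value):
        """Record a store of the bytes in `value` at `offset`."""
        for i in range(offset, offset + len(value)):
            self.written[i] = True

    def store_is_noop(self, offset, value):
        """True if storing `value` at `offset` cannot change memory contents."""
        if self.init_byte is None:
            return False
        end = offset + len(value)
        if offset < 0 or end > self.size:
            return False
        if any(self.written[offset:end]):
            return False  # an earlier store may have changed these bytes
        return all(byte == self.init_byte for byte in value)
```

Storing zeros into fresh calloc memory is redundant, and so is storing 0xFF bytes into memory an allocator initialized to 0xFF; neither holds once another store has touched the range.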
Philip Reames	3712372fa5	[DSE] Style improvements after `3cef3cf` - remove redundant dyn_casts [NFC] I'd been working on exactly the same patch when Nikita landed his, so this patch is basically the style diff between the two. :)	2022-01-11 08:39:18 -08:00
Nikita Popov	94d6263391	[GlobalStatus] Look through non-constexpr casts analyzeGlobal() looks through non-constexpr cast instructions when looking for users. However, this particular place only strips the casts again if they are constexprs. We should be looking through all casts here.	2022-01-11 16:02:35 +01:00
Nikita Popov	3cef3cf02f	[DSE] Check for noalias calls rather than alloc functions For these "visible on unwind/ret" checks we only care about the fact that no other code has access to the pointer (unless it escapes). A noalias call is sufficient for this, it does not have to be a known allocation function. This is basically the same change as D116728, but for DSE rather than LICM.	2022-01-11 12:22:16 +01:00
Florian Hahn	2d67a86b7c	[SCEVExpander] Use IntToPtr for temporary instruction. Use PtrToInt instead of Add when creating temporary instructions. The add might get folded away with more sophisticated folding.	2022-01-11 09:40:21 +00:00
Philip Reames	abc787fbf3	Delete a stale comment	2022-01-10 18:18:34 -08:00
Philip Reames	5265ac72c6	[MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC] Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.	2022-01-10 15:43:39 -08:00
Craig Topper	38b30eb2b2	[LowerMatrixIntrinsics] Call getRegisterClassForType before getNumberOfRegisters. getNumberOfRegisters takes a ClassID as its argument. It shouldn't be passed a bool. Assuming the bool meant vector or not, we should call getRegisterClassForType first. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D116903	2022-01-10 15:32:13 -08:00
Roman Lebedev	82fb4f4b22	[SCEV] Sequential/in-order `UMin` expression As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692, SCEV is forbidden from reasoning about 'backedge taken count' if the branch condition is a poison-safe logical operation, which is conservatively correct, but is severely limiting. Instead, we should have a way to express those poison blocking properties in SCEV expressions. The proposed semantics is: ``` Sequential/in-order min/max SCEV expressions are non-commutative variants of commutative min/max SCEV expressions. If none of their operands are poison, then they are functionally equivalent, otherwise, if the operand that represents the saturation point* of given expression, comes before the first poison operand, then the whole expression is not poison, but is said saturation point. ``` * saturation point - the maximal/minimal possible integer value for the given type The lowering is straight-forward: ``` compare each operand to the saturation point, perform sequential in-order logical-or (poison-safe!) ordered reduction over those checks, and if reduction returned true then return saturation point else return the naive min/max reduction over the operands ``` https://alive2.llvm.org/ce/z/Q7jxvH (2 ops) https://alive2.llvm.org/ce/z/QCRrhk (3 ops) Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97 That allows us to handle the patterns in question. Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D116766	2022-01-10 20:51:26 +03:00
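The semantics and lowering quoted above can be cross-checked with a small Python model. Assumptions for this sketch: poison is modelled as `None`, values are unsigned integers, and the saturation point of umin is 0. The poison-safe logical-or is modelled as `select(acc, true, cmp)`, which is poison only when the running result `acc` is itself poison. Function names are hypothetical.

```python
POISON = None  # stand-in for an LLVM poison value
SAT = 0        # saturation point of unsigned min


def naive_umin(ops):
    """Ordinary (commutative) umin: poison if any operand is poison."""
    if any(op is POISON for op in ops):
        return POISON
    return min(ops)


def umin_seq(ops):
    """Reference semantics of sequential/in-order umin."""
    for op in ops:
        if op is POISON:
            return POISON  # poison reached before the saturation point
        if op == SAT:
            return SAT     # saturation point reached first: later poison ignored
    return min(ops)


def lower_umin_seq(ops):
    """Lowering: in-order poison-safe OR of (op == SAT), then select."""
    acc = False
    for op in ops[:-1]:  # the last operand does not need a check
        cmp = POISON if op is POISON else (op == SAT)
        if acc is POISON or acc:
            continue  # select(acc, true, cmp) keeps poison/true unchanged
        acc = cmp
    if acc is POISON:
        return POISON
    return SAT if acc else naive_umin(ops)
```

Exhaustively comparing `lower_umin_seq` against `umin_seq` over a small domain such as `{poison, 0, 1, 2}` also shows the non-commutativity: `[0, poison]` yields 0 while `[poison, 0]` yields poison.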
Bryce Wilson	fb936595fa	[MemoryBuiltins] Add field for alignment argument [NFC] There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically. This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for the purpose of having the large change be NFC. The latter will be reviewed separately. Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)	2022-01-10 09:15:20 -08:00
Philip Reames	f4c54683d6	[instcombine] Infer alignment for aligned_alloc with potentially zero size This change removes a previous restriction where we had to prove the allocation performed by aligned_alloc was non-zero in size before using the align parameter to annotate the result. I believe this was conservatism around the C11 specification of this routine which allowed UB when size was not a multiple of alignment, but if so, it was a partial one at best. (ex: align 32, size 16 was equally UB, but not restricted) The spec has since been clarified to require nullptr return, not UB. A nullptr - the documented return for this function on failure for all cases after UB mentioned above was removed - is trivially aligned for any power of two. This isn't totally new behavior even for this transform; we'd previously annotate potentially failing allocs (e.g. huge sizes) meaning we were putting align on potentially null pointers anyways. This change simply does the same for all failure modes.	2022-01-10 08:48:49 -08:00
Johannes Doerfert	7b39dccbe4	[Attributor][FIX] Ensure "IsExact" is false for non-exact accesses If we look at potentially interfering accesses we need to ensure the "IsExact" flag is set appropriately. Accesses that have an "unknown" size or offset cannot be exact matches and we failed to flag that. Error and test reported by Serguei N. Dmitriev.	2022-01-10 10:09:36 -06:00
Simon Pilgrim	c1f1359882	[PGOInstrumentation] populateEHOperandBundle - earlyout if !isa<CallBase> All paths (that actually do anything) require a successful dyn_cast<CallBase> - so just earlyout if the cast fails. Fixes static analyzer nullptr dereference warning	2022-01-10 15:34:37 +00:00
Simon Pilgrim	353484d191	[LowerExpectIntrinsic] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr. NFC	2022-01-10 15:34:37 +00:00
David Sherwood	b0922a9dcd	[LoopVectorize] Make VPWidenCanonicalIVRecipe::execute work for scalable vectors The code in VPWidenCanonicalIVRecipe::execute only worked for fixed-width vectors due to the way we generate the values per lane. This patch changes the code to use a combination of vector splats and step vectors to get the same result. This then works for both fixed-width and scalable vectors. Tests that exercise this code path for scalable vectors have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Differential Revision: https://reviews.llvm.org/D113180	2022-01-10 14:12:32 +00:00
Nuno Lopes	7b1cb72ad9	[SROA] Switch replacement of dead/UB/unreachable ops from undef to poison SROA has 3 data-structures where it stores sets of instructions that should be deleted: - DeadUsers -> instructions that are UB or have no users - DeadOperands -> instructions that are UB or operands of useless phis - DeadInsts -> "dead" instructions, including loads of uninitialized memory with users The first 2 sets can be RAUW with poison instead of undef. No brainer as UB can be replaced with poison, and for instructions with no users RAUW is a NOP. The 3rd case cannot be currently replaced with poison because the set mixes the loads of uninit memory. I leave that alone for now. Another case where we can use poison is in the construction of vectors from multiple loads. The base vector for the first insertelement is now poison as it doesn't matter as it is fully overwritten by inserts. Differential Revision: https://reviews.llvm.org/D116887	2022-01-10 14:04:26 +00:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using an std::map<SmallString, SmallString> for target-dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies an extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
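A minimal sketch of the data-structure change: a list of `(key, value)` pairs kept sorted by key and searched with binary search, instead of a map of separately allocated strings. The class and method names are hypothetical, not AttrBuilder's API, and the attribute strings in the usage are only illustrative.

```python
import bisect


class SortedStringAttrs:
    """Hypothetical string-attribute store backed by a sorted list."""

    def __init__(self):
        self._items = []  # (key, value) pairs, kept sorted by key

    def _find(self, key):
        # (key,) sorts before every (key, value) pair, so bisect_left lands
        # on the existing entry for `key` if there is one.
        return bisect.bisect_left(self._items, (key,))

    def add(self, key, value=""):
        i = self._find(key)
        if i < len(self._items) and self._items[i][0] == key:
            self._items[i] = (key, value)  # overwrite in place
        else:
            self._items.insert(i, (key, value))

    def get(self, key):
        i = self._find(key)
        if i < len(self._items) and self._items[i][0] == key:
            return self._items[i][1]
        return None

    def keys(self):
        return [key for key, _ in self._items]
```

Insertion into a contiguous array is linear, but for the handful of string attributes a typical function carries that cost stays small, which is the trade-off the commit relies on.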
Florian Hahn	003ac239d8	[SROA] Reduce the number of times a IRBuilder is constructed (NFC). This patch reduces the number of times IRBuilders need to be constructed in SROA.cpp by passing existing ones by reference to the appropriate places.	2022-01-10 12:09:13 +00:00
Florian Hahn	aecad5828e	[SCEVExpander] Only create trunc when needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to create TruncTripCount if it is actually used. Sink the TruncTripCount creation into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 11:31:27 +00:00
David Sherwood	e3c84fb948	[LoopVectorize] Add support for tail folding using scalable vectors This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount whereby we weren't correctly generating the runtime trip count for scalable vectors when tail-folding. It also removes some asserts in the tail-folding path for cases when the VF is not scalable. In this patch I have only permitted tail-folding to be enabled explicitly for scalable vectors when the user has specified one of the following flags: -prefer-predicate-over-epilogue=predicate-dont-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue For now it's best not to enable tail-folding with scalable vectors for low trip counts or when optimising for code size, since there has been no analysis on whether this is worth it. Various tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll The tests cannot be target independent because they require masked load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore need to return true. Differential Revision: https://reviews.llvm.org/D113003	2022-01-10 10:55:40 +00:00
Florian Hahn	ad1b8772cf	[SCEVExpander] Only create multiplication if needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to compute |Step| * Trip count if the result of the multiplication is actually used. Sink the multiplication into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 08:49:25 +00:00
Nikita Popov	92d55e7336	[MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall() We currently have two similar implementations of this concept: isNoAliasCall() only checks for the noalias return attribute. isNoAliasFn() also checks for allocation functions. We should switch to only checking the attribute. SLC is responsible for inferring the noalias return attribute for non-new allocation functions (with a missing case fixed in `348bc76e35`). For new, clang is responsible for setting the attribute, if -fno-assume-sane-operator-new is not passed. Differential Revision: https://reviews.llvm.org/D116800	2022-01-10 09:18:15 +01:00
Johannes Doerfert	4e8a02e7f4	[Attributor][FIX] Remove assumption that doesn't have to hold There is no guarantee we strip all GEPOperators and the conservative handling doesn't even require us to.	2022-01-09 13:15:53 -06:00
Florian Hahn	1ce01b7dfe	[SCEVExpander] Simplify cleanup, skip sorting by dominance. There is no need to sort inserted instructions by dominance, as the deletion loop still requires RAUW with undef before deleting. Removing instructions in reverse insertion order should still ensure that the number of uselist updates is kept to a minimum.	2022-01-09 18:38:41 +00:00
Florian Hahn	7f1bf68d7d	[SCEVExpander] Only check overflow if it is needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to check for overflows if the result of the multiplication is actually used. Sink the Or for the overflow check into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-09 12:55:41 +00:00
Sanjay Patel	1d21667ce2	[InstCombine] (~A | B) & (A ^ B) -> ~A & B This is part of a set of 2-variable logic optimizations suggested here: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154470.html The 'not' op must not propagate undef elements of a vector, so this patch creates a new 'full' not, but I am not counting that as an extra-use restriction because it should get folded with the existing value by CSE. https://alive2.llvm.org/ce/z/7v65im	2022-01-09 06:23:51 -05:00
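Because the fold is purely bitwise, it can be checked exhaustively at a small bit width, as a lightweight complement to the Alive2 proof linked above. Python's `~` yields negative integers, so both sides are masked to the chosen width; the function name is hypothetical.

```python
def fold_holds(width=8):
    """Check (~A | B) & (A ^ B) == ~A & B for every pair of width-bit values."""
    mask = (1 << width) - 1
    for a in range(1 << width):
        for b in range(1 << width):
            before = ((~a | b) & (a ^ b)) & mask
            after = (~a & b) & mask
            if before != after:
                return False
    return True
```

Per bit, `A ^ B` is set only when exactly one of A and B is set; `~A | B` keeps the case A = 0, B = 1 and discards A = 1, B = 0, which leaves exactly `~A & B`.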
Johannes Doerfert	6c745e04fa	[Attributor][FIX] Ensure order for multiple references into map If we have multiple references into a map we need to ensure the ones created late do not invalidate the ones created early. To do that we need to make sure all but the first are not modifying the map, hence for them the keys have to be present already. Fixes #52875.	2022-01-08 16:59:21 -06:00
Kazu Hirata	435a5a3652	[llvm] Fix bugprone argument comments (NFC) Identified with bugprone-argument-comment.	2022-01-08 11:56:38 -08:00
Philip Reames	2cafbcb560	[instcombine] Key deref vs deref_or_null annotation of allocation sites off nonnull attribute Goal is to remove use of isOpNewLike. I looked at a couple of approaches to this, and this turned out to be the cheapest one. Just letting deref_or_null be generated causes a bunch of test diffs, and I couldn't convince myself there wasn't a real regression somewhere. A generic instcombine to convert deref_or_null + nonnull to deref is annoyingly complicated since you have to mix facts from callsite and declaration while manipulating only existing call site attributes. It just wasn't worth the code complexity. Note that the change in new-delete-itanium.ll is a real regression. If you have a callsite which overrides the builtin status of a nobuiltin declaration, and you don't put the appropriate attributes on that callsite, you may lose the deref fact. I decided this didn't matter; if anyone disagrees, you can add this case to the generic non-null inference.	2022-01-08 10:33:54 -08:08:00
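The rule in this commit's title reduces to a small decision, sketched here with a hypothetical helper that returns the attribute text for an allocation call site. It assumes the allocation size is already known as a constant; when it is not, no dereferenceability fact is added.

```python
def deref_attribute(alloc_size, call_is_nonnull):
    """Choose the dereferenceability attribute for an allocation call site."""
    if alloc_size is None:
        return None  # size unknown: nothing to annotate
    if call_is_nonnull:
        # The call cannot return null, so the full size is dereferenceable.
        return f"dereferenceable({alloc_size})"
    # The call may fail and return null, so only the weaker fact holds.
    return f"dereferenceable_or_null({alloc_size})"
```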
Simon Pilgrim	274359cf09	[OpenMPOpt] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr. NFC	2022-01-08 13:47:35 +00:00
Florian Hahn	9345ab3a45	[SCEVExpander] Skip creating <u 0 check, which is always false. Unsigned compares of the form <u 0 are always false. Do not create such a redundant check in generateOverflowCheck. The patch introduces a new lambda to create the check, so we can exit early conveniently and skip creating some instructions feeding the check. I am planning to sink a few additional instructions as follow-ups, but I would prefer to do this separately, to keep the changes and diff smaller. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116811	2022-01-08 10:31:04 +00:00
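The early-exit shape can be sketched with a toy instruction recorder rather than IRBuilder. The operand feeding the compare is passed as a callback, so nothing is emitted when the bound is 0 and `x <u 0` folds to false. `Recorder` and `make_ult_check` are hypothetical names, and the instruction strings are illustrative.

```python
class Recorder:
    """Toy stand-in for an IR builder: records emitted instruction text."""

    def __init__(self):
        self.insts = []

    def emit(self, text):
        self.insts.append(text)
        return "%" + str(len(self.insts) - 1)  # name of the new value


def make_ult_check(builder, build_lhs, rhs):
    """Emit `icmp ult lhs, rhs`, or return "false" when rhs is the constant 0.

    build_lhs is called lazily, so the instructions feeding the compare are
    only emitted if the compare itself is needed.
    """
    if rhs == 0:
        return "false"  # no unsigned value is less than 0
    lhs = build_lhs()
    return builder.emit(f"icmp ult {lhs}, {rhs}")
```

With a zero bound, neither the compare nor the multiplication feeding it is created; this mirrors the follow-up commits that sink the trunc, multiplication, and overflow `or` into ComputeEndCheck.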
Kazu Hirata	4e2ec7e38d	[llvm] Remove unused forward declarations (NFC)	2022-01-07 20:00:34 -08:00
Kazu Hirata	b932bdf59f	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-07 17:45:09 -08:00
Arthur Eubanks	f96ab6cc1b	Revert "[Inline] Attempt to delete any discardable if unused functions" This reverts commit `335a3163aa`. Causes crashes when building llvm-test-suite's kc under ReleaseLTO-g.	2022-01-07 13:12:40 -08:00
Arthur Eubanks	335a3163aa	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D115545	2022-01-07 11:05:26 -08:00
Philip Reames	dcbc91f40c	[instcombine] Delete duplicate object size logic InstCombine appears to duplicate the allocation size logic used inside getObjectSize when figuring out which attributes are safe to place on the callsite. We can use the existing utility function instead. The test change is correct. With aligned_alloc, a zero alignment is required to return nullptr. As such, deref_or_null is a correct attribute to use. Differential Revision: https://reviews.llvm.org/D116816	2022-01-07 10:32:26 -08:00
Philip Reames	6b0ff0969d	Extract utility function for checking initial value of allocation [NFC, try 2] This is a recurring pattern; we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike. The original commit (which was quickly reverted) didn't account for the fact that the allocation function could be an invoke; test coverage for that case is added in this commit.	2022-01-07 08:44:08 -08:00
Philip Reames	a3573f203e	Fix a bug in `67a3331e` (cast instead of dyn_cast) The original commit was expected to be NFC, but I didn't account for the fact that invokes could be considered allocation functions. Interestingly, only one builder caught the problem.	2022-01-07 08:25:02 -08:00
Florian Hahn	f395a4f8d5	[SCEVExpand] Only create required predicate checks. Currently generateOverflowCheck always creates code for Step being negative and positive, followed by a select at the end depending on Step's sign. This patch updates the code to only create either the checks for step being positive or negative, if the sign is known. Follow-up to D116696. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116747	2022-01-07 14:49:02 +00:00
Nikita Popov	348bc76e35	[LibCalls] Infer same attrs for reallocf() as realloc() reallocf() is the same as realloc() but frees the input pointer on failure as well. We can infer the same attributes. Also combine some cases that infer the same attributes and are logically related.	2022-01-07 09:51:15 +01:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Nikita Popov	c8189da201	[ModuleUtils] Remove dead arg from filterDeadComdatFunctions() (NFC) The module argument is no longer used.	2022-01-07 09:12:16 +01:00
Philip Reames	c6a0c1585a	Revert "Extract utility function for checking initial value of allocation [NFC]" This reverts commit `9ce30fe86f`. Appears to be causing a problem on a buildbot, revert while investigating. https://green.lab.llvm.org/green//job/clang-stage1-RA/26818/consoleFull#-1502953973d489585b-5106-414a-ac11-3ff90657619c	2022-01-06 19:05:51 -08:00
Philip Reames	9ce30fe86f	Extract utility function for checking initial value of allocation [NFC] This is a recurring pattern; we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike.	2022-01-06 18:02:14 -08:00
Philip Reames	5d1cfd4348	Remove unused LookThroughBitCast param in isXAllocLike functions [NFC] This parameter took the non-default value exactly twice, and neither had semantic effect.	2022-01-06 18:02:13 -08:00
Philip Reames	7052670e96	Move getMallocAllocatedType and getMallocArraySize to GlobalOpt [NFC] These are implementation details of the global-opt transform and not easily reusable, so remove them from the analysis header.	2022-01-06 18:02:13 -08:00
Philip Reames	67a3331e4f	Inline extractMallocCall to sole use and delete [NFC]	2022-01-06 18:02:13 -08:00
Congzhe Cao	c251bfc3b9	[LoopInterchange] Remove a limitation in LoopInterchange legality There was a limitation in legality that in the original inner loop latch, no instruction was allowed between the induction variable increment and the branch instruction. This is because we used to split the inner latch at the induction variable increment instruction. Since now we have split at the inner latch branch instruction and have properly duplicated instructions over to the split block, we remove this limitation. Please refer to the test case updates to see how we now interchange loops where instructions exist between the induction variable increment and the branch instruction. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D115238	2022-01-06 15:56:32 -05:00
Alexey Bataev	d130df544d	[SLP]Improve reordering for the nodes being used in alternate vectorization. There is no need to take the order of the scalars being used as part of the alternate vectorization into account when trying to reorder the whole graph. Such elements are better reordered in the following phase, because the subtree still ends up in a shuffle. Part of D116688, fixes the regression in D116690. Differential Revision: https://reviews.llvm.org/D116740	2022-01-06 11:18:57 -08:00
Alexey Bataev	7cb19fe493	[SLP]Initialize the lane with the given value instead of default 0. There is a bug in the reordering analysis stage. If the element with the given hash is not added to the map but has the same number of APOs and instructions with the same parent, but a different instruction opcode, it will be initialized with default values and then the counter is increased by 1. But the lane is not updated and defaults to 0 instead of the actual `Lane` value. As a result, the analysis is useless in many cases and defaults to lane 0 instead of the actual lane with the minimum number of APO operands. Differential Revision: https://reviews.llvm.org/D116690	2022-01-06 10:57:11 -08:00
Stanislav Mekhanoshin	0b5340acb7	[InstCombine] Factor out a common pattern match used 3 times. NFC. This is needed for the next patch which will add more patterns to the same match. Differential Revision: https://reviews.llvm.org/D116194	2022-01-06 10:23:50 -08:00
Simon Pilgrim	5e7912d80f	[LowerMatrixIntrinsics] writeFnName - don't dereference a dyn_cast<>. NFC. dyn_cast<> can return null - use cast<> instead to assert the cast is valid before dereferencing the casted pointer. Fixes static-analyzer null dereference warning.	2022-01-06 17:09:32 +00:00
Philip Reames	916b35e783	[unroll] Strengthen verification of analysis updates under expensive asserts I am suspecting a bug around updates of loop info for unreachable exits, but don't have a test case. Running this locally on make check didn't reveal anything, we'll see if the expensive checks bots find it.	2022-01-06 08:51:50 -08:00
Nikita Popov	918015c9ba	[EarlyCSE] Support opaque pointers Explicitly check the load/store value type, because this is no longer implicitly checked through the pointer type.	2022-01-06 17:08:50 +01:00
Simon Pilgrim	5bbcff6181	[MemCpyOptimizer] hasUndefContents - only look for underlying object if we've found an alloca Provides an early-out if we fail to find an AllocaInst, and avoids a static analyzer warning about null dereferencing.	2022-01-06 15:15:03 +00:00
Simon Pilgrim	8399fa673b	[MemCpyOptimizer] Use auto* for cast<> results (style). NFC.	2022-01-06 15:15:03 +00:00
Alexey Bataev	700997aef8	[SLP][NFC]Fix comment, NFC.	2022-01-06 06:38:29 -08:00
Simon Pilgrim	6638303869	[LoopFlatten] checkOverflow - use cast<> instead of dyn_cast<> to avoid dereference of nullptr. Fix static analysis warning by using cast<> instead of dyn_cast<> as both isa<> and isGuaranteedToExecuteForEveryIteration expect a non-null Instruction pointer.	2022-01-06 14:13:50 +00:00
Nikita Popov	ddd9ec667a	[LICM] Update comments related to escape check (NFC) The comments here were outdated and a bit confusing without the knowledge that we're only guarding against reads on unwind.	2022-01-06 14:45:48 +01:00
Nikita Popov	41a522779d	[LICM] Check for noalias call instead of alloc like fn When determining whether the memory is local to the function (and we can thus introduce spurious writes without thread-safety issues), check for a noalias call rather than the hardcoded list of memory allocation functions. Noalias calls are the more general way to determine allocation functions, as long as we're only interested in the property that the returned value is distinct from any other accessible memory. Differential Revision: https://reviews.llvm.org/D116728	2022-01-06 14:38:19 +01:00
Sander de Smalen	9cbe000df2	[LV] Load/store/reduction type must be sized, assert it. This addresses a suggestion by @nikic on D115356.	2022-01-06 12:35:27 +00:00
Florian Hahn	86d113a8b8	[SCEVExpand] Do not create redundant 'or false' for pred expansion. This patch updates SCEVExpander::expandUnionPredicate to not create redundant 'or false, x' instructions. While those are trivially foldable, they can be easily avoided and hinder code that checks the size/cost of the generated checks before further folds. I am planning to look into a few other similar improvements to code generated by SCEVExpander. I remember a while ago @lebedev.ri working on doing some trivial folds like that in IRBuilder itself, but there were concerns that such changes may subtly break existing code. Reviewed By: reames, lebedev.ri Differential Revision: https://reviews.llvm.org/D116696	2022-01-06 11:52:19 +00:00
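The same idea applied to the union of predicate checks can be sketched as follows: start from no result instead of seeding the OR with the constant `false`, skip operands already known to be false, and emit an `or` only when two real checks have to be combined. The toy `Recorder` and `expand_union` are hypothetical names, not SCEVExpander's API.

```python
class Recorder:
    """Toy stand-in for an IR builder: records emitted instruction text."""

    def __init__(self):
        self.insts = []

    def emit(self, text):
        self.insts.append(text)
        return "%" + str(len(self.insts) - 1)  # name of the new value


def expand_union(builder, checks):
    """OR together predicate checks without emitting `or false, x`."""
    result = None  # no check combined yet (instead of seeding with "false")
    for check in checks:
        if check == "false":
            continue  # a constant-false check cannot change the OR
        result = check if result is None else builder.emit(f"or {result}, {check}")
    return "false" if result is None else result
```

A single real check is returned as-is, so the cost model sees only the instructions that actually matter.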
Nikita Popov	32808cfb24	[IR] Track users of comdats Track all GlobalObjects that reference a given comdat, which allows determining whether a function in a comdat is dead without scanning the whole module. In particular, this makes filterDeadComdatFunctions() have complexity O(#DeadFunctions) rather than O(#SymbolsInModule), which addresses half of the compile-time issue exposed by D115545. Differential Revision: https://reviews.llvm.org/D115864	2022-01-06 09:13:58 +01:00
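A sketch of the idea with hypothetical Python types: each comdat keeps the set of global objects that reference it, so deciding whether a dead function can be dropped only inspects that function's own comdat group instead of scanning every symbol in the module. A function in a comdat is kept if any other member of the group is still live.

```python
from collections import defaultdict


class ToyModule:
    """Hypothetical module model that tracks the users of each comdat."""

    def __init__(self):
        self.comdat_of = {}                   # global name -> comdat name
        self.comdat_users = defaultdict(set)  # comdat name -> referencing globals

    def add_global(self, name, comdat=None):
        if comdat is not None:
            self.comdat_of[name] = comdat
            self.comdat_users[comdat].add(name)


def filter_dead_comdat_functions(module, dead_functions):
    """Keep only dead functions whose whole comdat group (if any) is dead."""
    dead = set(dead_functions)
    result = []
    for fn in dead_functions:
        comdat = module.comdat_of.get(fn)
        # Work is bounded by fn's own comdat group, not by the module size.
        if comdat is None or module.comdat_users[comdat] <= dead:
            result.append(fn)
    return result
```

In this sketch the cost grows with the dead functions and the sizes of their comdat groups; the commit states the real implementation's complexity as O(#DeadFunctions).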
David Blaikie	31b79b86ee	Revert "Remove unused variable (-Wunused)" Patch that removed the use of this variable was reverted in `8ade3d43a3` This reverts commit `3988a06d86`.	2022-01-05 20:43:30 -08:00
Congzhe Cao	8ade3d43a3	Revert "[LoopInterchange] Remove a limitation in LoopInterchange legality" This reverts commit `15702ff9ce` while I investigate a ppc build bot failure at https://lab.llvm.org/buildbot#builders/36/builds/16051.	2022-01-05 23:34:36 -05:00
David Blaikie	3988a06d86	Remove unused variable (-Wunused)	2022-01-05 20:29:35 -08:00
Congzhe Cao	15702ff9ce	[LoopInterchange] Remove a limitation in LoopInterchange legality There was a limitation in legality that in the original inner loop latch, no instruction was allowed between the induction variable increment and the branch instruction. This is because we used to split the inner latch at the induction variable increment instruction. Since now we have split at the inner latch branch instruction and have properly duplicated instructions over to the split block, we remove this limitation. Please refer to the test case updates to see how we now interchange loops where instructions exist between the induction variable increment and the branch instruction. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D115238	2022-01-05 22:37:54 -05:00
Andrew Browne	4e173585f6	[DFSan] Add option for conditional callbacks. This allows DFSan to find tainted values used to control program behavior. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D116207	2022-01-05 15:07:09 -08:00
Quentin Colombet	cdbad62c52	[ADCE][NFC] Batch DT updates together This patch delays the updates of the dominator tree to the very end of the pass instead of applying them in small increments after each basic block. This improves the runtime of the pass, particularly in pathological cases, because the updater now sees the full extent of the updates and can decide whether it is faster to apply the changes incrementally or to recompute the full tree from scratch. Put differently, thanks to this patch, we can take advantage of the improvements that Chijun Sima <simachijun@gmail.com> made in the dominator tree updater a while ago with commit 32fd196cbf4d: "Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929)". This change is NFC but can improve the runtime of the compiler dramatically in some pathological cases, where the pass was pushing a large number (several thousand) of small updates (less than 6). For instance, on the motivating example we went from 300+ sec to less than a second. Differential Revision: https://reviews.llvm.org/D116610	2022-01-05 14:05:20 -08:00
Alexey Bataev	dd83befe33	[SLP][NFC]Improved isAltShuffle by comparing instructions instead of opcodes, NFC. NFC part of D115955.	2022-01-05 12:30:13 -08:00
Roman Lebedev	2353e1c87b	[NFC][SimplifyCFG] Extract `performBlockTailMerging()` out of `tailMergeBlocksWithSimilarFunctionTerminators()`	2022-01-05 22:59:39 +03:00
Philip Reames	356ada9df4	Fix accidental usage of cast<> instead of dyn_cast<> in `58a0e44`	2022-01-05 11:00:10 -08:00
Philip Reames	58a0e449e1	[instcombine] Allow sinking of calls with known writes to uses If we have a call whose only side effect is a write to a location which is known to be dead, we can sink said call to the users of the call's result value. This is analogous to the recent changes to delete said calls if unused, but framed as a sinking transform instead. Differential Revision: https://reviews.llvm.org/D116200	2022-01-05 10:37:22 -08:00
Sanjay Patel	e2165e0968	[InstCombine] remove trunc user restriction for match of bswap This does not appear to cause any problems, and it fixes #50910. Extra tests with a trunc user were added with `3a239379`, but they don't match either way, so there's an opportunity to improve the matching further.	2022-01-05 13:04:11 -05:00
Philip Reames	c16fd6a376	Rename doesNotReadMemory to onlyWritesMemory globally [NFC] The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consistent with onlyReadsMemory, which we use for the corresponding readonly case as well.	2022-01-05 08:52:55 -08:00
Florian Hahn	2ee8154816	[LV] Don't use getVPSingleValue for VPWidenMemoryInstRecipe (NFC). VPWidenMemoryInstructionRecipe is a VPValue, so this can be passed directly, instead of relying on getVPSingleValue.	2022-01-05 13:51:50 +00:00
Nikita Popov	6e474d3308	[GlobalOpt][Evaluator] Fix off by one error in bounds check (PR53002) We should bail out if the index is >= the size, not > the size. Fixes https://github.com/llvm/llvm-project/issues/53002.	2022-01-05 14:06:02 +01:00
Sander de Smalen	95a93722db	[LV] Remove what seems like stale code in collectElementTypesForWidening. This was originally added in rG22174f5d5af1eb15b376c6d49e7925cbb7cca6be although that patch doesn't really mention any reasons for ignoring the pointer type in this calculation if the memory access isn't consecutive. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D115356	2022-01-05 12:20:59 +00:00
Nikita Popov	99c6b12b92	[ConstantFolding] Unify handling of load from uniform value There are a number of places that specially handle loads from a uniform value where all the bits are the same (zero, one, undef, poison), because a) we don't care about the load offset in that case, and b) it bypasses casts that might not be legal generally but do work with uniform values. We had multiple implementations of this, each with a different set of supported values. This replaces two usages with a more complete helper. Other usages will be replaced separately, because they have a larger impact. This is part of D115924.	2022-01-05 12:30:46 +01:00
Benjamin Kramer	5f0a349738	Revert "Revert "[InferAttrs] Add writeonly to all the math functions"" This reverts commit `29b6e967f3`. The bug it found in PartiallyInlineLibCalls was fixed in `c8ffc73350`.	2022-01-05 12:16:35 +01:00
Benjamin Kramer	c8ffc73350	[PartiallyInlineLibCalls] Don't crash when there's a writeonly attribute on the call. readnone subsumes writeonly, so just swap out the attributes. The verifier doesn't allow us to have both on a call.	2022-01-05 12:16:26 +01:00
Florian Hahn	65c4d6191f	[VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable. At the moment, the primary induction variable for the vector loop is created as part of the skeleton creation. This is tied to creating the vector loop latch outside of VPlan. This prevents modeling the whole vector loop in VPlan, which in turn is required to model preheader and exit blocks in VPlan as well. This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for VPInstruction to model the increment. This allows us to partly retire createInductionVariable. At the moment, a bit of patching up is done after executing all blocks in the plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113223	2022-01-05 10:46:06 +00:00
Sjoerd Meijer	e550dfa4a6	Silence a few unused variable warnings. NFC.	2022-01-05 09:15:07 +00:00
Martin Storsjö	29b6e967f3	Revert "[InferAttrs] Add writeonly to all the math functions" This reverts commit `ea75be3d9d` and `1eb5b6e850`. That commit caused crashes with compilation e.g. like this (not fixed by the follow-up commit): $ cat sqrt.c float a; b() { sqrt(a); } $ clang -target x86_64-linux-gnu -c -O2 sqrt.c Attributes 'readnone and writeonly' are incompatible! %sqrtf = tail call float @sqrtf(float %0) #1 in function b fatal error: error in backend: Broken function found, compilation aborted!	2022-01-05 11:12:19 +02:00
Nikita Popov	00e6869463	[MemCpyOpt] Look through pointer casts when checking capture The user scanning loop above looks through pointer casts, so we also need to strip pointer casts in the capture check. Previously the source was incorrectly considered not captured if a bitcast was passed to the call.	2022-01-05 09:50:33 +01:00
Nikita Popov	487a34ed9d	[MemCpyOpt] Make capture check during call slot optimization more precise Call slot optimization is currently supposed to be prevented if the call can capture the source pointer. Due to an implementation bug, this check currently doesn't trigger if a bitcast of the source pointer is passed instead. I'm somewhat afraid of the fallout of fixing this bug (due to heavy reliance on call slot optimization in rust), so I'd like to strengthen the capture reasoning a bit first. In particular, I believe that the capture is fine as long as a) the call itself cannot depend on the pointer identity, because neither dest has been captured before/at nor src before the call and b) there is no potential use of the captured pointer before the lifetime of the source alloca ends, either due to lifetime.end or a return from a function. At that point the potentially captured pointer becomes dangling. Differential Revision: https://reviews.llvm.org/D115615	2022-01-05 09:39:25 +01:00
Nikita Popov	787f86e68c	[GlobalOpt][Evaluator] Don't create bitcast for same type (PR52994) isBitOrNoopPointerCastable() returns true if the types are the same, but it's not actually possible to create a bitcast for all such types. The assumption seems to be that the user will omit creating the cast in that case, as it is unnecessary. Fixes https://github.com/llvm/llvm-project/issues/52994.	2022-01-05 09:17:07 +01:00
Chuanqi Xu	e627f4ce0d	[NFC] [Coroutines] Rename ReuseFrameSlot to OptimizeFrame We could use the variable as a flag to indicate if the optimization is on.	2022-01-05 11:40:27 +08:00
Fangrui Song	1eb5b6e850	[InferAttrs] If readonly is already set, set readnone instead of writeonly D116426 may lead to an assertion failure `Attributes 'readonly and writeonly' are incompatible!` if the builtin function already has `readonly`.	2022-01-04 18:59:35 -08:00
Chuanqi Xu	c75cedc237	[Coroutines] Set presplit attribute in Clang and mlir This fixes bug49264. In short, a coroutine shouldn't be inlined before CoroSplit. However, the marker for pre-split coroutines was created in the CoroEarly pass, which ran after the AlwaysInliner pass in the O0 pipeline, so the AlwaysInliner couldn't detect that it shouldn't inline a coroutine. That is the source of the error. This patch sets the presplit attribute in Clang and MLIR, so the inliner will always detect the attribute before splitting. Reviewed By: rjmccall, ezhulenev Differential Revision: https://reviews.llvm.org/D115790	2022-01-05 10:25:02 +08:00
Philip Reames	0b09313cd5	[funcattrs] Infer writeonly argument attribute [part 2] This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from a series of previously landed NFCs), this is pretty straightforward. One thing to note is that we cannot infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred. Differential Revision: https://reviews.llvm.org/D115003	2022-01-04 09:07:54 -08:00
Benjamin Kramer	ea75be3d9d	[InferAttrs] Add writeonly to all the math functions All of these functions would be `readnone`, but can't be on platforms where they can set `errno`. A `writeonly` function with no pointer arguments can only write (but never read) global state. Writeonly theoretically allows these calls to be CSE'd (a writeonly call with the same arguments will always result in the same global stores) or hoisted out of loops, but that's not implemented currently. There are a few functions in this list that could be `readnone` instead of `writeonly`, if someone is interested. Differential Revision: https://reviews.llvm.org/D116426	2022-01-04 16:58:05 +01:00
serge-sans-paille	9290ccc3c1	Introduce the AttributeMask class This class is solely used as a lightweight and clean way to build a set of attributes to be removed from an AttrBuilder. Previously AttrBuilder was used both for building and removing, which introduced odd situations like the creation of an Attribute with a dummy value, because the only relevant part was the attribute kind. Differential Revision: https://reviews.llvm.org/D116110	2022-01-04 15:37:46 +01:00
Rosie Sumpter	961f51fdf0	[LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores For loops that contain in-loop reductions but no loads or stores, large VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes has no element types to check through and so returns the default widths (-1U for the smallest and 8 for the widest). This results in the widest VF being chosen for the following example, float s = 0; for (int i = 0; i < N; ++i) s += (float) i*i; which, for more computationally intensive loops, leads to large loop sizes when the operations end up being scalarized. In this patch, for the case where ElementTypesInLoop is empty, the widest type is determined by finding the smallest type used by recurrences in the loop instead of falling back to a default value of 8 bits. This results in the cost model choosing a more sensible VF for loops like the one above. Differential Revision: https://reviews.llvm.org/D113973	2022-01-04 10:12:57 +00:00
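A rough model of why the 8-bit fallback produced such wide VFs: if the maximum VF is taken as the register width divided by the widest element type, an 8-bit default yields four times as many lanes as the 32-bit `float` recurrence in the example actually needs. This is a simplified sketch under stated assumptions: the 128-bit register width and the helper `max_vf` are hypothetical, and the real cost model weighs more factors (the smallest type, target costs, interleaving).

```python
def max_vf(register_bits: int, widest_type_bits: int) -> int:
    """Simplified model: how many lanes of the widest element type fit in one register."""
    return register_bits // widest_type_bits


# Assumed target with 128-bit vector registers (hypothetical).
REGISTER_BITS = 128

# Before: no loads/stores, so the widest type fell back to the 8-bit default.
old_vf = max_vf(REGISTER_BITS, 8)
# After: the width is taken from the float recurrence (32 bits).
new_vf = max_vf(REGISTER_BITS, 32)
```

Under these assumptions `old_vf` is 16 and `new_vf` is 4, which matches the intuition that a loop accumulating `float` values should not be vectorized as if its elements were bytes.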
Nikita Popov	bbeaf2aac6	[GlobalOpt][Evaluator] Rewrite global ctor evaluation (fixes PR51879) Global ctor evaluation currently models memory as a map from Constant* to Constant. For this to be correct, it is required that there is only a single Constant referencing a given memory location. The Evaluator tries to ensure this by imposing certain limitations that could result in ambiguities (by limiting types, casts and GEP formats), but ultimately still fails, as can be seen in PR51879. The approach is fundamentally fragile and will get more so with opaque pointers. My original thought was to instead store memory for each global as an offset => value representation. However, we also need to make sure that we can actually rematerialize the modified global initializer into a Constant in the end, which may not be possible if we allow arbitrary writes. What this patch does instead is to represent globals as a MutableValue, which is either a Constant* or a MutableAggregate. The mutable aggregate exists to allow efficient mutation of individual aggregate elements, as mutating an element on a Constant would require interning a new constant. When a write to the Constant is made, it is converted into a MutableAggregate* as needed. I believe this should make the evaluator more robust, compatible with opaque pointers, and a bit simpler as well. Fixes https://github.com/llvm/llvm-project/issues/51221. Differential Revision: https://reviews.llvm.org/D115530	2022-01-04 09:30:54 +01:00
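The MutableValue representation can be illustrated with a small Python analogue. Here an immutable tuple plays the role of a `Constant` aggregate and a list-backed wrapper plays the role of a `MutableAggregate`; the class and method names only mirror the description above and are otherwise hypothetical. A write converts just the path it touches into mutable form, and `to_constant()` rematerializes the final initializer as an immutable value.

```python
from typing import List, Sequence


class MutableAggregate:
    """Mutable stand-in for an aggregate constant (hypothetical analogue)."""

    def __init__(self, elements: tuple) -> None:
        # Each element is a MutableValue that initially wraps a constant.
        self.elements: List["MutableValue"] = list(elements)


class MutableValue:
    """Either an immutable constant (here: int or tuple) or a MutableAggregate."""

    def __init__(self, value: object) -> None:
        self.value = value

    def _make_mutable(self) -> MutableAggregate:
        # Convert a constant tuple into a MutableAggregate on the first write.
        if not isinstance(self.value, MutableAggregate):
            assert isinstance(self.value, tuple), "only aggregates can be indexed"
            self.value = MutableAggregate(
                tuple(MutableValue(e) for e in self.value))
        return self.value

    def write(self, path: Sequence[int], new_value: object) -> None:
        """Store new_value at the element addressed by path (a list of indices)."""
        if not path:
            self.value = new_value
            return
        agg = self._make_mutable()
        agg.elements[path[0]].write(path[1:], new_value)

    def to_constant(self) -> object:
        """Rematerialize an immutable constant (nested tuples)."""
        if isinstance(self.value, MutableAggregate):
            return tuple(e.to_constant() for e in self.value.elements)
        return self.value
```

For instance, `g = MutableValue((1, (2, 3), 4))` followed by `g.write([1, 0], 20)` yields `(1, (20, 3), 4)` from `g.to_constant()`; only the outer aggregate and the inner pair were converted, so no new "constant" is built until the end.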
Philip Reames	7203140748	Revert "[unroll] Prune all but first copy of invariant exit" This reverts commit `9bd22595ba`. Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stabilize, e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933	2022-01-03 11:57:35 -08:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
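To make the new name concrete: the number of significant bits of a signed value is the smallest two's-complement width that can still hold it, which equals the bit width minus the number of sign bits, plus one. `ComputeMaxSignificantBits` returns an upper bound on this quantity for a value that is only partially known. The Python sketch below models the quantity itself, not LLVM's `APInt` implementation; both helper names are ours.

```python
def significant_bits(value: int) -> int:
    """Minimum number of bits needed to represent value in two's complement."""
    # For negative values, ~value == -value - 1 is non-negative; in both cases
    # bit_length() counts the magnitude bits and +1 adds the sign bit.
    if value >= 0:
        return value.bit_length() + 1
    return (~value).bit_length() + 1


def num_sign_bits(value: int, width: int) -> int:
    """Count the leading bits equal to the sign bit in a width-bit encoding."""
    bits = value & ((1 << width) - 1)
    sign = (bits >> (width - 1)) & 1
    count = 0
    for pos in range(width - 1, -1, -1):
        if (bits >> pos) & 1 != sign:
            break
        count += 1
    return count
```

For `i8`, `significant_bits(0)` and `significant_bits(-1)` are 1, `significant_bits(1)` is 2, and both `127` and `-128` need all 8 bits; in every case the result equals `8 - num_sign_bits(v, 8) + 1`.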
Kazu Hirata	e5947760c2	Revert "[llvm] Remove redundant member initialization (NFC)" This reverts commit `fd4808887e`. This patch causes gcc to issue a lot of warnings like: warning: base class ‘class llvm::MCParsedAsmOperand’ should be explicitly initialized in the copy constructor [-Wextra]	2022-01-03 11:28:47 -08:00
Craig Topper	14849fe554	[SimplifyCFG] Make use of ComputeMinSignedBits and KnownBits::getBitWidth. NFC	2022-01-03 10:08:14 -08:00
Philip Reames	9bd22595ba	[unroll] Prune all but first copy of invariant exit If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead. The change itself is pretty straightforward, but let me add two points of context: * I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying. * I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure. Differential Revision: https://reviews.llvm.org/D116496	2022-01-03 09:55:19 -08:00
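The pruning argument can be checked on a toy model. In the Python sketch below (all names hypothetical, and a source-level loop rather than IR), `c` is a loop-invariant exit condition tested before the latch. In the version unrolled by two, only the first copy of the test is kept in the main loop, and a remainder loop handles odd trip counts. Because `c` cannot change, if it is true the first copy already exits on the very first iteration, so the dropped second copy could never be taken.

```python
from typing import List, Tuple


def original(n: int, c: bool, log: List[int]) -> Tuple[str, int]:
    for i in range(n):
        if c:  # loop-invariant exit that dominates the latch
            return ("exit", i)
        log.append(i)  # loop body
    return ("done", n)


def unrolled_by_2(n: int, c: bool, log: List[int]) -> Tuple[str, int]:
    i = 0
    while i + 1 < n:  # unrolled main loop, two iterations per trip
        if c:  # first copy of the exit test: kept
            return ("exit", i)
        log.append(i)
        # Second copy of `if c` pruned: c was just tested and is invariant.
        log.append(i + 1)
        i += 2
    while i < n:  # remainder loop for odd trip counts
        if c:
            return ("exit", i)
        log.append(i)
        i += 1
    return ("done", n)
```

Running both versions for small trip counts and both values of `c` produces the same result and the same sequence of body executions.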
Nikita Popov	730414b341	[CodeExtractor] Remove unnecessary explicit attribute handling (NFC) The nounwind and uwtable attributes will get handled as part of the loop below as well, there is no need to special-case them here.	2022-01-03 14:23:25 +01:00
Nikita Popov	587495ffa1	[CodeExtractor] Separate function from param/ret attributes (NFC) This list is confusing because it conflates function attributes (which are either extractable or not) and other attribute kinds, which are simply irrelevant for this code.	2022-01-03 14:07:12 +01:00
Florian Hahn	791523bae6	[LV] Set loop metadata after VPlan execution (NFC). Setting the loop metadata for the vector loop after VPlan execution allows generating the full loop body during VPlan execution. This is in preparation for D113224.	2022-01-03 09:59:50 +00:00
Nikita Popov	330cb03269	[LoadStoreVectorizer] Check for guaranteed-to-transfer (PR52950) Rather than checking for nounwind in particular, make sure the instruction is guaranteed to transfer execution, which will also handle non-willreturn calls correctly. Fixes https://github.com/llvm/llvm-project/issues/52950.	2022-01-03 10:55:47 +01:00
Nikita Popov	3478d64ee4	[DSE] Check for whole object overwrite even if dead store size not known If the killing store overwrites the whole object, we know that the preceding store is dead, regardless of the accessed offset or size. This case was previously only handled if the size of the dead store was also known. This allows us to perform conventional DSE for calls that write to an argument (but without known size). Differential Revision: https://reviews.llvm.org/D116267	2022-01-03 09:36:44 +01:00
Florian Hahn	6e0a333f71	[LV] Use Builder.CreateVectorReverse directly. (NFC) IRBuilder::CreateVectorReverse already handles all cases required by LoopVectorize. It can be used directly instead of reverseVector.	2022-01-02 19:09:30 +00:00
Kazu Hirata	7e163afd9e	Remove redundant void arguments (NFC) Identified by modernize-redundant-void-arg.	2022-01-02 10:20:19 -08:00
Florian Hahn	b1a333f0fe	[VPlan] Don't consider VPWidenCanonicalIVRecipe phi-like. VPWidenCanonicalIVRecipe does not create PHI instructions, so it does not need to be placed in the phi section of a VPBasicBlock. Also tidies the code so the WidenCanonicalIV recipe and the compare/lane-masks are created in the header. Discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116473	2022-01-02 12:48:17 +00:00
Kazu Hirata	fd4808887e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-01 16:18:18 -08:00
Florian Hahn	7305798049	[VPlan] Remove VPWidenPHIRecipe constructor without start value (NFC). This was suggested as a separate cleanup in recent reviews.	2022-01-01 13:53:48 +00:00
Kazu Hirata	732e8968a8	[Scalar] Remove a redundant declaration (NFC) InitializePasses.h contains the proper declaration. Identified with readability-redundant-declaration.	2021-12-31 14:02:29 -08:00
Florian Hahn	e2f1c4c706	[LV] Turn check for unexpected VF into assertion (NFC). VF should always be non-zero in widenIntOrFpInduction. Turn check into assertion.	2021-12-31 13:19:03 +00:00
Ellis Hoag	a699b2f1c0	[InstrProf] Mark counters as used in debug correlation mode In debug info correlation mode we do not emit the data globals so we need to explicitly mark the counter globals as used so they don't get stripped. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D115981	2021-12-30 14:50:45 -08:00
Alexey Bataev	e0efedd2c3	[SLP][NFC]Fix non-determinism in reordering, NFC. We need to clear the CurrentOrder mask if it is determined that the extractelements form an identity order, and we need to use a vector-like construct when iterating over ordered entries in the reorderTopToBottom function.	2021-12-30 13:10:25 -08:00
Benjamin Kramer	4683ce2cd8	[InferAttrs] Give strnlen the same attributes as strlen This moves the only string function out of the big list of math funcs, and lets us CSE strnlen calls.	2021-12-30 20:43:43 +01:00
Sanjay Patel	0c6979b2d6	[InstCombine] fold opposite shifts around an add ((X << C) + Y) >>u C --> (X + (Y >>u C)) & (-1 >>u C) https://alive2.llvm.org/ce/z/DY9DPg This replaces a shift with an 'and', and in the case where the add has a constant operand, it eliminates both shifts. As noted in the TODO comment, we already have this fold when the shifts are in the opposite order (and that code handles bitwise logic ops too). Fixes #52851	2021-12-30 12:01:06 -05:00
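The fold can also be checked exhaustively for `i8` without Alive2. The Python sketch below models 8-bit wraparound arithmetic by masking after each operation and compares both sides for every `X`, `Y`, and in-range shift amount `C` (shift amounts of 8 or more produce poison in LLVM IR, so they are excluded). The helper names are ours.

```python
MASK = 0xFF  # i8 values are modeled as Python ints in [0, 255]


def lhs(x: int, y: int, c: int) -> int:
    # ((X << C) + Y) >>u C, wrapping to 8 bits after each operation
    shl = (x << c) & MASK
    add = (shl + y) & MASK
    return add >> c


def rhs(x: int, y: int, c: int) -> int:
    # (X + (Y >>u C)) & (-1 >>u C); in i8, -1 >>u C is 0xFF >> C
    add = (x + (y >> c)) & MASK
    return add & (MASK >> c)


def fold_holds() -> bool:
    return all(lhs(x, y, c) == rhs(x, y, c)
               for c in range(8) for x in range(256) for y in range(256))
```

The identity holds because the bits of `Y` below position `C` can never carry into the result: writing `Y = q * 2**C + r` with `r < 2**C`, both sides reduce to `(X + q) mod 2**(8 - C)`.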
Sanjay Patel	fd9cd3408b	Revert "[InstCombine] fold opposite shifts around an add" This reverts commit `2e3e0a5c28`. Some unintended diffs snuck into this patch.	2021-12-30 11:54:55 -05:00
Sanjay Patel	2e3e0a5c28	[InstCombine] fold opposite shifts around an add ((X << C) + Y) >>u C --> (X + (Y >>u C)) & (-1 >>u C) https://alive2.llvm.org/ce/z/DY9DPg This replaces a shift with an 'and', and in the case where the add has a constant operand, it eliminates both shifts. As noted in the TODO comment, we already have this fold when the shifts are in the opposite order (and that code handles bitwise logic ops too). Fixes #52851	2021-12-30 11:52:29 -05:00
Nuno Lopes	84b285d6eb	[GVN] Set phi entries of unreachable predecessors to poison instead of undef This matches NewGVN's behavior.	2021-12-30 14:47:24 +00:00
Fangrui Song	b69fe48ccf	[IROutliner] Move global namespace cl::opt inside llvm::	2021-12-30 01:12:55 -08:00
Sanjay Patel	6c716c8589	[InstCombine] add more folds for unsigned overflow checks ((Op1 + C) & C) u< Op1 --> Op1 != 0 ((Op1 + C) & C) u>= Op1 --> Op1 == 0 Op0 u> ((Op0 + C) & C) --> Op0 != 0 Op0 u<= ((Op0 + C) & C) --> Op0 == 0 https://alive2.llvm.org/ce/z/iUfXJN https://alive2.llvm.org/ce/z/caAtjj define i1 @src(i8 %x, i8 %y) { ; the add/mask must be with a low-bit mask (0x01ff...) %y1 = add i8 %y, 1 %pop = call i8 @llvm.ctpop.i8(i8 %y1) %ismask = icmp eq i8 %pop, 1 call void @llvm.assume(i1 %ismask) %a = add i8 %x, %y %m = and i8 %a, %y %r = icmp ult i8 %m, %x ret i1 %r } define i1 @tgt(i8 %x, i8 %y) { %r = icmp ne i8 %x, 0 ret i1 %r } I suspect this can be generalized in some way, but this is the pattern I'm seeing in a motivating test based on issue #52851.	2021-12-29 15:53:56 -05:00
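As with the Alive2 proofs linked above, the four patterns can be brute-forced for `i8` in Python. The sketch assumes `C` is a low-bit mask (`C + 1` is a power of two in `i8`, matching the `ctpop` assumption in the IR above), models `add` with 8-bit wraparound, and checks each pattern against its claimed replacement. Function names are ours.

```python
MASK = 0xFF  # i8 wraparound


def low_bit_masks():
    # C such that ctpop(C + 1) == 1 in i8: 0, 1, 3, 7, ..., 127
    return [(1 << k) - 1 for k in range(8)]


def masked_add(op: int, c: int) -> int:
    # (Op + C) & C with the add wrapping to 8 bits
    return ((op + c) & MASK) & c


def all_folds_hold() -> bool:
    # Python ints in [0, 255] compare like unsigned i8 values.
    for c in low_bit_masks():
        for op in range(256):
            m = masked_add(op, c)
            if (m < op) != (op != 0):    # ((Op1 + C) & C) u< Op1  --> Op1 != 0
                return False
            if (m >= op) != (op == 0):   # ((Op1 + C) & C) u>= Op1 --> Op1 == 0
                return False
            if (op > m) != (op != 0):    # Op0 u> ((Op0 + C) & C)  --> Op0 != 0
                return False
            if (op <= m) != (op == 0):   # Op0 u<= ((Op0 + C) & C) --> Op0 == 0
                return False
    return True
```

The intuition: for a low-bit mask `C = 2**k - 1`, `(Op + C) & C` equals `(Op - 1) mod 2**k`, which is strictly less than `Op` whenever `Op != 0`, and wraps to `C` itself (never less than 0) when `Op == 0`.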
Florian Hahn	ba9016a030	[LV] Replace redundant tail-fold check with assert (NFC). The code path can only be reached when folding the tail, so turn the check into an assertion.	2021-12-29 19:00:41 +01:00
Nuno Lopes	680d409561	[NewGVN] Use poison instead of undef to represent unreachable values This enables more simplifications and gets us closer to removing undef. ping @alinas	2021-12-29 15:51:29 +00:00
Nuno Lopes	6d702a1e6a	[NewGVN] Prefer poison to undef when ranking operands ping @alinas	2021-12-29 12:38:14 +00:00
Johannes Doerfert	944aa0421c	Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t" This reverts commit `73ece231ee` and reapplies `7bfcdbcbf3` with mlir changes. Also reverts commit `423ba12971` and includes the unit test changes of `16da214004`.	2021-12-29 01:10:38 -06:00
Mehdi Amini	73ece231ee	Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t" This reverts commit `7bfcdbcbf3`. Broke MLIR build	2021-12-29 06:57:36 +00:00
Johannes Doerfert	5602c866c0	[Attributor] Look through allocated heap memory AAPointerInfo, and thereby other places, can already look through internal global and stack memory. This patch enables them to look through heap memory returned by functions with a `noalias` return. In the future we can look through `noalias` arguments as well but that will require AAIsDead to learn that such memory can be inspected by the caller later on. We also need to teach AAPointerInfo about dominance to actually deal with memory that might not be `null` or `undef` initialized. D106397 is a first step in that direction already. Reviewed By: kuter Differential Revision: https://reviews.llvm.org/D109170	2021-12-29 00:21:36 -06:00
Johannes Doerfert	3e0c512ce6	[OpenMP] Simplify all stores in the device code Similar to loads, we want to be aggressive when it comes to store simplification. Not everything in LLVM handles dead stores well when address space casts are involved; we can simply ask the Attributor to do it for us, though. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109998	2021-12-29 00:19:38 -06:00
Johannes Doerfert	7bfcdbcbf3	[OpenMP][NFCI] Embed the source location string size in the ident_t One of the unused ident_t fields now holds the size of the string (=const char *) field so we have an easier time dealing with those in the future. Differential Revision: https://reviews.llvm.org/D113126	2021-12-28 23:53:29 -06:00
Johannes Doerfert	6e2fcf8513	[Attributor][FIX] Ensure store uses are correlated with reloads While we skipped uses in stores if we can find all copies of the value when the memory is loaded, we did not correlate the use in the store with the use in the load. So far this led to less precise results in the offset calculations which prevented deductions. With the new EquivalentUseCB callback argument the user of checkForAllUses can be informed of the correlation and act on it appropriately. Differential Revision: https://reviews.llvm.org/D109662	2021-12-28 23:53:29 -06:00
Johannes Doerfert	9f04a0ea43	[OpenMP][FIX] Make AAExecutionDomain deterministic	2021-12-28 23:53:29 -06:00
Johannes Doerfert	ba70f3a5d9	[OpenMP][FIX] Make heap2shared deterministic Issue #52875 reported non-determinism, this is the first step to avoid it. We iterate over MallocCalls so we should keep the order stable.	2021-12-28 23:53:28 -06:00
Johannes Doerfert	7de5da2a67	[OpenMP][NFC] Move address space enum into OMPConstants header	2021-12-28 23:53:28 -06:00
Kazu Hirata	5a667c0e74	[llvm] Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.	2021-12-28 08:52:25 -08:00
Florian Hahn	9d297c7894	[VPlan] Add prepareToExecute to set up live-ins (NFC). This patch adds a new prepareToExecute helper to set up live-ins, so VPTransformState doesn't need to hold values like TripCount. This also requires making the trip count operand for ActiveLaneMask explicit in VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116320	2021-12-28 17:49:47 +01:00
Sanjay Patel	0edf99950e	[Analysis] allow caller to choose signed/unsigned when computing constant range We should not lose analysis precision if an 'add' has both no-wrap flags (nsw and nuw) compared to just one or the other. This patch is modeled on a similar construct that was added with D59386. I don't think it is possible to expose a problem with an unsigned compare because of the way this was coded (nuw is handled first). InstCombine has an assert that fires with the example from: https://github.com/llvm/llvm-project/issues/52884 ...because it was expecting InstSimplify to handle this kind of pattern with an smax. Fixes #52884 Differential Revision: https://reviews.llvm.org/D116322	2021-12-28 09:45:37 -05:00
Florian Hahn	c2275278c6	[VPlan] Add abstract base class for header phi recipes (NFC). Not all header phis widen the phi, e.g. like the new VPCanonicalIVPHIRecipe in D113223. To let those recipes also inherit from a phi-like base class, add a more generic VPHeaderPHIRecipe abstract base class. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116304	2021-12-28 15:37:47 +01:00
Nikita Popov	693b1f1e1b	[InstCombine] Skip some GEP folds under opaque pointers In their current form, these folds are fundamentally incompatible with opaque pointers. We should add a separate set of folds for the canonicalization of the GEP source type. For now, skip these folds.	2021-12-28 15:32:11 +01:00
Nikita Popov	e6f31f4e51	[InstCombine] Use GEP type instead of pointee type The GEP source type is independent of whether it is a scalar or vector GEP, as such we can simply preserve it.	2021-12-28 14:57:43 +01:00
Nikita Popov	7d850a0c4d	[InstCombine] Make indexed compare fold opaque ptr compatible We need to make sure that the GEP source element types match. A caveat here is that the used GEP source element type can be arbitrary if no offset is stripped from the original GEP -- the transform is somewhat inconsistent in that it always starts from a GEP, but might not actually look through it if it has multiple indices.	2021-12-28 11:47:20 +01:00
Florian Hahn	c66286ed59	[LV] Use specific first-order recurrence recipe as arg type (NFC). Required for further refactoring in D116304.	2021-12-28 10:58:21 +01:00
Nikita Popov	30a12f3f63	[InstCombine] Fix GEP with same index comparison with opaque pointers We need to also check that the source element type is the same, otherwise the indices may have different meaning. The added addrspacecast demonstrates that we do still need to check the pointer type.	2021-12-28 09:23:28 +01:00
Joseph Huber	6e220296d7	[OpenMP] Use alignment information in HeapToShared This patch uses the return alignment attribute now present in the `__kmpc_alloc_shared` runtime call to set the alignment of the shared memory global created to replace it. Depends on D115971 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116319	2021-12-27 16:58:27 -05:00
Joseph Huber	38fc89623b	[Attributor][Fix] Add alignment return attribute to HeapToStack This patch changes the HeapToStack optimization to attach the return alignment attribute information to the created alloca instruction. Previously, replacing the heap allocation with an alloca could cause problems because the alloca did not respect the alignment of the original heap allocation, which would typically be aligned on an 8 or 16 byte boundary. Malloc calls now contain alignment attributes, so we can use that information here. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115888	2021-12-27 16:58:23 -05:00
Nikita Popov	d122d91e37	[InstCombine] Fix GEP of GEP fold with opaque pointers We need to check that result and source element types match, as this is no longer automatically enforced with opaque pointers.	2021-12-27 14:56:41 +01:00
Nikita Popov	de2ed8e38e	[InstCombine] Extract GEP of GEP fold into separate function This change may not be entirely NFC, because a number of early returns will now only early return from this particular fold, rather than the whole visitGetElementPtr() implementation. This is also the reason why I'm doing this change, as I don't think this was intended.	2021-12-27 14:52:11 +01:00
Nikita Popov	daf32b13d7	[IndVars] Support opaque pointers in LFTR Remove the assertion about the pointer element type, only check that the stride is one. Ultimately, the actual pointer type here doesn't matter, because SCEVExpander would insert appropriate casts if necessary.	2021-12-27 12:32:50 +01:00
Florian Hahn	2e630eabd3	[LV] Sink BTC creation to actual use (NFC). Suggested separately in D116123.	2021-12-27 11:25:46 +01:00
Kazu Hirata	e7774f499b	Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-12-26 14:26:44 -08:00
Florian Hahn	511726c64d	[LV] Move getStepVector out of ILV (NFC). First step to split up induction handling and move it outside ILV. Used in D116123 and following.	2021-12-26 21:17:26 +01:00
Kazu Hirata	76f0f1cc5c	Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC)	2021-12-24 21:43:06 -08:00
Alexey Zhikhartsev	d5dc3964a7	[DFAJumpThreading] Determinator BB should precede switch-defining BB Otherwise, it is possible that the state defined in the determinator block defines the state for the next iteration of the loop, rather than for the current one. Fixes llvm-test-suite's SingleSource/Regression/C/gcc-c-torture/execute/pr80421.c Differential Revision: https://reviews.llvm.org/D115832	2021-12-24 10:27:03 -05:00
Matt Arsenault	286237962a	InstCombine: Gracefully handle more allocas in the wrong address space Officially this is currently required to always use the datalayout's alloca address space. This may change in the future, and it's cleaner to propagate the existing alloca's addrspace anyway. This is a triple fix. Initially the change in simplifyAllocaArraySize would drop the address space, but produce output. Fixing this hit an assertion in the cast combine. This patch also makes the changes to handle this situation from `a33e128012` dead, so eliminate it. InstCombine should not take it upon itself to introduce addrspacecasts, and should preserve the original address space instead.	2021-12-24 08:59:26 -05:00
Nikita Popov	eb91d91b7a	[DSE] Fix typo in recent commit This fixes a typo in `81d69e1bda`. Of course we should only skip the particular store if it isn't removable, not bail out of the whole loop. Add a test to cover this case.	2021-12-24 11:25:25 +01:00
Nikita Popov	90095a0b65	[DSE] Remove unnecessary check in getLocForWrite() (NFC) MemoryLocation::getForDest() checks this itself, call it directly.	2021-12-24 10:45:35 +01:00
Nikita Popov	72d2201785	[DSE] Rename getLocForWriteEx() to getLocForWrite() (NFC) We used to have both getLocForWrite() and getLocForWriteEx(). Now that we only have a single method, the "ex" suffix no longer makes sense.	2021-12-24 10:43:48 +01:00
Nikita Popov	034e66e76c	[DSE] Assert analyzable write in isRemovable() (NFC) As requested on D116210. The function is not necessarily well-defined without this precondition.	2021-12-24 10:39:50 +01:00
Nikita Popov	2b8a703858	[DSE] Avoid calling isRemovable() on non-analyzable location (NFC) At this point the instruction may either have an analyzable write or be a terminator. For terminators, isRemovable() is not necessarily well-defined. Move the check until after we have ensured that it is not a terminator.	2021-12-24 10:18:15 +01:00
Nikita Popov	81d69e1bda	[DSE] Call isRemovable() after getLocForWriteEx() (NFCI) The only non-trivial change here is that the isReadClobber() check for redundant stores is now on the DefLoc, not the UpperLoc. This is semantically the right location to use, though in practice it makes no difference (the locations are either the same, or the def inst does not read).	2021-12-24 10:01:25 +01:00
Nikita Popov	ba2b34b1c7	[DSE] Simplify isGuaranteedLoopInvariant() (NFC) We have Value->stripInBoundsConstantOffsets() which does what we want here, but the inbounds requirement isn't actually necessary. We should probably add Value->stripConstantOffsets() as well.	2021-12-24 09:39:44 +01:00
Nikita Popov	ae64c5a0fd	[DSE][MemLoc] Handle intrinsics more generically Remove the special casing for intrinsics in MemoryLocation::getForDest() and handle them through the general attribute based code. On the DSE side, this means that isRemovable() now needs to handle more than a hardcoded list of intrinsics. We consider everything apart from volatile memory intrinsics and lifetime markers to be removable. This allows us to perform DSE on intrinsics that DSE has not been specially taught about, using a matrix store as an example here. There is an interesting test change for invariant.start, but I believe that optimization is correct. It only looks a bit odd because the code is immediate UB anyway. Differential Revision: https://reviews.llvm.org/D116210	2021-12-24 09:29:57 +01:00
Nikita Popov	69ffc3cee9	[Attributor] Directly call areTypesABICompatible() hook Instead of using the ArgumentPromotion implementation, we now walk call sites using checkForAllCallSites() and directly call areTypesABICompatible() using the replacement types. I believe that resolves the TODO in the code. Differential Revision: https://reviews.llvm.org/D116033	2021-12-24 09:20:31 +01:00
Philip Reames	ee5d5e19f9	[funcattrs] Use callsite param attributes from indirect calls when inferring access attributes Arguments to an indirect call are by definition outside the SCC, but there's no reason we can't use locally defined facts on the call site. This also has the nice effect of further simplifying the code. Differential Revision: https://reviews.llvm.org/D116118	2021-12-22 18:21:59 -08:00
Marianne Mailhot-Sarrasin	90d1786ba0	[DSE] Fix invalid removal of store instruction Fix handling of alloc-like instructions in isGuaranteedLoopInvariant(). It was not valid when the 'KillingDef' was outside of the loop, while the 'CurrentDef' was inside the loop. In that case, the 'KillingDef' only overwrites the definition from the last iteration of the loop, and not the ones of all iterations. Therefore it does not make the 'CurrentDef' dead, and we must not remove it. Fixes issue: https://github.com/llvm/llvm-project/issues/52774 Reviewed by: Florian Hahn Differential revision: https://reviews.llvm.org/D115965	2021-12-22 16:11:23 -05:00
Florian Hahn	ede7c2438f	[VPlan] Create header & latch blocks for skeleton up front (NFC). By creating the header and latch blocks up front and adding blocks and recipes in between those 2 blocks we ensure that the entry and exits of the plan remain valid throughout construction. In order to avoid test changes and keep printing of the plans the same, we use the new header block instead of creating a new block on the first iteration of the loop traversing the original loop. We also fold the latch into its predecessor. This is a follow up to a post-commit suggestion in D114586. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115793	2021-12-22 12:44:25 +00:00
Florian Hahn	c83ef407df	[LV] Adjust comment to say the induction is created in header. Follow-up suggested post-commit for `1a54889f48`.	2021-12-22 11:56:40 +00:00
Nikita Popov	9f24f010ab	[RS4GC] Clean up attribute removal (NFC) It is not necessary to explicitly check which attributes are present, and only add those to the builder. We can simply list all attributes that need to be stripped and remove them unconditionally. This also allows us to use some nicer APIs that don't require mucking about with attribute list indices.	2021-12-22 09:55:54 +01:00
Nikita Popov	e8e8bfeeb7	[DataFlowSanitizer] Simplify attribute removal (NFC) We're only removing a single attribute, so there is no need to go through AttrBuilder.	2021-12-22 09:42:55 +01:00
Nikita Popov	f5ac23b5ae	[ArgPromotion][TTI] Pass types to ABI compatibility hook The areFunctionArgsABICompatible() hook currently accepts a list of pointer arguments, though what we're actually interested in is the ABI compatibility after these pointer arguments have been converted into value arguments. This means that a) the current API is incompatible with opaque pointers (because it requires inspection of pointee types) and b) it can only be used in the specific context of ArgPromotion. I would like to reuse the API when inspecting calls during inlining. This patch converts it into an areTypesABICompatible() hook, which accepts a list of types. This makes the method more generally usable, and compatible with opaque pointers from an API perspective (the actual usage in ArgPromotion/Attributor is still incompatible, I'll follow up on that in separate patches). Differential Revision: https://reviews.llvm.org/D116031	2021-12-22 09:37:51 +01:00
Kazu Hirata	9db0e21660	[llvm] Use depth_first (NFC)	2021-12-21 22:28:48 -08:00
minglotus-6	9c49f8d705	[LTO][WPD] Ignore unreachable function by analyzing IR. In regular LTO, analyze IR and discard unreachable functions when finding virtual call targets. Differential Revision: https://reviews.llvm.org/D116056	2021-12-21 18:13:03 +00:00
Philip Reames	b7b308c50a	[funcattrs] Infer access attributes for vararg arguments This change allows us to infer access attributes (readnone, readonly) on arguments passed to vararg functions. Since there isn't a formal argument corresponding to the parameter, they'll never be considered part of the speculative SCC, but they can still benefit from attributes on the call site or the callee function. The main motivation here is just to simplify the code, and remove some special casing. Previously, an indirect vararg call could return more precise results than a direct vararg call, which is just weird. Differential Revision: https://reviews.llvm.org/D115964	2021-12-21 09:34:14 -08:00
Philip Reames	1fee7195c9	[funcattrs] Fix incorrect readnone/readonly inference on captured arguments This fixes a bug where we would infer readnone/readonly for a function which passed a value to a function which could capture it. With the value captured in memory, the function could reload the value from memory after the call, and write to it. Inferring the argument as readnone or readonly is unsound. @jdoerfert apparently noticed this about two years ago, and tests were checked in with `76467c4`, but the issue appears to have never gotten fixed. Since it seems like this issue should break everything, let me explain why the case is actually fairly narrow. The main inference loop over the argument SCCs only analyzes nocapture arguments. As such, we can only hit this when constructing the partial SCCs. Due to that restriction, we can only hit this when we have either a) a function declaration with a manually annotated argument, or b) an immediately self recursive call. It's also worth highlighting that we do have cases where we can validly infer readonly/readnone on a capturing argument. The easiest example is a function which simply returns its argument without ever accessing it. Differential Revision: https://reviews.llvm.org/D115961	2021-12-21 09:34:14 -08:00
Florian Hahn	1a54889f48	[LV] Ensure WidenCanonicalIVRecipe is always created in header (NFC). The VPWidenCanonicalIVRecipe must always be created in the phi section of the header block. Use that block as insert point.	2021-12-21 15:14:48 +00:00
Djordje Todorovic	93615b88f5	[Debugify] Use WeakWH map collected before Pass when checking loc drop This fixes a typo/bug when checking for pointer reuse when testing DI location preservation in the Debugify original mode (when checking -g generated Debug Info). Differential Revision: https://reviews.llvm.org/D115621	2021-12-21 15:54:09 +01:00
Paul Walker	7c68ed8892	[SVE] Reintroduce -scalable-vectorization=preferred as an alias to "on". Some buildbots still rely on the experimental flag, so let's keep it until everything has been migrated to the new "on by default" state.	2021-12-21 12:54:04 +00:00
Sjoerd Meijer	9e3ae8d296	[FuncSpec] Rename internal option. NFC. Rename option MaxConstantsThreshold to MaxClonesThreshold. Not only is this more descriptive, this is also in preparation of introducing another threshold to analyse more than just 1 constant argument as we currently do, and to better distinguish these options/thresholds.	2021-12-21 11:02:01 +00:00
Nikita Popov	2926d6d335	[ConstantFold][GlobalOpt] Don't create x86_mmx null value This fixes the assertion failure reported at https://reviews.llvm.org/D114889#3198921 with a straightforward check, until the cleaner fix in D115924 can be reapplied.	2021-12-21 09:11:41 +01:00
Nikita Popov	9be67289b3	[InstCombine] Drop outdated alignment comment (NFC) Loads always have an alignment now, so this is no longer relevant.	2021-12-21 08:58:48 +01:00
Kazu Hirata	500c4b68dc	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-12-20 23:43:24 -08:00
Philip Reames	44d23d5345	[DSE] Remove calls with known writes to dead memory This is a reapply of `a8a51fe5`, which was reverted in 1ba99e due to a failing compiler-rt test. That test was a false positive because it was checking asan failures without accounting for the fact that the call could be validly optimized out. I hopefully managed to stabilize that test in 9b955f. (That's a speculative fix, because the disk space needed to build the compiler-rt tests locally is absurd.) Original commit message follows. The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size. Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet). In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest. Differential Revision: https://reviews.llvm.org/D115904	2021-12-20 18:10:23 -08:00
Sami Tolvanen	5dc8aaac39	[llvm][IR] Add no_cfi constant With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478	2021-12-20 12:55:32 -08:00
Sander de Smalen	b1ff20fd35	[LV] Enable scalable vectorization by default for SVE cores. The availability of SVE should be sufficient to enable scalable auto-vectorization. This patch adds a new TTI interface to query the target what style of vectorization it wants when scalable vectors are available. For targets other than AArch64, this currently defaults to 'FixedWidthOnly'. Differential Revision: https://reviews.llvm.org/D115651	2021-12-20 16:23:29 +00:00
Alexey Bataev	ab9078f3d3	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 07:21:20 -08:00
Alexey Bataev	4459a11f4d	Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy." This reverts commit `fcaf290d02` to fix test mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531	2021-12-20 07:21:18 -08:00
Florian Hahn	5b362e4c7f	[VPlan] Add Debugloc to VPInstruction. Upcoming changes require attaching debug locations to VPInstructions, e.g. adding induction increment recipes in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115123	2021-12-20 15:10:41 +00:00
Alexey Bataev	fcaf290d02	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 05:15:01 -08:00
Nikita Popov	aeb36ae0f4	Revert "[ConstantFolding] Unify handling of load from uniform value" This reverts commit `9fd4f80e33`. This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c in test-suite. Either the test is incorrect, or clang is generating incorrect union initialization code. I've submitted https://reviews.llvm.org/D115994 to fix the test, assuming my interpretation is correct. Reverting this in the meantime as it may take some time to resolve.	2021-12-18 20:46:52 +01:00
Nikita Popov	1ba99eaf70	Revert "[DSE] Remove calls with known writes to dead memory" This reverts commit `a8a51fe556`. This breaks the strncpy-overflow.cpp test case.	2021-12-18 09:23:41 +01:00
Kazu Hirata	fee57711fe	Use DenseMap::lookup (NFC)	2021-12-17 18:19:25 -08:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of !any_of (NFC)	2021-12-17 13:48:57 -08:00
Philip Reames	a8a51fe556	[DSE] Remove calls with known writes to dead memory The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size. Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet). In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest. Differential Revision: https://reviews.llvm.org/D115904	2021-12-17 13:42:36 -08:00
Alexey Bataev	71fe59212c	[SLP][NFC]Adjust type in debug output loop. The ReuseShuffleIndices indices are integers, not unsigned, so the type in the debug print loop needs to be fixed.	2021-12-17 12:43:01 -08:00
Alexey Bataev	46ad66b817	[SLP][NFC]Use 'llvm::copy' instead of element-by-element copying.	2021-12-17 12:07:59 -08:00
Nikita Popov	eb2cad8329	[DSE] Make isRemovable() for calls more robust (NFCI) We can only drop calls if they have an analyzable write, the return value is not used, they don't throw and they don't diverge. The last two conditions were previously not checked, because all the libcalls with analyzable writes already happened to satisfy those conditions anyway. This may not be true for generalizations (with D115904 in mind). No test changes because the necessary attributes are already inferred for currently supported libcalls. Differential Revision: https://reviews.llvm.org/D115962	2021-12-17 20:52:34 +01:00
Kazu Hirata	2b7be47b22	[llvm] Strip redundant lambda (NFC)	2021-12-17 10:51:40 -08:00
Ellis Hoag	65d7fd0239	[Try2][InstrProf] Add Correlator class to read debug info Extend `llvm-profdata` to read in a `.proflite` file and also a debug info file to generate a normal `.profdata` profile. This reduces the binary size by 8.4% when building an instrumented Clang binary without value profiling (164 MB vs 179 MB). This work is part of the "lightweight instrumentation" RFC: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 This was first landed in https://reviews.llvm.org/D114566 but had to be reverted due to build errors. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D115915	2021-12-17 10:45:59 -08:00
Philip Reames	4c9e31a481	[funcattrs] Use early return to clarify code in determinePointerAccessAttrs [NFC] Instead of having the speculative path be the untaken path in the branch, explicitly have it return. This does require tail duplicating one call, but the resulting code is shorter and easier to understand. Also rewrite the condition using appropriate accessors.	2021-12-17 10:00:36 -08:00
Philip Reames	54ee8bb73a	[funcattrs] Use getDataOperandNo where appropriate [NFC] We'd manually duplicated the same logic and assertions; we can use the utility instead.	2021-12-17 09:35:29 -08:00
Philip Reames	33cbaab141	[funcattrs] Consistently treat calling a function pointer as a non-capturing read We were being wildly inconsistent about what memory access was implied by an indirect function call. Depending on the call site attributes, you could get anything from a read, to unknown, to none at all. (The last was a miscompile.) We were also always traversing the uses of a readonly indirect call. This is entirely unneeded as the indirect call does not capture. The callee might capture itself internally, but that has no implications for this caller. (See the nice explanation in the CaptureTracking comments if that case is confusing.) Note that elsewhere in the same file, we were correctly computing the nocapture attribute for indirect calls. The changed case only resulted in conservatism when computing memory attributes if say the return value was written to. Differential Revision: https://reviews.llvm.org/D115916	2021-12-17 09:02:03 -08:00
Nikita Popov	9fd4f80e33	[ConstantFolding] Unify handling of load from uniform value There are a number of places that specially handle loads from a uniform value where all the bits are the same (zero, one, undef, poison), because we a) don't care about the load offset in that case and b) it bypasses casts that might not be legal generally but do work with uniform values. We had multiple implementations of this, with a different set of supported values each time, as well as incomplete type checks in some cases. In particular, this fixes the assertion reported in https://reviews.llvm.org/D114889#3198921, as well as a similar assertion that could be triggered via constant folding. Differential Revision: https://reviews.llvm.org/D115924	2021-12-17 17:05:06 +01:00
Sjoerd Meijer	b7b61fe091	[FuncSpec] Create helper to update state. NFC. This creates a helper function updateSpecializedFuncs. It is NFC and just makes the function that drives the transformation easier to read.	2021-12-17 12:14:33 +00:00
Florian Hahn	564d109b35	[LV] Pass VectorHeader block to emitTransformedIndex (NFC). Pass in the vector header instead of relying on ILV::LoopVectorBody. This reduces the dependence on state from ILV. Where VPTransformState is available, State.CFG.PrevBB can be used.	2021-12-17 10:11:16 +00:00
Sjoerd Meijer	78a392cf9f	[FuncSpec] Respect MaxConstantsThreshold This is a follow up of D115458 and truncates the worklist of actual arguments that can be specialised to 'MaxConstantsThreshold' candidates if MaxConstantsThreshold was exceeded. Thus, this changes the behaviour of option -func-specialization-max-constants. Before it didn't specialise at all when this threshold was exceeded, but now it specialises up to MaxConstantsThreshold candidates from the sorted worklist. Differential Revision: https://reviews.llvm.org/D115509	2021-12-17 09:25:45 +00:00
Sjoerd Meijer	89bcfd1632	Recommit "[FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates." Replaced llvm::sort with llvm::stable_sort, this was failing on the bot with expensive checks enabled.	2021-12-17 09:02:51 +00:00
Philip Reames	f632c49478	Extract a helper function for computing the estimated trip count of an exiting branch Plan to use this in following change to support estimated trip counts derived from multiple loop exits.	2021-12-16 17:29:32 -08:00
Ellis Hoag	bdc68ee70f	Revert "[InstrProf] Add Correlator class to read debug info" Also reverts an attempt to fix the build errors https://reviews.llvm.org/D115911 The original diff https://reviews.llvm.org/D114566 causes some build errors that I need to investigate. https://lab.llvm.org/buildbot/#/builders/118/builds/7037 This reverts commit `95946d2f85`. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D115913	2021-12-16 16:28:19 -08:00
Ellis Hoag	95946d2f85	[InstrProf] Add Correlator class to read debug info Extend `llvm-profdata` to read in a `.proflite` file and also a debug info file to generate a normal `.profdata` profile. This reduces the binary size by 8.4% when building an instrumented Clang binary without value profiling (164 MB vs 179 MB). This work is part of the "lightweight instrumentation" RFC: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D114566	2021-12-16 15:18:12 -08:00
Ellis Hoag	58d9c1aec8	[Try2][InstrProf] Attach debug info to counters Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile. Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D115693	2021-12-16 14:20:30 -08:00
Philip Reames	682b083bbd	Allow calls with known writes when trying to remove allocas [part 2] This is a slight generalization of D115829. I noticed this while restructuring code for a follow up patch to perform the same optimizations in DSE. If we have a call whose only visible effect is writing to an alloca, and we're removing the alloca anyways, we don't care if the call also reads from the same alloca. That read will be unobservable and thus doesn't block removal of the call. Worth noting is that this observation generalizes for non-argument reads. It just happens that case reduces to a readonly call, and is already handled separately. Differential Revision: https://reviews.llvm.org/D115898	2021-12-16 13:13:14 -08:00
Philip Reames	d98dfb2baa	[instcombine] Use reference for never-null pointer in isAllocSiteRemovable [nfc]	2021-12-16 12:03:42 -08:00
Alexey Bataev	65fc992579	[SLP]Early exit out of the reordering if shuffled/perfect diamond match found. Need to early exit out of the reordering process if the perfect/shuffled match is found in the operands. Such a pattern would result in unprofitable reordering because of (false positive) external uses of scalars. Differential Revision: https://reviews.llvm.org/D115811	2021-12-16 11:09:49 -08:00
Philip Reames	4c8dbe96d7	Allow calls with known writes when trying to remove allocas isAllocSiteRemovable tracks whether all uses of an alloca are both non-capturing, and non-reading. If so, we can remove said alloca because nothing can depend on its content or address. This patch extends this reasoning to allow writes from calls where we can prove the call has no side effect other than writing to said allocation. This is a fairly natural fit for the existing code with one subtle detail - the call can write to multiple locations at once which stores can't. As a follow up, we can likely sink the intrinsic handling into the generic code by allowing readnone arguments as well. I deliberately left that out to minimize conceptual churn. Differential Revision: https://reviews.llvm.org/D115829	2021-12-16 11:04:34 -08:00
Florian Hahn	3b35113ff0	[VPlan] Add VPBlockBase::successors() returning an iterator_range (NFC). This will also be helpful for D115793.	2021-12-16 14:28:50 +00:00
Sjoerd Meijer	5b139a583d	Revert "[FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates." This reverts commit `20b03d6536`. This shows some failed tests on a bot with expensive checks enabled that I need to look at.	2021-12-16 12:56:11 +00:00
Sjoerd Meijer	20b03d6536	[FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates. This is mostly the same code, refactored to decouple the cost and benefit analysis. The biggest change is top-level function specializeFunctions that now drives the transformation more like this: specializeFunctions() { Cost = getSpecializationCost(F); calculateGains(F, Cost); specializeFunction(F); } While this is just a restructuring, it helps the functional change in calculateGains, i.e., we now sort the candidates based on the expected specialisation gain, which we didn't do before. For this, a bookkeeping struct ArgInfo was introduced. If we have a list of N candidates, but only want to specialise fewer than N, as set by option -func-specialization-max-constants, we sort the list and discard the candidates that give the least benefit. Given a formal argument, this change results in selecting the best actual argument(s). This is NFC'ish in that this shouldn't change the current output (hence no test change here), but in follow ups starting with D115509, it should and I want to go one step further and compare all functions and all arguments, which will mostly build on top of this refactoring and change. Differential Revision: https://reviews.llvm.org/D115458	2021-12-16 11:55:37 +00:00
eopXD	6734be290b	Revert "[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified" This reverts commit `fbf6c8ac15`.	2021-12-16 02:06:11 -08:00
Yueh-Ting Chen	fbf6c8ac15	[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified After this function call, the LLVM IR would look like the following: ``` if (true) /* NonVersionedLoop */ else /* VersionedLoop */ ``` Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D104631	2021-12-16 00:41:29 -08:00
eopXD	ecb3ae524e	[LoopIdiom] Use utility from SE instead of local rewriter ScalarEvolution::applyLoopGuards shall do the work. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D115784	2021-12-15 20:46:49 -08:00
minglotus-6	4ab5527c15	[ThinLTO] Ignore unreachable virtual functions in WPD in thin LTO. Differential Revision: https://reviews.llvm.org/D115648	2021-12-16 02:24:20 +00:00
Arthur Eubanks	5a81a60391	[NFC] Remove more calls to getAlignment() These are deprecated and should be replaced with getAlign(). Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.	2021-12-15 14:40:57 -08:00
Arthur Eubanks	d5583366ba	[FunctionComparator] Use getAlign() instead of getAlignment() getAlignment() is deprecated.	2021-12-15 14:40:56 -08:00
Stanislav Mekhanoshin	e6f6942296	[InstCombine] (~a & b & c) | ~(a | b) -> (c | ~b) & ~a Transform
```
(~a & b & c) | ~(a | b) -> (c | ~b) & ~a
```
and swapped case
```
(~a | b | c) & ~(a & b) -> (c & ~b) | ~a
```
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %or1 = or i4 %b, %a
  %not1 = xor i4 %or1, 15
  %not2 = xor i4 %a, 15
  %and1 = and i4 %b, %not2
  %and2 = and i4 %and1, %c
  %or2 = or i4 %and2, %not1
  ret i4 %or2
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %notb = xor i4 %b, 15
  %or = or i4 %notb, %c
  %nota = xor i4 %a, 15
  %and = and i4 %or, %nota
  ret i4 %and
}
Transformation seems to be correct!
```
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %a
  %not1 = xor i4 %and1, 15
  %not2 = xor i4 %a, 15
  %or1 = or i4 %b, %not2
  %or2 = or i4 %or1, %c
  %and2 = and i4 %or2, %not1
  ret i4 %and2
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %notb = xor i4 %b, 15
  %and = and i4 %notb, %c
  %nota = xor i4 %a, 15
  %or = or i4 %and, %nota
  ret i4 %or
}
Transformation seems to be correct!
```
Differential Revision: https://reviews.llvm.org/D113037	2021-12-15 09:36:59 -08:00
Alexey Bataev	6f2e087631	[SLP]Do not represent splats as node with the reused scalars. No need to represent splats as a node with the reused scalars, it may increase the cost (currently pass just ignores extra shuffle cost and it is still not correct). Differential Revision: https://reviews.llvm.org/D115800	2021-12-15 06:33:11 -08:00
Hongtao Yu	d7b7b64914	[CSSPGO] Warn instead of error out for modules that are not probed. Modules that are not compiled with pseudo probe enabled can still be compiled with a sample profile input, such as in LTO postlink where other modules are probed. Since the profile is unrelated to the current modules, we should warn instead of erroring out of the compilation. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D115642	2021-12-14 16:38:50 -08:00
Hongtao Yu	5740bb801a	[CSSPGO] Use nested context-sensitive profile. CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that are either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing and indexing the full CS contexts. For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there are many contexts, since the sample reader will need to scan all context strings anyway. In contrast, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore, when compiling a set of functions, unrelated contexts will never need to be scanned. In this change we are exploring using the nested profile format for CSSPGO. This is expected to work based on the assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler is less likely to optimize on derived contexts that are not precomputed. A CS-nested profile will look exactly the same as a regular nested profile except that each nested profile can come with attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
``` main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ``` Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile. I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D115205	2021-12-14 14:40:25 -08:00
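The flat-to-nested conversion described in the D115205 message can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: a context is modeled as a tuple of (function, call-site) frames from the outermost caller to the leaf, `make_node` and `to_nested` are hypothetical helpers, and the `is_inlined` predicate stands in for the preinliner's decision. This is not the actual LLVM converter.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical model: a frame is (function name, call-site label), and the
# leaf frame has no call site (None).
Frame = Tuple[str, Optional[str]]
Context = Tuple[Frame, ...]


def make_node(name: str) -> dict:
    """A nested profile node: its own samples plus inlined callee nodes."""
    return {"name": name, "samples": 0, "callsites": {}}


def to_nested(flat: Dict[Context, int],
              is_inlined: Callable[[Context], bool]) -> Dict[str, dict]:
    """Convert a flat context-sensitive profile into a nested one.

    Contexts not marked as inlined are cut off and their samples are folded
    into the base (top-level) profile of the leaf function, so every context
    that stays nested is one the compiler is expected to inline.
    """
    roots: Dict[str, dict] = {}
    for context, samples in flat.items():
        if len(context) > 1 and not is_inlined(context):
            context = ((context[-1][0], None),)  # promote to a base profile
        name, site = context[0]
        node = roots.setdefault(name, make_node(name))
        # Walk caller -> callee, creating one nested node per call site.
        for callee, next_site in context[1:]:
            node = node["callsites"].setdefault((site, callee), make_node(callee))
            site = next_site
        node["samples"] += samples
    return roots
```

As a simplification, a cut-off context is promoted whole to its leaf function's base profile; a real converter would also keep any still-inlined suffix nested under that base profile.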
Sanjay Patel	1a60ae02c6	[InstCombine] fold mask-with-signbit-splat to icmp+select ~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y https://alive2.llvm.org/ce/z/JKlQ9x This is similar to D111410 / `727e642e97` , but it includes a 'not' of the signbit and so it saves an instruction in the basic pattern. DAGCombiner or target-specific folds can expand this back into bit-hacks. The diffs in the logical-select tests are not true regressions - running early-cse and another round of instcombine is expected in a normal opt pipeline, and that reduces back to a minimal form as shown in the duplicated PhaseOrdering test. I have no understanding of the SystemZ diffs, so I made the minimal edits suggested by FileCheck to make that test pass again. That whole test file is wrong though. It is running the entire optimizer (-O2) to check IR, and then topping that by even running codegen and checking asm. It needs to be split up. Fixes #52631	2021-12-14 16:00:42 -05:00
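The fold `~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y` in the commit above holds because an arithmetic shift by N-1 splats the sign bit: the result is all-ones for negative X and zero otherwise, so its complement either clears Y or passes it through. The following Python sketch checks this exhaustively for small widths. The helper names are hypothetical, and the check complements, but does not replace, the Alive2 proof linked in the message.

```python
def ashr(x: int, shift: int, bits: int) -> int:
    """Arithmetic shift right of an unsigned `bits`-wide bit pattern."""
    signed = x - (1 << bits) if (x >> (bits - 1)) & 1 else x
    return (signed >> shift) & ((1 << bits) - 1)


def bit_hack(x: int, y: int, bits: int) -> int:
    """Original pattern: ~(x s>> (N-1)) & y."""
    mask = (1 << bits) - 1
    splat = ashr(x, bits - 1, bits)  # all-ones if x is negative, else 0
    return ~splat & mask & y


def select_form(x: int, y: int, bits: int) -> int:
    """Folded pattern: (x s< 0) ? 0 : y."""
    is_negative = ((x >> (bits - 1)) & 1) == 1
    return 0 if is_negative else y


def check_fold(bits: int) -> bool:
    """Compare both forms for every (x, y) pair of the given width."""
    n = 1 << bits
    return all(bit_hack(x, y, bits) == select_form(x, y, bits)
               for x in range(n) for y in range(n))
```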
Mingming Liu	09a704c5ef	[LTO] Ignore unreachable virtual functions in WPD in hybrid LTO. Differential Revision: https://reviews.llvm.org/D115492	2021-12-14 20:18:04 +00:00
Zaara Syeda	dd245bab9f	[LoopUnroll] Disable loop unroll when user explicitly asks for unroll-and-jam If a loop isn't forced to be unrolled, we want to avoid unrolling it when there is an explicit unroll-and-jam pragma. This is to prevent automatic unrolling from interfering with the user requested transformation. Differential Revision: https://reviews.llvm.org/D114886	2021-12-14 16:46:37 +00:00
Sanjay Patel	bb2fc19c63	[InstCombine] prevent infinite looping from opposing cmp and select transforms (PR52684) As noted in the code comment, we might want to simply give up on this select transform completely (given how many exceptions there are already and the risk of future conflicts), but for now, carve out one more bailout to avoid an infinite loop. Fixes #52684: https://github.com/llvm/llvm-project/issues/52684	2021-12-14 11:18:36 -05:00
Sanjay Patel	3db974face	[InstCombine] convert static function to internal class function; NFC The transform can require an optional shuffle instruction to be sound, so we need to use Builder to create all values and then replace the original instruction with whatever that final value is.	2021-12-14 11:18:35 -05:00
Alexey Bataev	bd05376986	[SLP]Improve multinode analysis. Changes the preliminary multinode analysis: 1. Introduced scores for reversed loads/extractelements. 2. Improved shallow score calculation. 3. Lowered the cost of external uses (no need to consider it several times, just once). 4. The initial lane for analysis is the one with the minimal possible reorderings. These changes should, in general, reduce compile time and improve the reordering in many cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D101109	2021-12-14 06:01:52 -08:00
Kazu Hirata	7787a8f1b7	[llvm] Use llvm::reverse (NFC)	2021-12-13 21:54:51 -08:00
Ellis Hoag	c809da7d9c	Revert "[InstrProf] Attach debug info to counters" This reverts commit `800bf8ed29`. The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was failing because I forgot the `llc` commands are architecture specific. I'll follow up with a fix. Differential Revision: https://reviews.llvm.org/D115689	2021-12-13 18:15:17 -08:00
Ellis Hoag	800bf8ed29	[InstrProf] Attach debug info to counters Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile. Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D114565	2021-12-13 17:51:22 -08:00
Philip Reames	e6ad9ef4e7	[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order. I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions. We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause. Differential Revision: https://reviews.llvm.org/D115387	2021-12-13 16:56:22 -08:00
Alina Sbirlea	46fb810955	[NewGVN] Use PredicateInfo info when previously used for the same ssa.copy intrinsic Symbolic execution using PredicateInfo is only done for the ssa.copy intrinsic. It uses two potential sources for building the expression: 1. the Value that the instruction is a copy of, and 2. the Value from the constraint in PredicateInfo. It's possible to get into an infinite loop when choosing between these two, as described in PR31613. This patch proposes swapping the two values (i.e. choosing the second one for the expression) if that same second value was chosen before; this breaks the cycle. In the testcases provided, where there is a contradiction between the value from symbolic execution and the assume instruction, NewGVN reduces the assume to assume(false). Resolves PR31613. Differential Revision: https://reviews.llvm.org/D110907	2021-12-13 16:49:24 -08:00
Nick Desaulniers	95ba0e4563	[SimplifyLibCalls] propagate tail flags on CallInsts I noticed we weren't propagating tail flags on calls when FortifiedLibCallSimplifier.optimizeCall() was replacing runtime-checked calls with calls to the non-checked routines (when safe to do so). Make sure to check this before replacing the original calls! Also, avoid any libcall transforms when notail/musttail is present. PR46734 Fixes: https://github.com/llvm/llvm-project/issues/46079 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107872	2021-12-13 11:18:30 -08:00
Alexey Bataev	e5b191a433	[SLP]Improve/fix reordering for gather nodes with extractelements/undefs. If the gather node is a mix of undef values and extractelement instructions, we need to take the ordering for such nodes into account too. This allows reordering some (sub)trees and removing some extra shuffles, improving overall vectorization. Also, outlined common functionality into a separate function. Differential Revision: https://reviews.llvm.org/D115358	2021-12-13 10:59:38 -08:00
eopXD	bc17d32a5f	[LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards Expressions guarded in the loop entry can be folded prior to comparison. This patch follows up on D107353 and makes LIR able to deal with nested for-loops. Reviewed By: qianzhen, bmahjour Differential Revision: https://reviews.llvm.org/D108112	2021-12-13 09:36:58 -08:00
Sanjay Patel	f46a9c8edd	[InstCombine] don't automatically drop poison-generating flags in SimplifyVectorDemandedElts I noticed this while reviewing the test diffs in D115460 (and so the diffs in that patch will be reduced if this one is applied first). This is effectively a revert of `3436dc2923` ( https://reviews.llvm.org/rG3436dc29239d ) - since that commit, we've made several enhancements, so the reasoning there is no longer valid. Specifically, we added a poison value to IR, and we clarified the behavior of undef/poison elements in a shuffle mask: https://llvm.org/docs/LangRef.html#shufflevector-instruction Alive2 seems to agree that the propagation of flags in the test diffs shown here are valid: https://alive2.llvm.org/ce/z/UuY-jr https://alive2.llvm.org/ce/z/GXoMD9 https://alive2.llvm.org/ce/z/nVCyVH Differential Revision: https://reviews.llvm.org/D115526	2021-12-13 10:12:19 -05:00
Nikita Popov	432c41ebe9	[SLP] Avoid getPointerElementType() call Use the load result type instead of the element type of the load pointer operand.	2021-12-13 15:46:13 +01:00
Evgeniy Brevnov	7002125cff	[LV][NFC] Fix debug message to print out resulting clamped VF	2021-12-13 18:54:05 +07:00
Florian Hahn	e90630e5a5	[VPlan] Remove unused createNaryOp (NFC).	2021-12-13 11:11:00 +00:00
Evgeniy Brevnov	2025e0985c	[LV] Make sure VF doesn't exceed compile time known TC For the simple copy loop (see test case), the vectorizer selects a VF equal to 32 while the loop is known to have only 17 iterations. Such behavior makes no sense to me, since such a vector loop will never be executed. The only case where we may want to select a VF larger than TC is masked vectorization, so I haven't touched that case. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D114528	2021-12-13 13:48:46 +07:00
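The VF clamp in the commit above can be sketched in Python as follows. This sketch assumes that the clamp uses the largest power of two that does not exceed the known trip count, and that an unknown trip count is passed as `None`. Under those assumptions, the 17-iteration copy loop from the message would get VF 16 instead of 32. The function name is hypothetical.

```python
from typing import Optional


def clamp_max_vf(max_vf: int, trip_count: Optional[int], fold_tail: bool) -> int:
    """Clamp a candidate VF to a compile-time-known trip count (sketch)."""
    # Unknown trip count, or masked (tail-folded) vectorization: keep the
    # larger VF, since masking handles a partially filled vector iteration.
    if trip_count is None or trip_count <= 0 or fold_tail:
        return max_vf
    # Largest power of two not exceeding the trip count.
    pow2_floor = 1 << (trip_count.bit_length() - 1)
    return min(max_vf, pow2_floor)
```

Without masking, a vector loop whose VF exceeds the trip count would never execute, so every iteration would fall through to the scalar remainder. The clamp avoids generating that dead vector body.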
Florian Hahn	b6a2ddb6c8	[LV] Use info from State in some helper functions (NFC). This updates several helper functions to use information provided by VPTransformState instead of ILV directly, to help with the transition out of ILV.	2021-12-12 20:48:38 +00:00
Gulfem Savrun Yeniceri	e5a8af7a90	[Passes] Fix relative lookup table converter pass This patch fixes the relative table converter pass for lookup table accesses that result in an instruction sequence where the gep is not immediately followed by a load, such as when the gep is hoisted outside the loop or another instruction is inserted between them. The fix inserts the call to the load.relative intrinsic at the original location of the load instead of the gep. The issue was reported by FreeBSD via https://bugs.freebsd.org/259921. Differential Revision: https://reviews.llvm.org/D115571	2021-12-12 04:40:17 +00:00
Kazu Hirata	36b8a4f9f3	[llvm] Use llvm::is_contained (NFC)	2021-12-11 11:42:09 -08:00
Florian Hahn	361111906b	[EarlyCSE] Retain poison flags, if program is UB if poison. Poison-generating flags can be retained on the earlier instruction during CSE if the earlier instruction being poison causes UB. For now, always take AND for floating point instructions. https://alive2.llvm.org/ce/z/4K3D7P Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D115247	2021-12-11 15:11:44 +00:00
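The flag-merging rule from the EarlyCSE commit above can be sketched in Python. The flags are modeled as string sets, and the question "is the program UB if the earlier instruction is poison?" is taken as a precomputed boolean input, because answering it is the hard part. The function name is hypothetical and does not come from the patch.

```python
def flags_after_cse(earlier: frozenset, later: frozenset,
                    earlier_poison_is_ub: bool, is_fp: bool) -> frozenset:
    """Flags kept on the earlier instruction when the later one is CSE'd into it."""
    if is_fp:
        # Floating point: always intersect (the "take AND" rule), for now.
        return earlier & later
    if earlier_poison_is_ub:
        # If the earlier result were poison, the program would already be UB,
        # so its poison-generating flags cannot introduce new poison.
        return earlier
    # Otherwise, keep only the flags that both instructions carried.
    return earlier & later
```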
Andrew Browne	7c004c2bc9	Revert "[asan] Add support for disable_sanitizer_instrumentation attribute" This reverts commit `2b554920f1`. This change causes tsan test timeout on x86_64-linux-autoconf. The timeout can be reproduced by: git clone https://github.com/llvm/llvm-zorg.git BUILDBOT_CLOBBER= BUILDBOT_REVISION=eef8f3f85679c5b1ae725bade1c23ab7bb6b924f llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_standard.sh	2021-12-10 14:33:38 -08:00
Sami Tolvanen	9a74c753fe	[ThinLTO][MC] Use conditional assignments for promotion aliases Inline assembly references to static functions with ThinLTO+CFI were fixed in D104058 by creating aliases for promoted functions. Creating the aliases unconditionally resulted in an unexpected size increase in a Chrome helper binary: https://bugs.chromium.org/p/chromium/issues/detail?id=1261715 This is caused by the compiler being unable to drop unused code now referenced by the alias in module-level inline assembly. This change adds a .set_conditional assembly extension, which emits an assignment only if the target symbol is also emitted, avoiding phantom references to functions that could have otherwise been dropped. This is an alternative to the solution proposed in D112761. Reviewed By: pcc, nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D113613	2021-12-10 12:21:37 -08:00
David Green	fed3041863	[LV][ARM] Improve reduction costmodel for mismatching extension types. Given an MLA reduction from two different types (say i8 and i16), we were previously failing to find the reduction pattern, often causing us to choose the lower vector factor. This improves that by using the larger of the two extension types, allowing us to use the larger VF as the type of the reduction. As per https://godbolt.org/z/KP549EEYM the backend handles this valiantly, leading to better performance. Differential Revision: https://reviews.llvm.org/D115432	2021-12-10 15:40:58 +00:00
Florian Hahn	505ad03c7d	[LV] Remove redundant IV casts using VPlan (NFCI). This patch simplifies handling of redundant induction casts by removing dead cast instructions after initial VPlan construction. This has the following benefits: 1. fixes a crash (see @test_optimized_cast_induction_feeding_first_order_recurrence) 2. Simplifies VPWidenIntOrFpInduction to a single-def recipe 3. Retires recordVectorLoopValueForInductionCast. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115112	2021-12-10 13:57:03 +00:00
Florian Hahn	acea6e9cfa	[Passes] Only run extra vector passes if loops have been vectorized. This patch uses a similar trick as in D113947 to only run the extra passes after vectorization on functions where loops have been vectorized. The reason for running the 'extra vector passes' is simplification/unswitching of the runtime checks created by LV; there should be no need to run them if nothing got vectorized. To do that, a new dummy analysis ShouldRunExtraVectorPasses has been added. If loops have been vectorized for a function, LV will cache the analysis. At the moment it uses MadeCFGChanges as a proxy for loop vectorized, which isn't perfect (it could be too aggressive, e.g. because no runtime checks have been added), but should be good enough for now. The extra passes are now managed by a new FunctionPassManager that runs its passes only if ShouldRunExtraVectorPasses has been cached. Without this patch, `-extra-vectorizer-passes` has the following compile-time impact: NewPM-O3: +4.86% NewPM-ReleaseThinLTO: +3.56% NewPM-ReleaseLTO-g: +7.17% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions With this patch, that gets reduced to NewPM-O3: +1.43% NewPM-ReleaseThinLTO: +1.00% NewPM-ReleaseLTO-g: +1.58% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions It is probably still too high to enable by default, but much better. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115052	2021-12-10 11:42:45 +00:00
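The gating mechanism in the commit above uses a dummy analysis as a marker. The vectorizer caches it only when it changed the function, and a separate pass manager runs the extra passes only if the marker is cached. The following Python sketch models this under stated assumptions. `AnalysisCache`, `loop_vectorize`, and `run_extra_vector_passes` are hypothetical stand-ins, not LLVM's pass manager API, and "vectorized" is approximated by a flag in the same way the patch uses MadeCFGChanges as a proxy.

```python
class AnalysisCache:
    """Per-function cache of analysis results (illustrative only)."""

    def __init__(self):
        self._results = {}

    def cache(self, name, result=True):
        self._results[name] = result

    def get_cached(self, name):
        return self._results.get(name)


MARKER = "ShouldRunExtraVectorPasses"


def loop_vectorize(func, cache):
    """Stand-in for LV: records the marker only if it changed the CFG."""
    made_cfg_changes = func.get("vectorizable", False)
    if made_cfg_changes:
        cache.cache(MARKER)
    return made_cfg_changes


def run_extra_vector_passes(func, cache, passes):
    """Run the extra passes only if LV cached the marker analysis."""
    if cache.get_cached(MARKER) is None:
        return []  # nothing was vectorized, so skip the extra work
    return [p(func) for p in passes]
```

Because the marker is only ever read from the cache and never computed on demand, functions that LV left untouched do not pay the compile-time cost of the extra passes.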
Alexander Potapenko	2b554920f1	[asan] Add support for disable_sanitizer_instrumentation attribute For ASan this will effectively serve as a synonym for __attribute__((no_sanitize("address"))) Differential Revision: https://reviews.llvm.org/D114421	2021-12-10 12:17:26 +01:00
Florian Hahn	978883d254	[VPlan] Add InductionDescriptor to VPWidenIntOrFpInduction. (NFC) This allows easier access to the induction descriptor from VPlan, without needing to go through Legal. VPReductionPHIRecipe already contains a RecurrenceDescriptor in a similar fashion. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115111	2021-12-10 09:55:09 +00:00
Alexander Potapenko	1aa59ff2f7	[msan] Implement -msan-disable-checks. To ease the deployment of KMSAN, we need a way to apply __attribute__((no_sanitize("kernel-memory"))) to the whole source file. Passing -msan-disable-checks=1 to the compiler will make it treat every function in the file as if it was lacking the sanitize_memory attribute. Differential Revision: https://reviews.llvm.org/D115236	2021-12-10 10:27:51 +01:00
Rong Xu	ad2e5be4be	[PGO] Adjust BFI verification option default values [NFC] Slightly changed the default option values. Also avoided some bogus output.	2021-12-09 14:15:28 -08:00
Alexey Bataev	19c5cf4167	[SLP]Fix comparator for cmp instruction vectorization. The comparator for the sort functions should provide a strict weak ordering relation between parameters. The current solution causes a compiler crash with some standard C++ library implementations, because it does not meet this criterion. Tried to fix it; the fix also improves the overall vectorization result. Differential Revision: https://reviews.llvm.org/D115268	2021-12-09 10:57:57 -08:00
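A comparator passed to `std::sort` must induce a strict weak ordering, and violating this requirement is undefined behavior. This explains why the crash in the commit above appeared only with some standard library implementations. The following brute-force Python checker tests the four defining properties on a sample of items. The `(opcode, predicate)` tuples are a hypothetical model of cmp instructions, not the SLP vectorizer's actual comparator.

```python
from itertools import product


def is_strict_weak_ordering(less, items) -> bool:
    """Brute-force check of the strict weak ordering axioms on `items`."""
    items = list(items)

    def incomparable(x, y):
        return not less(x, y) and not less(y, x)

    for a in items:
        if less(a, a):  # irreflexivity
            return False
    for a, b in product(items, repeat=2):
        if less(a, b) and less(b, a):  # asymmetry
            return False
    for a, b, c in product(items, repeat=3):
        if less(a, b) and less(b, c) and not less(a, c):  # transitivity
            return False
        if incomparable(a, b) and incomparable(b, c) and not incomparable(a, c):
            return False  # incomparability must also be transitive
    return True
```

A typical bug is using `<=` where `<` is required. For example, `lambda x, y: x[0] <= y[0]` fails irreflexivity, while comparing only the opcode with `<` is still valid because it merely groups equal opcodes into equivalence classes.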
Philip Reames	0d13f94c1d	[reductions] Delete another piece of dead flag handling [NFC] The code claimed to handle nsw/nuw, but those aren't passed via builder state and the explicit IR construction just above never sets them. The only case this bit of code is actually relevant for is FMF flags. However, dropPoisonGeneratingFlags currently doesn't know about FMF at all, so this was a noop. It's also unneeded, as the caller explicitly configures the flags on the builder before this call, and the flags on the individual ops should be controlled by the intrinsic flags anyways. If any of the flags aren't safe to propagate, the caller needs to make that change.	2021-12-09 10:56:55 -08:00
Philip Reames	b24db85c0b	[recurrence] Delete dead flag/fmf handling [NFC] The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments. The sole exception is a caller of a method which has the argument, but no implementation. I don't know what the intent was here, but it certainly doesn't actually do anything today.	2021-12-09 10:43:53 -08:00
Philip Reames	98f5ab6af3	[instcombine] Do demanded elts last when visiting extractelement This reorders existing transforms to put demanded elements last. The reasoning here is that when we have an example which can be scalarized or handled via demanded bits, we should prefer scalarization as that doesn't require dropping flags on arithmetic instructions. This doesn't show major changes in the tests today, but once I add support for fast math flags to dropPoisonGeneratingFlags this becomes glaringly obvious. Differential Revision: https://reviews.llvm.org/D115394	2021-12-09 10:04:49 -08:00
Philip Reames	2d31b02517	Compute estimated trip counts for multiple exit loops This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to overestimate the trip count at worst. Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable. The one case where an upper bound on the estimated trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change. This was noticed when analyzing regressions on D113939. I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately. Differential Revision: https://reviews.llvm.org/D115362	2021-12-09 09:53:49 -08:00
Stanislav Mekhanoshin	06ca0a2733	[InstCombine] (~a & b & c) | ~(a | b | c) -> ~(a | (b ^ c)) Transform
```
(~a & b & c) | ~(a | b | c) -> ~(a | (b ^ c))
```
And swapped case:
```
(~a | b | c) & ~(a & b & c) -> ~a | (b ^ c)
```
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %or1 = or i4 %b, %a
  %or2 = or i4 %or1, %c
  %not1 = xor i4 %or2, 15
  %not2 = xor i4 %a, 15
  %and1 = and i4 %b, %not2
  %and2 = and i4 %and1, %c
  %or3 = or i4 %and2, %not1
  ret i4 %or3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %1 = xor i4 %c, %b
  %2 = or i4 %1, %a
  %or3 = xor i4 %2, 15
  ret i4 %or3
}
Transformation seems to be correct!
```
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %a
  %and2 = and i4 %and1, %c
  %not1 = xor i4 %and2, 15
  %not2 = xor i4 %a, 15
  %or1 = or i4 %not2, %b
  %or2 = or i4 %or1, %c
  %and3 = and i4 %or2, %not1
  ret i4 %and3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %not = xor i4 %a, 15
  %or = or i4 %xor, %not
  ret i4 %or
}
Transformation seems to be correct!
```
Differential Revision: https://reviews.llvm.org/D112966	2021-12-09 09:48:55 -08:00
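As an independent sanity check of the two folds proved with Alive2 in the commit above, the following Python sketch enumerates all 4096 i4 triples (a, b, c). It masks each result to four bits to mimic `i4` arithmetic. The checks hold because both identities are bitwise, so each bit position can be verified separately. For example, when a bit of `a` is 0, the first left-hand side is 1 exactly when the corresponding bits of `b` and `c` are equal, which matches `~(b ^ c)`. The helper names are hypothetical.

```python
MASK = 0xF  # i4: keep only the low four bits


def fold1_holds(a: int, b: int, c: int) -> bool:
    """(~a & b & c) | ~(a | b | c)  ==  ~(a | (b ^ c))"""
    lhs = ((~a & b & c) | ~(a | b | c)) & MASK
    rhs = ~(a | (b ^ c)) & MASK
    return lhs == rhs


def fold2_holds(a: int, b: int, c: int) -> bool:
    """(~a | b | c) & ~(a & b & c)  ==  ~a | (b ^ c)"""
    lhs = ((~a | b | c) & ~(a & b & c)) & MASK
    rhs = (~a | (b ^ c)) & MASK
    return lhs == rhs


def verify_all() -> bool:
    """Check both folds for every i4 triple."""
    vals = range(MASK + 1)
    return all(fold1_holds(a, b, c) and fold2_holds(a, b, c)
               for a in vals for b in vals for c in vals)
```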
Arthur Eubanks	1172712f46	[NFC] Replace some deprecated getAlignment() calls with getAlign() Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D115370	2021-12-09 08:43:19 -08:00
Nikita Popov	a3a478be40	[Inliner] Add debug message for history skip (NFC)	2021-12-09 15:11:56 +01:00
Dmitry Makogon	0b533c1833	[MetaRenamer] Add command line options to disable renaming name with specified prefixes This patch adds 4 options for specifying function, alias, global and struct name prefixes that don't need to be renamed by the MetaRenamer pass. This is useful if one has some downstream logic that depends directly on an entity name. MetaRenamer can break this logic, but with the patch you can tell it not to rename certain names. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D115323	2021-12-09 18:45:06 +07:00
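The prefix exclusion described in the MetaRenamer commit above can be sketched in Python. This is a minimal sketch under several assumptions. The `x0`, `x1`, ... naming scheme is hypothetical and is not MetaRenamer's actual scheme. The sketch handles a single entity kind, whereas the patch adds a separate option each for functions, aliases, globals, and structs. The real option spellings are not reproduced here.

```python
def meta_rename(names, excluded_prefixes):
    """Map each name to a neutral placeholder unless it has an excluded prefix."""
    prefixes = tuple(excluded_prefixes)
    mapping = {}
    counter = 0
    for name in names:
        if name.startswith(prefixes):
            # Excluded: downstream logic may depend on this exact name.
            mapping[name] = name
        else:
            mapping[name] = f"x{counter}"  # hypothetical placeholder scheme
            counter += 1
    return mapping
```

For example, `meta_rename(["my_hook_init", "compute"], ["my_hook_"])` keeps `my_hook_init` and renames `compute` to `x0`.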
Florian Hahn	d74a8a78ad	[LV] Mark various functions as const (NFC). Make sure various accessors do not modify any state, in preparation for D115111.	2021-12-09 10:51:29 +00:00


31127 Commits