Commit Graph

26463 Commits

Author SHA1 Message Date
David Green c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they are tagged with llvm.loop.isvectorized, which makes
recognizing them simpler.

We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult register allocation. With recent
changes that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
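A minimal sketch (not the pass's actual code) of how such a remainder loop can be recognized via the metadata mentioned above; findOptionMDForLoop is the stock LoopInfo helper:

```
// Hedged sketch: treat a loop tagged llvm.loop.isvectorized as a
// vectorizer-produced remainder/vector loop not worth aggressive
// unrolling. The unroller's real check may differ in detail.
#include "llvm/Analysis/LoopInfo.h"
using namespace llvm;

static bool isVectorizedRemainderLoop(const Loop *L) {
  // Presence of the tag is enough here; the trip count of a remainder
  // loop is expected to be low if it executes at all.
  return findOptionMDForLoop(L, "llvm.loop.isvectorized") != nullptr;
}
```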
Sanne Wouda dd03881bd5 Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-10 12:04:32 +00:00
Sander de Smalen f47573f9bf [LoopVectorizer] NFC: Propagate ElementCount to more interfaces.
Interfaces changed to take `ElementCount` as parameters:
* LoopVectorizationPlanner::buildVPlans
* LoopVectorizationPlanner::buildVPlansWithVPRecipes
* LoopVectorizationCostModel::selectVectorizationFactor

This patch is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90879
2020-11-10 11:11:02 +00:00
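A hedged sketch of the ElementCount type now threaded through these interfaces in place of a plain unsigned VF; for fixed-width vectors the behavior is unchanged, matching the NFC claim above:

```
#include "llvm/Support/TypeSize.h"
using namespace llvm;

void elementCountExample() {
  ElementCount Fixed = ElementCount::getFixed(4);       // e.g. <4 x i32>
  ElementCount Scalable = ElementCount::getScalable(4); // e.g. <vscale x 4 x i32>
  (void)Fixed.isScalable();    // false: behaves like the old unsigned VF
  (void)Scalable.isScalable(); // true: enables scalable-vector VPlans
}
```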
Max Kazantsev 25755a0159 [NFC] Add flag to disable IV widening in indvar instance
This allows us to have control over IV widening in the pipeline.
2020-11-10 15:10:44 +07:00
Arthur Eubanks 1cbf8e89b5 [NewPM] Port -separate-const-offset-from-gep
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91095
2020-11-09 17:42:36 -08:00
Michael Kruse e5dba2d7e5 [OMPIRBuilder] Start 'Create' methods with lower case. NFC.
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower-case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109
2020-11-09 19:35:11 -06:00
Xun Li c2cb093d9b [Coroutine] Move all used local allocas to the .resume function
Prior to D89768, any alloca that is used after suspension points will be put on the coroutine frame, and hence will always be reloaded in the resume function.
However, D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (and hence do not need to live across suspension points) will not be put on the frame. They will remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved the allocas from the old entry to the new entry. This covers every alloca from the old entry. However, allocas that are defined after coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas and checks whether they are still used but not reachable from the new entry; if so, we move them to the new entry.

Differential Revision: https://reviews.llvm.org/D90977
2020-11-09 17:24:49 -08:00
Sjoerd Meijer e2dcea4489 [LoopFlatten] FlattenInfo bookkeeping. NFC.
Introduce struct FlattenInfo to group some of the bookkeeping. Besides this
being a bit of a clean-up, it is a prep step for next additions (D90640). I
could take things a bit further, but thought this was a good first step also
not to make this change too large.

Differential Revision: https://reviews.llvm.org/D90408
2020-11-09 14:50:26 +00:00
Florian Hahn f0d76275cb
[VPlan] Print result value for loads in VPWidenMemoryInst (NFC).
For loads, print the result value.
2020-11-09 14:01:29 +00:00
Florian Hahn 537829f2a7
[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC).
Move logic to check if the recipe is a store to a helper for easier
reuse.
2020-11-09 14:01:29 +00:00
Florian Hahn fec64de261
[VPlan] Use VPValue def for VPWidenCall.
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84681
2020-11-09 13:29:41 +00:00
Florian Hahn 091c5c9a18
[VPlan] Add printOperands helper to VPUser (NFC).
Factor out the code for printing operands of a VPUser so it can be
re-used when printing other recipes.
2020-11-09 12:30:57 +00:00
LemonBoy 42732d33cc
[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D89628
2020-11-09 14:41:07 +03:00
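A hedged sketch of the shape of the fix: the "no overflow" flag produced for a vector operation must be a vector of i1 matching the operand's element count, not a scalar i1. The helper name is illustrative:

```
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

static Constant *noOverflowFlagFor(Type *OpTy) {
  Type *BoolTy = Type::getInt1Ty(OpTy->getContext());
  if (auto *VTy = dyn_cast<VectorType>(OpTy))
    BoolTy = VectorType::get(BoolTy, VTy->getElementCount());
  return ConstantInt::getFalse(BoolTy); // all-lanes-false overflow bit
}
```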
Tim Northover f7fe7ea24d [MergeFunctions] fix function attribute comparison in FunctionComparator
The comparison of AttributeSets stopped after seeing a matching type attribute.
Subsequent mismatching attributes were not detected causing a crash.
2020-11-09 09:19:11 +00:00
Simon Pilgrim b11eaf5617 [DSE] Don't dereference a dyn_cast<> result - use cast<> instead. NFCI.
We were relying on the dyn_cast<> succeeding - better to use cast<> and have it assert that it's the correct type than dereference a null result.
2020-11-08 13:07:45 +00:00
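The idiom in one picture: dyn_cast<> returns nullptr on a type mismatch, so dereferencing its result unchecked can crash; cast<> asserts the type instead and fails loudly where the bad assumption was made:

```
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
#include <cstdint>
using namespace llvm;

static uint64_t storeAlignment(Instruction *I) {
  // StoreInst *SI = dyn_cast<StoreInst>(I); // null if I isn't a store
  StoreInst *SI = cast<StoreInst>(I);        // asserts that I is a store
  return SI->getAlign().value();
}
```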
Simon Pilgrim 0fe91ad463 [InstCombine] foldSelectFunnelShift - block poison in funnel shift value
As raised by @nlopes on D90382 - if this is not a rotate then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't - so freeze it.
2020-11-08 12:58:30 +00:00
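A hedged sketch of the fix's shape: freeze the shift amount when building the funnel shift so poison from the 'shift-by-zero' arm cannot propagate where the select used to block it. Names are illustrative, not the actual InstCombine code:

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

static Value *buildFrozenFunnelShift(IRBuilder<> &B, Value *X, Value *Y,
                                     Value *ShAmt) {
  Value *Frozen = B.CreateFreeze(ShAmt, "shamt.fr"); // block poison here
  Function *FShl = Intrinsic::getDeclaration(
      B.GetInsertBlock()->getModule(), Intrinsic::fshl, X->getType());
  return B.CreateCall(FShl, {X, Y, Frozen});
}
```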
Florian Hahn e8dc17a2b7
[LoopInterchange] Skip non SCEV-able operands in cost function.
This fixes a crash when trying to get a SCEV expression for operands
that are not SCEV-able.
2020-11-08 11:41:19 +00:00
Pedro Tammela 5e8ecff0d8 [Reg2Mem] add support for the new pass manager
This patch refactors the pass to accommodate the new pass manager
boilerplate.

Differential Revision: https://reviews.llvm.org/D91005
2020-11-08 11:14:05 +00:00
Kazu Hirata 75e46c6328 [Mem2Reg] Use llvm::count instead of std::count (NFC) 2020-11-07 20:18:47 -08:00
Kazu Hirata c95fff5be7 [JumpThreading] Fix function names (NFC) 2020-11-07 19:35:03 -08:00
Atmn Patel 04a0896487 Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops"
This reverts commit 0b17c6e447. This patch
causes a compile-time error in SCEV.
2020-11-07 00:32:12 -05:00
Atmn Patel 0b17c6e447 [LoopDeletion] Allows deletion of possibly infinite side-effect free loops
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.

This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.

Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86844
2020-11-06 22:06:58 -05:00
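A hedged sketch of the gating condition described above; the helper name is illustrative and the pass's actual predicate may differ:

```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Function.h"
using namespace llvm;

static bool mayDeleteSideEffectFreeLoop(const Loop *L) {
  const Function *F = L->getHeader()->getParent();
  // Deletable only if this loop, or its whole function, is required to
  // make forward progress.
  return findOptionMDForLoop(L, "llvm.loop.mustprogress") != nullptr ||
         F->hasFnAttribute(Attribute::MustProgress) ||
         F->hasFnAttribute(Attribute::WillReturn);
}
```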
Atmn Patel babc224c5d [LoopDeletion] Remove dead loops with no exit blocks
Currently, LoopDeletion refuses to remove dead loops with no exit blocks
because it cannot statically determine the control flow after it removes
the block. This leads to miscompiles if the loop is an infinite loop and
should've been removed.

Differential Revision: https://reviews.llvm.org/D90115
2020-11-06 17:08:34 -05:00
Quentin Colombet a585228027 Prevent LICM and machineLICM from hoisting convergent operations
Results of convergent operations are implicitly affected by the
enclosing control flows and should not be hoisted out of arbitrary
loops.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D90361
2020-11-06 10:26:39 -08:00
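A sketch of the guard this adds conceptually: a convergent call's result depends on which threads execute the enclosing control flow, so hoisting it out of a loop (in LICM or MachineLICM) is not sound:

```
#include "llvm/IR/Instructions.h"
using namespace llvm;

static bool hoistingBlockedByConvergence(const CallInst *CI) {
  // Convergent operations must stay under their original branches.
  return CI->isConvergent();
}
```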
Arnold Schwaighofer c6543cc6b8 llvm.coro.id.async lowering: Parameterize how to restore the current continuation's context and restart the pipeline after splitting
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how to restore the current continuation's
context from the context argument of the continuation function. Before,
we assumed that the current context can be restored by loading from the
context argument's first pointer field (`first_arg->caller_context`).

This allows for defining suspension points that reuse the current
context for example.

Also:

llvm.coro.id.async lowering: Add llvm.coro.prepare.async intrinsic

Blocks inlining until after the async coroutine was split.

Also, change the async function pointer's context size position

   struct async_function_pointer {
     uint32_t relative_function_pointer_to_async_impl;
     uint32_t context_size;
   };

And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90783
2020-11-06 06:22:46 -08:00
Florian Hahn d8d1cc647d [SLP] Also try to vectorize incoming values of PHIs.
Currently we do not consider incoming values of PHIs as roots for SLP
vectorization. This means we miss scenarios like the one in the test
case and PR47670.

It appears quite straightforward to consider incoming values of PHIs as
roots for vectorization, but I might be missing something that makes
this problematic.

In terms of vectorized instructions, this applies to quite a few
benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto

    Same hash: 185 (filtered out)
    Remaining: 52
    Metric: SLP.NumVectorInstructions

    Program                                        base    patch   diff
     test-suite...ProxyApps-C++/HPCCG/HPCCG.test     9.00   27.00  200.0%
     test-suite...C/CFP2000/179.art/179.art.test     8.00   22.00  175.0%
     test-suite...T2006/458.sjeng/458.sjeng.test    14.00   30.00  114.3%
     test-suite...ce/Benchmarks/PAQ8p/paq8p.test    11.00   18.00  63.6%
     test-suite...s/FreeBench/neural/neural.test    12.00   18.00  50.0%
     test-suite...rimaran/enc-3des/enc-3des.test    65.00   95.00  46.2%
     test-suite...006/450.soplex/450.soplex.test    63.00   89.00  41.3%
     test-suite...ProxyApps-C++/CLAMR/CLAMR.test   177.00  250.00  41.2%
     test-suite...nchmarks/McCat/18-imp/imp.test    13.00   18.00  38.5%
     test-suite.../Applications/sgefa/sgefa.test    26.00   35.00  34.6%
     test-suite...pplications/oggenc/oggenc.test   100.00  133.00  33.0%
     test-suite...6/482.sphinx3/482.sphinx3.test   103.00  134.00  30.1%
     test-suite...oxyApps-C++/miniFE/miniFE.test   169.00  213.00  26.0%
     test-suite.../Benchmarks/Olden/tsp/tsp.test    59.00   73.00  23.7%
     test-suite...TimberWolfMC/timberwolfmc.test   503.00  622.00  23.7%
     test-suite...T2006/456.hmmer/456.hmmer.test    65.00   79.00  21.5%
     test-suite...libquantum/462.libquantum.test    58.00   68.00  17.2%
     test-suite...ternal/HMMER/hmmcalibrate.test    84.00   98.00  16.7%
     test-suite...ications/JM/ldecod/ldecod.test   351.00  401.00  14.2%
     test-suite...arks/VersaBench/dbms/dbms.test    52.00   57.00   9.6%
     test-suite...ce/Benchmarks/Olden/bh/bh.test   118.00  128.00   8.5%
     test-suite.../Benchmarks/Bullet/bullet.test   6355.00 6880.00  8.3%
     test-suite...nsumer-lame/consumer-lame.test   480.00  519.00   8.1%
     test-suite...000/183.equake/183.equake.test   226.00  244.00   8.0%
     test-suite...chmarks/Olden/power/power.test   105.00  113.00   7.6%
     test-suite...6/471.omnetpp/471.omnetpp.test    92.00   99.00   7.6%
     test-suite...ications/JM/lencod/lencod.test   1173.00 1261.00  7.5%
     test-suite...0/253.perlbmk/253.perlbmk.test    55.00   59.00   7.3%
     test-suite...oxyApps-C/miniAMR/miniAMR.test    92.00   98.00   6.5%
     test-suite...chmarks/MallocBench/gs/gs.test   446.00  473.00   6.1%
     test-suite.../CINT2006/403.gcc/403.gcc.test   464.00  491.00   5.8%
     test-suite...6/464.h264ref/464.h264ref.test   998.00  1055.00  5.7%
     test-suite...006/453.povray/453.povray.test   5711.00 6007.00  5.2%
     test-suite...FreeBench/distray/distray.test   102.00  107.00   4.9%
     test-suite...:: External/Povray/povray.test   4184.00 4378.00  4.6%
     test-suite...DOE-ProxyApps-C/CoMD/CoMD.test   112.00  117.00   4.5%
     test-suite...T2006/445.gobmk/445.gobmk.test   104.00  108.00   3.8%
     test-suite...CI_Purple/SMG2000/smg2000.test   789.00  819.00   3.8%
     test-suite...yApps-C++/PENNANT/PENNANT.test   233.00  241.00   3.4%
     test-suite...marks/7zip/7zip-benchmark.test   417.00  428.00   2.6%
     test-suite...arks/mafft/pairlocalalign.test   627.00  643.00   2.6%
     test-suite.../Benchmarks/nbench/nbench.test   259.00  265.00   2.3%
     test-suite...006/447.dealII/447.dealII.test   4641.00 4732.00  2.0%
     test-suite...lications/ClamAV/clamscan.test   106.00  108.00   1.9%
     test-suite...CFP2000/177.mesa/177.mesa.test   1639.00 1664.00  1.5%
     test-suite...oxyApps-C/RSBench/rsbench.test    66.00   65.00  -1.5%
     test-suite.../CINT2000/252.eon/252.eon.test   3416.00 3444.00  0.8%
     test-suite...CFP2000/188.ammp/188.ammp.test   1846.00 1861.00  0.8%
     test-suite.../CINT2000/176.gcc/176.gcc.test   152.00  153.00   0.7%
     test-suite...CFP2006/444.namd/444.namd.test   3528.00 3544.00  0.5%
     test-suite...T2006/473.astar/473.astar.test    98.00   98.00   0.0%
     test-suite...frame_layout/frame_layout.test    NaN     39.00   nan%

On ARM64, there appears to be a slight regression on SPEC2006, which
might be interesting to investigate:

   test-suite...T2006/473.astar/473.astar.test   0.9%

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D88735
2020-11-06 12:50:32 +00:00
Sander de Smalen 4a3bb9ea6c [VPlan] NFC: Change VFRange to take ElementCount
This patch changes the type of Start, End in VFRange to be an ElementCount
instead of `unsigned`. This is done as preparation to make VPlans for
scalable vectors, but is otherwise NFC.

Reviewed By: dmgreen, fhahn, vkmr

Differential Revision: https://reviews.llvm.org/D90715
2020-11-06 09:50:20 +00:00
Roman Lebedev 8d0fdd36a3
[IR] CmpInst: Add getFlippedSignednessPredicate()
And refactor a few places to use it
2020-11-06 11:31:09 +03:00
Giorgis Georgakoudis 700d2417d8 [CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
CodeExtractor handles bitcasts in the extracted region that have
lifetime marker users in the outer region as outputs. That
creates unnecessary alloca/reload instructions and extra lifetime
markers. The patch identifies those cases and replaces uses in
out-of-region lifetime markers with new bitcasts in the outer region.

**Example**
```
define void @foo() {
entry:
  %0 = alloca i32
  br label %extract

extract:
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1)
  call void @use(i32* %0)
  br label %exit

exit:
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1)
  ret void
}
```

**Current extraction**
```
define void @foo() {
entry:
  %.loc = alloca i8*, align 8
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast = bitcast i8** %.loc to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast)
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0, i8** %.loc)
  %.reload = load i8*, i8** %.loc, align 8
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload)
  ret void
}

define internal void @foo.extract(i32* %0, i8** %.out) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  store i8* %1, i8** %.out, align 8
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

**Extraction with patch**
```
define void @foo() {
entry:
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  %lt.cast = bitcast i32* %0 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast)
  ret void
}

define internal void @foo.extract(i32* %0) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D90689
2020-11-05 17:01:08 -08:00
Sjoerd Meijer 7eb70158e4 [IndVarSimplify][SimplifyIndVar] Move WidenIV to Utils/SimplifyIndVar. NFCI.
This moves WidenIV from IndVarSimplify to Utils/SimplifyIndVar so that we have
createWideIV available as a generic helper utility. I.e., this is not only
useful in IndVarSimplify, but could be useful for loop transformations. For
example, motivation for this refactoring is the loop flatten transformation: if
induction variables in a loop nest can be widened, we can avoid having to
perform certain overflow checks, enabling this transformation.

Differential Revision: https://reviews.llvm.org/D90421
2020-11-05 16:52:47 +00:00
Florian Hahn be0578f0b4 [GVN] Fix MemorySSA update when replacing assume(false) with stores.
When replacing an assume(false) with a store, we have to be more careful
with the order we insert the new access. This patch updates the code to
look at the accesses in the block to find a suitable insertion point.

Alternatively we could check the defining access of the assume, but IIRC
there has been some discussion about making assume() readnone, so
looking at the access list might be more future proof.

Fixes PR48072.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90784
2020-11-05 12:09:32 +00:00
Simon Pilgrim 4b2be681f4 [InstCombine] Remove orphan InstCombinerImpl method declarations. NFCI. 2020-11-05 10:13:16 +00:00
Arnold Schwaighofer ea5989b43a Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Reapply with fix for memory sanitizer failure and sphinx failure.

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 10:29:21 -08:00
Arnold Schwaighofer 42f1916640 Revert "Start of an llvm.coro.async implementation"
This reverts commit ea606cced0.

This patch causes memory sanitizer failures sanitizer-x86_64-linux-fast.
2020-11-04 08:26:20 -08:00
Arnold Schwaighofer ea606cced0 Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 07:32:29 -08:00
Roman Lebedev 93f3d7f7b3
[Reassociate] Guard `add`-like `or` conversion into an `add` with profitability check
This is slightly better compile-time-wise,
since we avoid a potentially costly known-bits analysis that will
ultimately not allow us to actually do anything with said `add`.
2020-11-04 16:10:34 +03:00
Martin Storsjö 36cf1e7d0e Revert "[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts"
This reverts commit 59b22e495c.

That commit broke building for ARM and AArch64, reproducible like this:

$ cat apedec-reduced.c
a;
b(e) {
  int c;
  unsigned d = f();
  c = d >> 32 - e;
  return c;
}
g() {
  int h = i();
  if (a)
    h = h << a | b(a);
  return h;
}
$ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c
clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed.

Same thing for e.g. an armv7-linux-gnueabihf target.
2020-11-04 08:39:32 +02:00
Xun Li 7f34aca083 [musttail] Unify musttail call preceding return checking
There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction.
Use it instead of manually checking in each place.

Differential Revision: https://reviews.llvm.org/D90693
2020-11-03 11:39:27 -08:00
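The existing BasicBlock API the commit switches callers to, per the message above: it returns the musttail call preceding the return instruction, or nullptr if there is none:

```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static bool endsInMustTailCall(const BasicBlock &BB) {
  // Non-null iff a musttail call feeds this block's return.
  return BB.getTerminatingMustTailCall() != nullptr;
}
```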
Roman Lebedev 70472f34b2
[Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation
InstCombine is quite aggressive in doing the opposite transform,
folding an `add` of operands with no common bits set into an `or`,
and not many things support that new pattern.

In this case, teaching Reassociate about it is easy,
there's preexisting art for `sub`/`shl`:
just convert such an `or` into an `add`:
https://rise4fun.com/Alive/Xlyv
2020-11-03 22:30:51 +03:00
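A hedged sketch of the 'add-like or' test: an `or` whose operands share no set bits is equivalent to an `add`. haveNoCommonBitsSet is the stock ValueTracking query; Reassociate's actual logic is richer:

```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static bool isAddLikeOr(const BinaryOperator *Or, const DataLayout &DL) {
  // or X, Y == add X, Y when no bit can be set in both operands.
  return Or->getOpcode() == Instruction::Or &&
         haveNoCommonBitsSet(Or->getOperand(0), Or->getOperand(1), DL);
}
```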
Sanne Wouda 2ec26d3a23 Revert "Add loop distribution to the LTO pipeline"
This reverts commit 6e80318eec.
2020-11-03 19:29:27 +00:00
Sanne Wouda 6e80318eec Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-03 18:54:24 +00:00
Jameson Nash 59a6ab28c4 [GVN] small improvements to comments 2020-11-03 13:21:48 -05:00
Roman Lebedev c009d11bda
[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator
In particular, it makes it fire for C=0, because negator doesn't want
to perform that fold since in general it's not beneficial.
2020-11-03 16:06:52 +03:00
Roman Lebedev e465f9c303
[InstCombine] Negator: - (C - %x) --> %x - C (PR47997)
This relaxes one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
2020-11-03 16:06:51 +03:00
Florian Hahn d68bed0fa9 [SCCP] Handle bitcast of vector constants.
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mis-match between the width of the lattice value (single constant range) and
the original operands (vector). Go to overdefined in that case.

Fixes PR47991.
2020-11-03 12:58:39 +00:00
Simon Pilgrim 59b22e495c [AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.

This should allow us to begin resolving PR46896 et al.

Differential Revision: https://reviews.llvm.org/D90625
2020-11-03 10:49:49 +00:00
Florian Hahn d9cbf39a37 [SLP] Pass VecPred argument to getCmpSelInstrCost.
Check if all compares in VL have the same predicate and pass it to
getCmpSelInstrCost, to improve cost-modeling on targets that only
support compare/select combinations for certain uniform predicates.

This leads to additional vectorization in some cases

```
Same hash: 217 (filtered out)
Remaining: 19
Metric: SLP.NumVectorInstructions

Program                                        base    slp2    diff
 test-suite...marks/SciMark2-C/scimark2.test    11.00   26.00  136.4%
 test-suite...T2006/445.gobmk/445.gobmk.test    79.00  135.00  70.9%
 test-suite...ediabench/gsm/toast/toast.test    54.00   71.00  31.5%
 test-suite...telecomm-gsm/telecomm-gsm.test    54.00   71.00  31.5%
 test-suite...CI_Purple/SMG2000/smg2000.test   426.00  542.00  27.2%
 test-suite...ch/g721/g721encode/encode.test    30.00   24.00  -20.0%
 test-suite...000/186.crafty/186.crafty.test   116.00  138.00  19.0%
 test-suite...ications/JM/ldecod/ldecod.test   697.00  765.00   9.8%
 test-suite...6/464.h264ref/464.h264ref.test   822.00  886.00   7.8%
 test-suite...chmarks/MallocBench/gs/gs.test   154.00  162.00   5.2%
 test-suite...nsumer-lame/consumer-lame.test   621.00  651.00   4.8%
 test-suite...lications/ClamAV/clamscan.test   223.00  231.00   3.6%
 test-suite...marks/7zip/7zip-benchmark.test   680.00  695.00   2.2%
 test-suite...CFP2000/177.mesa/177.mesa.test   2121.00 2129.00  0.4%
 test-suite...:: External/Povray/povray.test   2406.00 2412.00  0.2%
 test-suite...TimberWolfMC/timberwolfmc.test   634.00  634.00   0.0%
 test-suite...CFP2006/433.milc/433.milc.test   1036.00 1036.00  0.0%
 test-suite.../Benchmarks/nbench/nbench.test   321.00  321.00   0.0%
 test-suite...ctions-flt/Reductions-flt.test    NaN      5.00   nan%
```

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D90124
2020-11-03 10:16:43 +00:00
Max Kazantsev 46b2e85f0f [NFC] Refactor code in IndVars, preparing for further improvement 2020-11-03 15:08:12 +07:00
Max Kazantsev a44b7322a2 [NFC] Split lambda into 2 parts for further reuse 2020-11-03 14:13:55 +07:00
Max Kazantsev f847094c24 [IndVars] Use knowledge about execution on last iteration when removing checks
If we know that some check will not be executed on the last iteration, we can use this
fact to eliminate the check.

Differential Revision: https://reviews.llvm.org/D88210
Reviwed By: ebrevnov
2020-11-03 13:38:58 +07:00
Alina Sbirlea f514b32a89 [LICM] Add assert of AST/MSSA exclusiveness.
The API `canSinkOrHoistInst` may be called by LoopSink. Add an assert to
avoid having two analyses passed in.
2020-11-02 18:04:43 -08:00
Akira Hatanaka b0f1d7d562 Remove unused parameter 2020-11-02 17:40:06 -08:00
Ettore Tiotto 4274cbba1c [PartialInliner]: Handle code regions in switch stmt cases
This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidates for partial inlining.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89911
2020-11-02 14:32:45 -05:00
Simon Pilgrim 55f15f99cb [AggressiveInstCombine] foldGuardedRotateToFunnelShift - generalize rotation to funnel shift matcher.
Replace matchRotate with a more general matchFunnelShift - at the moment this is still just used for rotation patterns.
2020-11-02 17:09:17 +00:00
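A hedged sketch of the shape such a matcher looks for: the classic or(shl(X, C1), lshr(Y, C2)) pattern that maps onto llvm.fshl when C1 + C2 equals the bit width. Illustrative, not the pass's actual matcher:

```
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

static bool looksLikeFunnelShl(Value *V, Value *&X, Value *&Y,
                               ConstantInt *&C1, ConstantInt *&C2) {
  // Caller must still verify C1 + C2 == bit width (and X == Y for a
  // plain rotate).
  return match(V, m_Or(m_Shl(m_Value(X), m_ConstantInt(C1)),
                       m_LShr(m_Value(Y), m_ConstantInt(C2))));
}
```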
Fangrui Song 98b9338588 [Debugify] Port -debugify-each to NewPM
Preemptively switch 2 tests to the new PM

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90365
2020-11-02 08:16:43 -08:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Teresa Johnson 0949f96dc6 [MemProf] Pass down memory profile name with optional path from clang
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.

Additionally, always pass a default filename (in $cwd if a directory
name is not specified via the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.

Change the existing memprof tests to specify log_path=stderr when that
was being relied on.

Depends on D89086.

Differential Revision: https://reviews.llvm.org/D89087
2020-11-01 17:38:23 -08:00
Florian Hahn ca38652b9a [VPlan] Assert no users remaining when deleting a VPValue.
When deleting a VPValue, all users must already be deleted. Add an
assertion to make sure and catch violations.
2020-11-01 17:44:53 +00:00
Florian Hahn aab71d4443 [DSE] Use same logic as legacy impl to check if free kills a location.
This patch updates DSE + MemorySSA to use the same check as the legacy
implementation to determine if a location is killed by a free call.

This changes the existing behavior so that a free does not kill
locations before the start of the freed pointer.

This should fix PR48036.
2020-10-31 20:09:25 +00:00
Florian Hahn 799033d8c5 Reland "[SLP] Consider alternatives for cost of select instructions."
This reverts the revert commit a1b53db324.

This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking through some connected casts.

For now, ensure integer intrinsics are only returned for selects of
ints or int vectors.
2020-10-31 16:52:36 +00:00
Simon Pilgrim 538fdb0189 [InstCombine] foldSelectRotate - generalize to foldSelectFunnelShift
This is the last of the rotate->funnel shift InstCombine generalizations for PR46896

We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine

Differential Revision: https://reviews.llvm.org/D90382
2020-10-31 12:32:34 +00:00
Simon Pilgrim 4da6a48399 [CSE] Make some basic EarlyCSE::StackNode helper methods const. NFCI.
Fixes a number of cppcheck remarks.
2020-10-31 12:16:48 +00:00
Nikita Popov 27f647d117 [Inliner] Consistently apply callsite noalias metadata
Previously, !noalias and !alias.scope metadata on the call site was
applied as part of CloneAliasScopeMetadata(), which short-circuits
if the callee does not use any noalias metadata itself. However,
these two things have no relation to each other.

Consistently apply !noalias and !alias.scope metadata by integrating
this into an existing function that handled !llvm.access.group and
!llvm.mem.parallel_loop_access metadata. The handling for all of
these metadata kinds is essentially the same.
2020-10-31 10:54:45 +01:00
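A hedged sketch of consistent call-site propagation: merge the call site's !noalias metadata into whatever the cloned instruction already carries, unconditionally, instead of only when the callee itself uses noalias metadata:

```
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

static void mergeCallsiteNoAlias(Instruction *Cloned, MDNode *FromCallsite) {
  MDNode *Existing = Cloned->getMetadata(LLVMContext::MD_noalias);
  // concatenate() tolerates a null operand, so this works whether or
  // not the cloned instruction already had !noalias metadata.
  Cloned->setMetadata(LLVMContext::MD_noalias,
                      MDNode::concatenate(Existing, FromCallsite));
}
```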
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn a1b53db324 Revert "[SLP] Consider alternatives for cost of select instructions."
This reverts commit 1922570489.

This appears to cause a crash in the following example

 a, b, c;
 l() {
   int e = a, f = l, g, h, i, j;
   float *d = c, *k = b;
   for (;;)
     for (; g < f; g++) {
       k[h] = d[i];
       k[h - 1] = d[j];
       h += e << 1;
       i += e;
     }
 }

 clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c

 llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
2020-10-30 21:26:14 +00:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain
introduce an enum with the bit field positions for the access info.

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size    boot time
before    92824064      6.18s
after     38822400      6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Peter Collingbourne 0930763b4b hwasan: Move fixed shadow behind opaque no-op cast as well.
This is a workaround for poor heuristics in the backend where we can
end up materializing the constant multiple times. This is particularly
bad when using outlined checks because we materialize it for every call
(because the backend considers it trivial to materialize).

As a result the field containing the shadow base value will always
be set so simplify the code taking that into account.

Differential Revision: https://reviews.llvm.org/D90425
2020-10-30 13:23:52 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
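A small worked example of the motivation: profile counts are 64-bit, and two weights that each fit in uint32_t can overflow it when combined, which is the kind of cast/overflow the widening removes:

```
#include <cassert>
#include <cstdint>

void branchWeightOverflowExample() {
  uint64_t TrueWeight = 3000000000;  // fits in uint32_t
  uint64_t FalseWeight = 2000000000; // fits in uint32_t
  uint64_t Sum = TrueWeight + FalseWeight;
  assert(Sum > UINT32_MAX && "the combined weight no longer fits in i32");
}
```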
Pedro Tammela 86e0c1acdb [NFC][Reg2Mem] modernize loops iterators
This patch updates the Reg2Mem loops to use more modern iterators.

Differential Revision: https://reviews.llvm.org/D90122
2020-10-30 16:50:07 +00:00
Pedro Tammela 70a495c7f0 [NFC][LoopSimplify] modernize for loops over LoopInfo
This patch modifies two for loops to use the range based syntax.
Since they are equivalent, this patch is tagged NFC.

Differential Revision: https://reviews.llvm.org/D90069
2020-10-30 16:50:07 +00:00
Michael Liao c82403d025 [gvn] PRE needs to skip convergent intrinsics/calls.
- As convergent intrinsics/calls could only be moved to
  control-equivalent blocks, or more precisely the same divergent
  branch, PRE needs to skip them.

Differential Revision: https://reviews.llvm.org/D90391
2020-10-30 11:24:40 -04:00
Evgeniy Brevnov 3d31adaec4 [DSE] Improve partial overlap detection
Currently isOverwrite returns OW_MaybePartial even for accesses known not to overlap. This is not a big problem for the legacy implementation (since isPartialOverwrite follows isOverwrite and clarifies the result). In contrast, the SSA-based version does a lot of work only to later find out that the accesses don't overlap. Besides the negative impact on compile time, we quickly reach MemorySSAPartialStoreLimit and miss optimization opportunities.

Note: In fact, I think it would be a cleaner implementation if isOverwrite returned a fully clarified result in the first place, without the need to call isPartialOverwrite. This can be done as a follow-up. What do you think?

Reviewed By: fhahn, asbirlea

Differential Revision: https://reviews.llvm.org/D90371
2020-10-30 22:23:20 +07:00
Simon Pilgrim ed577892cf Use cast<> instead of dyn_cast<> as we dereference the pointers immediately. NFCI.
Fix clang static analyzer warnings - we're better off relying on cast<> asserting on failure rather than a null dereference crash.
2020-10-30 15:20:40 +00:00
Florian Hahn aa1a198a64 [VPlan] Use isa<> instead getVPRecipeID in getFirstNonPhi (NFC).
As per the comment in VPRecipeBase, clients should not rely on
getVPRecipeID, as it may change in the future. It should only be used in
classof implementations. Use isa instead in getFirstNonPhi.
2020-10-30 14:56:06 +00:00
Simon Pilgrim b7c91a9b8e [SCEV] SCEVExpander::InsertNoopCastOfTo - reduce scope of pointer type. NFCI.
By reducing the scope of the dyn_cast<PointerType> we can make this a cast<PointerType> and avoid clang static analyzer null dereference warnings.
2020-10-30 14:55:09 +00:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
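A hedged sketch of a query with the new argument: the caller passes the predicate of the feeding compare so a target can price the cmp+select pair as, say, a min/max. Parameter order follows the description above and may differ slightly from the in-tree signature:

```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

void costQueryExample(TargetTransformInfo &TTI, Type *VecTy, Type *CondTy) {
  // Price a vector select whose condition is a signed-less-than compare.
  auto Cost = TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy,
                                     CmpInst::ICMP_SLT,
                                     TargetTransformInfo::TCK_RecipThroughput);
  (void)Cost;
}
```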
Simon Pilgrim 9e154f1aca [SROA] Pass Twine by const reference. NFCI.
Fixes clang-tidy warnings.
2020-10-30 11:36:58 +00:00
Max Kazantsev bd341bafbf [NFC] Simplify code in IndVars 2020-10-30 17:49:32 +07:00
Florian Hahn 05e4f7bde9 [DSE] Remove noop stores after killing stores for a MemoryDef.
Currently we fail to eliminate some noop stores if there is a kill-able
store between the starting def and the load. This is because we
eliminate noop stores first.

In practice it seems like eliminating noop stores after the main
elimination for a def covers slightly more cases.

This patch improves the number of stores slightly in 2 cases for X86 -O3
-flto

Same hash: 235 (filtered out)
Remaining: 2
Metric: dse.NumRedundantStores

Program                                          base      patch diff
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test     2.00   3.00 50.0%
 test-suite...006/453.povray/453.povray.test    18.00  21.00 16.7%

There might be other phase ordering issues, but it appears that they do
not show up in the test-suite/SPEC2000/SPEC2006. We can always tune the
ordering later.

Partly fixes PR47887.

Reviewed By: asbirlea, zoecarver

Differential Revision: https://reviews.llvm.org/D89650
2020-10-30 09:40:15 +00:00
Roman Lebedev 81fc53a36a
[SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with the initial problem of PR46786 - this should eventually allow us
  to not lose pointer-ness of an expression in more cases

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But I think that it isn't the best solution, because it doesn't really matter
from the memory-consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
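A short sketch of the new node in use, via the getPtrToIntExpr() entry point the message names: modeling an IR `ptrtoint` through ScalarEvolution now yields a SCEVPtrToIntExpr (sunk as far down the expression as possible) instead of an opaque SCEVUnknown:

```
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

static const SCEV *modelPtrToInt(ScalarEvolution &SE, Value *Ptr,
                                 Type *IntTy) {
  // Previously this would typically come back as SCEVUnknown.
  return SE.getPtrToIntExpr(SE.getSCEV(Ptr), IntTy);
}
```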
Vitaly Buka 36fa658db5 [NFC] Fix "ambiguous overload for ‘operator=’"
From D89768
2020-10-30 00:43:32 -07:00
Vitaly Buka 1455259546 [NFC] Fix "ambiguous overload for ‘operator=’" 2020-10-30 00:36:50 -07:00
Xun Li 9f5a2beadc [Coroutine] Properly determine whether an alloca should live on the frame
The existing logic in determining whether an alloca should live on the frame only looks at explicit def-use relationships. However, a value defined by an alloca may be implicitly needed across suspension points, either because an alias has an across-suspension-point def-use relationship, or because it escapes via store/call/memory intrinsics. To properly handle all these cases, we have to properly visit the alloca pointer up-front. This patch extends the existing alloca use visitor to determine whether an alloca should live on the frame.

Differential Revision: https://reviews.llvm.org/D89768
2020-10-29 23:56:05 -07:00
Stefanos Baziotis a3345300b6 [LCSSA] Doc for special treatment of PHIs
Differential Revision: https://reviews.llvm.org/D89739
2020-10-29 22:50:07 +02:00
Nikita Popov 20b386aae0 [LoopUtils] Fix neutral value for vector.reduce.fadd
Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
2020-10-29 21:45:13 +01:00
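Why -0.0 is the right neutral element for an fadd reduction without nsz: IEEE-754 addition gives -0.0 + x == x for every x, including x == -0.0, while starting from +0.0 turns a reduction over {-0.0} into +0.0:

```
#include <cassert>
#include <cmath>

void faddNeutralExample() {
  assert(std::signbit(-0.0 + -0.0)); // -0.0: matches the scalar result
  assert(!std::signbit(0.0 + -0.0)); // +0.0: sign of the element is lost
}
```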
Florian Hahn 1922570489 [SLP] Consider alternatives for cost of select instructions.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.

One example is using min/max instructions for certain patterns.

This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.

This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.

This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).

Metric: SLP.NumVectorInstructions

Program                                        base    slp     diff
 test-suite...ications/JM/ldecod/ldecod.test   502.00  697.00  38.8%
 test-suite...ications/JM/lencod/lencod.test   1023.00 1414.00 38.2%
 test-suite...-typeset/consumer-typeset.test    56.00   65.00  16.1%
 test-suite...6/464.h264ref/464.h264ref.test   804.00  822.00   2.2%
 test-suite...006/453.povray/453.povray.test   3335.00 3357.00  0.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00 2121.00  0.5%
 test-suite...:: External/Povray/povray.test   2378.00 2382.00  0.2%

Reviewed By: RKSimon, samparker

Differential Revision: https://reviews.llvm.org/D89969
2020-10-29 20:39:50 +00:00
Dávid Bolvanský 7a2abf5aca [InferAttrs] Add nocapture/writeonly to string/mem libcalls
One step closer to fix PR47644.

Differential Revision: https://reviews.llvm.org/D89645
2020-10-29 20:06:43 +01:00
Simon Pilgrim dcb3dc101d [InstCombine] visitShl - ensure inner shifts have inrange amounts
Noticed when fixing OSS Fuzz #26716
2020-10-29 15:28:15 +00:00
Max Kazantsev a5b2e795c3 [NFC][SCEV] Refactor monotonic predicate checks to return enums instead of bools
This patch gets rid of an output parameter which is not needed for most users
and prepares this API for further refactoring.
2020-10-29 16:01:25 +07:00
Johannes Doerfert d39f574dcc [Attributor][FIX] Properly promote arguments pointers to arrays
When we promote pointer arguments we did compute a wrong offset and use
a wrong type for the array case.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-29 00:45:32 -05:00
Fangrui Song 39856d5d0b [Debugify] Move global namespace functions into llvm::
Also move exportDebugifyStats from tools/opt to Debugify.cpp
2020-10-28 19:11:41 -07:00
Florian Hahn 53f4c4b2cc [InstCombine] Do not introduce bitcasts for swifterror arguments.
The following constraints hold for swifterror values:

    A swifterror value (either the parameter or the alloca) can only
    be loaded and stored from, or used as a swifterror argument.

This patch updates instcombine to not try to convert a bitcast of a
function into a bitcast of a swifterror argument.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D90258
2020-10-28 21:52:12 +00:00
Benjamin Kramer 207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b758 and
a87d7b3d44. Test fails under msan.
2020-10-28 13:58:14 +01:00
Max Kazantsev 160a453138 Return "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit e038b60d91.
This reverts commit a0d84d8031.

This revert was a mistake. The reason for the failures was
"Use uint64_t for branch weights instead of uint32_t"

Differential Revision: https://reviews.llvm.org/D87832
2020-10-28 18:51:40 +07:00
Florian Hahn b82f80057d [DSE] Use walker to skip noalias stores between current & clobber def.
Instead of getting the defining access we should be able to use
getClobberingMemoryAccess to skip non-aliasing MemoryDefs. No additional
checks should be needed, because we only remove the starting def if it
matches the defining access of the load. All we need to worry about is
that there are no (may)alias stores between the starting def and the
load and getClobberingMemoryAccess should guarantee that.

Partly fixes PR47887.

This improves the number of redundant stores removed in some cases
(numbers below for MultiSource, SPEC2000, SPEC2006 on X86 with -flto
-O3).

Same hash: 226 (filtered out)
Remaining: 11
Metric: dse.NumRedundantStores

Program                                        base   patch1 diff
 test-suite...:: External/Povray/povray.test     1.00   5.00 400.0%
 test-suite...chmarks/MallocBench/gs/gs.test     1.00   3.00 200.0%
 test-suite...0/253.perlbmk/253.perlbmk.test    21.00  37.00 76.2%
 test-suite...0.perlbench/400.perlbench.test    24.00  37.00 54.2%
 test-suite.../Applications/SPASS/SPASS.test     3.00   4.00 33.3%
 test-suite...006/453.povray/453.povray.test    15.00  18.00 20.0%
 test-suite...T2006/445.gobmk/445.gobmk.test    27.00  29.00  7.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   136.00 137.00  0.7%
 test-suite.../CINT2000/176.gcc/176.gcc.test     6.00   6.00  0.0%
 test-suite.../Benchmarks/Bullet/bullet.test    NaN     3.00  nan%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    NaN     1.00  nan%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89647
2020-10-28 11:01:25 +00:00
Luqman Aden 4c0a016927 Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC.
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.

Reviewed By: compnerd, rnk

Differential Revision: https://reviews.llvm.org/D90117
2020-10-27 23:22:13 -07:00
Kazu Hirata b2f05fae80 [JumpThreading] Remove extraneous calls to setEdgeProbability
This patch removes extraneous calls to setEdgeProbability introduced
in c91487769d.

The follow-up patch, a7b662d0f4, has
since fixed BranchProbabilityInfo::eraseBlock, so we don't need to
worry about getting stale values from getEdgeProbability.

Also, since getEdgeProbability(BB, BB->getSingleSuccessor()) returns
edge probability 1/1 by default for BB with exactly one successor
edge, we don't need to explicitly call setEdgeProbability.

This patch introduces almost no functional change, but we do end up
reducing debug messages from setEdgeProbability.

Differential Revision: https://reviews.llvm.org/D90284
2020-10-27 21:12:54 -07:00
Johannes Doerfert d13daa4018 [Attributor] Finalize the CGUpdater after each SCC
This matches the new PM model.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 50d34958df [Attributor][NFC] Introduce a debug counter for `AA::manifest`
This will simplify debugging and tracking down problems.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 1d57b7f503 [Attributor][NFC] Print the right value in debug output 2020-10-27 22:07:55 -05:00
Johannes Doerfert 1c2531c9e1 [Attributor][FIX] Delete all unreachable static functions
Before we used to only mark unreachable static functions as dead if all
uses were known dead. Now we optimistically assume uses to be dead until
proven otherwise.
2020-10-27 22:07:55 -05:00
Johannes Doerfert bfe05b1aff [Attributor][FIX] Do not attach range metadata to the wrong Instruction
If we are looking at a call site argument it might be a load or call
which is in a different context than the call site argument. We cannot
simply use the call site argument range for the call or load.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-27 22:07:55 -05:00
Johannes Doerfert 724fcce109 [Attributor][NFC] Clang-format 2020-10-27 22:07:55 -05:00
Johannes Doerfert d504f7b91a [Attributor][NFC] Hoist call out of a lambda
The call is not free; unsure if this is needed, but it does not make it
worse either.
2020-10-27 22:07:54 -05:00
Johannes Doerfert 30e5a1f0be [Attributor][FIX] Properly check uses in the call not uses of the call
In the AANoAlias logic we determine if a pointer may have been captured
before a call. We need to look at the other uses in the call, not uses of the
call.

The new code is not perfect as it does not allow trivial cases where the
call has multiple arguments, but it is at least not unsound, and a TODO
was added.
2020-10-27 22:07:54 -05:00
Johannes Doerfert cb813ab66a [Attributor][NFC] Improve time trace output 2020-10-27 22:07:54 -05:00
Kazu Hirata c91487769d [JumpThreading] Set edge probabilities when creating basic blocks
This patch teaches the jump threading pass to set edge probabilities
whenever the pass creates new basic blocks.

Without this patch, the compiler sometimes produces non-deterministic
results.  The non-determinism comes from the jump threading pass using
stale edge probabilities in BranchProbabilityInfo.  Specifically, when
the jump threading pass creates a new basic block, we don't initialize
its outgoing edge probability.

Edge probabilities are maintained in:

  DenseMap<Edge, BranchProbability> Probs;

in class BranchProbabilityInfo, where Edge is an ordered pair of
BasicBlock * and a successor index declared as:

  using Edge = std::pair<const BasicBlock *, unsigned>;

Probs maps edges to their corresponding probabilities.

Now, we rarely remove entries from this map, so if we happen to
allocate a new basic block at the same address as a previously deleted
basic block with an edge probability assigned, the newly created basic
block appears to have an edge probability, albeit a stale one.

This patch fixes the problem by explicitly setting edge probabilities
whenever the jump threading pass creates new basic blocks.

Differential Revision: https://reviews.llvm.org/D90106
2020-10-27 16:07:27 -07:00
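A hedged sketch (assuming the ArrayRef-based setter) of the maintenance the patch adds: newly created blocks get explicit edge probabilities so a stale entry for a deleted block at the same address cannot leak in:

```
#include "llvm/Analysis/BranchProbabilityInfo.h"
using namespace llvm;

static void initSingleSuccessorProb(BranchProbabilityInfo &BPI,
                                    BasicBlock *NewBB) {
  // One successor edge, probability 1/1.
  BPI.setEdgeProbability(NewBB, {BranchProbability::getOne()});
}
```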
Joseph Huber a87d7b3d44 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original declaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information is passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-10-27 16:09:19 -04:00
Nicolai Hähnle e025d09b21 Revert multiple patches based on "Introduce CfgTraits abstraction"
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.

The patches are:

Revert "CfgInterface: rename interface() to getInterface()"

This reverts commit a74fc48158.

Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"

This reverts commit f2a06875b6.

Revert "Try to make GCC5 happy about the CfgTraits thing"

This reverts commit 03a5f7ce12.

Revert "Introduce CfgTraits abstraction"

This reverts commit c0cdd22c72.
2020-10-27 20:33:30 +01:00
Nicolai Hähnle ce6900c6cb Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes"
This reverts commit 848a68a032.
2020-10-27 20:33:29 +01:00
Vedant Kumar 5a3ef55a52 [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746)
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.

The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).

It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).

So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.

Differential Revision: https://reviews.llvm.org/D88928
2020-10-27 10:12:59 -07:00
Raphael Isemann e038b60d91 Revert "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit c6ca26c0bf.
This breaks stage2 builds due to hitting this assert:
```
   Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
2020-10-27 15:31:37 +01:00
Raphael Isemann a0d84d8031 Revert "[NFC] Factor away lambda's redundant parameter"
This reverts commit fdc845b361.
It seems to be a follow-up to c6372b3fb495 which will be reverted.
2020-10-27 15:30:52 +01:00
Simon Pilgrim bce770ffa6 Revert rG0905bd5c2fa42bd4c "[InstCombine] collectBitParts - add trunc support."
This reverts commit 0905bd5c2f.

Causing failures in multistage buildbots that I need to investigate
2020-10-27 13:43:54 +00:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Simon Pilgrim 0905bd5c2f [InstCombine] collectBitParts - add trunc support.
This should allow us to remove the rather limited matchOrConcat fold and just use recognizeBSwapOrBitReverseIdiom.
2020-10-27 13:14:54 +00:00
Roman Lebedev 0ac56e8eaa
[InstCombine] Fold `(X >>? C1) << C2` patterns to shift+bitmask (PR37872)
This essentially finalizes a revert of rL155136,
because nowadays the situation has improved: SCEV can model
all these patterns well, and we canonicalize rotate-like patterns
into funnel shift intrinsics in InstCombine.
So this should not cause any pessimization.

I've verified the canonicalize-{a,l}shr-shl-to-masking.ll transforms
with alive, which confirms that we can freely preserve exact-ness,
and no-wrap flags.

Proofs:
* base: https://rise4fun.com/Alive/gPQ
* exact-ness preservation: https://rise4fun.com/Alive/izi
* nuw preservation: https://rise4fun.com/Alive/DmD
* nsw preservation: https://rise4fun.com/Alive/SLN6N
* nuw nsw preservation: https://rise4fun.com/Alive/Qp7

Refs. https://reviews.llvm.org/D46760
2020-10-27 14:42:53 +03:00
Florian Hahn f067bc3c0a [LoopRotation] Allow loop header duplication if vectorization is forced.
-Oz normally does not allow loop header duplication, so this loop wouldn't be
vectorized. However, the vectorization pragma should override this and allow
loop rotation.

rdar://problem/49281061

Original patch by Adam Nemet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D59832
2020-10-27 09:28:01 +00:00
Max Kazantsev fdc845b361 [NFC] Factor away lambda's redundant parameter 2020-10-27 12:56:52 +07:00
Serguei Katkov b69919b537 [GVN LoadPRE] Add an option to disable splitting backedge
GVN Load PRE can split the backedge, breaking the loop structure in which the latch
contains the conditional branch on, for example, an induction variable.
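
For illustration, a minimal sketch (hypothetical IR, not from the patch) of
the canonical rotated form that splitting the backedge would destroy:

```
loop:                                              ; header is also the latch
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %iv.next = add nuw nsw i64 %iv, 1
  %cmp = icmp ult i64 %iv.next, %n                 ; latch branches on the IV
  br i1 %cmp, label %loop, label %exit             ; the backedge PRE may split
```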

Different optimizations expect this form of the loop, so it is better to preserve it for some time.
This CL adds an option to control the ability to split the backedge.

The default value is true, so technically it is NFC and the current behavior is not changed.

Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89854
2020-10-27 11:59:52 +07:00
Max Kazantsev c6ca26c0bf [IndVars] Remove monotonic checks with unknown exit count
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.
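
A hypothetical sketch (not from the patch) of a check this enables removing:

```
; If %iv only increases, %iv u< %len holds on both the first and the last
; iteration, and %iv never overflows, then the range check below can never
; fail and folds away.  (The loop's real exit condition is elided.)
loop:
  %iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
  %in.bounds = icmp ult i32 %iv, %len
  br i1 %in.bounds, label %backedge, label %fail
backedge:
  %iv.next = add nuw nsw i32 %iv, 1
  br label %loop
```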

Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
2020-10-27 11:35:16 +07:00
Arthur Eubanks 42f76e193b Reland [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:40:46 -07:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Arthur Eubanks 4af5ba1726 Revert "[AlwaysInliner] Pass callee AAResults to InlineFunction()"
This reverts commit 504fbec7a6.

Test failure.
2020-10-26 20:23:38 -07:00
Arthur Eubanks 504fbec7a6 [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:10:09 -07:00
Arthur Eubanks 3dd1c72458 Port -objc-arc-expand to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90182
2020-10-26 20:05:10 -07:00
Arthur Eubanks 90c0b0d3d6 Port -objc-arc-apelim to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90181
2020-10-26 20:01:46 -07:00
Chen Zheng 00e573cadb [LSR] fix typo in comments and rename for a new added hook. 2020-10-26 22:29:22 -04:00
TaWeiTu 0efbfa38ae [NPM] Port -slsr to NPM
`-separate-const-offset-from-gep` has not yet been ported, so some tests are not updated.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90149
2020-10-27 09:21:40 +08:00
Sriraman Tallam ad1b9daa4b Prepend "__uniq" to symbol names hash with -funique-internal-linkage-names.
Prepend the module name hash with a fixed string ".__uniq.", which helps tools
that consume sampled profiles and attribute them to functions understand
that the symbol belongs to a function with unique internal linkage.

Symbols with suffixes can result from various optimizations in the compiler:
function multiversioning, function splitting, parameter constant propagation,
and unique internal linkage names.

External tools like sampled profile aggregators combine profiles from multiple
runs of a binary. They use various heuristics with symbols that have suffixes
to try and attribute the profile to the right function instance. For instance,
multi-versioned symbols like foo.avx, foo.sse4.2, etc., even though distinct,
should be attributed to the same source function if a single function is
versioned using attribute target_clones (supported in GCC but yet to land in
LLVM). Similarly, functions that are split (the split part having a .cold suffix)
could have profiles for both the original and split symbols but would be
aggregated and attributed to the original function that was split.

Unique internal linkage functions however have different source instances and
the aggregator must not put them together but attribute it to the appropriate
function instance. To be sure that we are dealing with a symbol of a unique
internal linkage function, we would like to prepend the hash with a known
string ".__uniq." which these tools can check to understand the suffix type.

Differential Revision: https://reviews.llvm.org/D89617
2020-10-26 14:24:28 -07:00
Sanjay Patel 5a6e66ec72 [InstCombine] add folds for icmp+ctpop
https://alive2.llvm.org/ce/z/XjFPQJ

  define void @src(i64 %value) {
    %t0 = call i64 @llvm.ctpop.i64(i64 %value)
    %gt = icmp ugt i64 %t0, 63
    %lt = icmp ult i64 %t0, 64
    call void @use(i1 %gt, i1 %lt)
    ret void
  }

  define void @tgt(i64 %value) {
    %eq = icmp eq i64 %value, -1
    %ne = icmp ne i64 %value, -1
    call void @use(i1 %eq, i1 %ne)
    ret void
  }

  declare i64 @llvm.ctpop.i64(i64) #1
  declare void @use(i1, i1)
2020-10-26 16:48:56 -04:00
Sanjay Patel 437d7551c5 [InstCombine] reduce code duplication in icmp intrinsic folds; NFC 2020-10-26 16:48:56 -04:00
Stanislav Mekhanoshin 00928a1956 Fix SROA with a PHI merging values from the same block
This fixes bug 47945. It is legal to have a PHI with multiple incoming
values from the same block, but those values must be identical. In that
case it is illegal to merge different values.
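
A minimal sketch of the rule (hypothetical IR):

```
; Duplicate incoming blocks are legal (e.g. from a switch with two cases
; targeting the same successor), but the incoming values must match:
  %ok  = phi i32 [ %v, %pred ], [ %v, %pred ]   ; legal
  %bad = phi i32 [ %a, %pred ], [ %b, %pred ]   ; invalid when %a != %b
```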

Differential Revision: https://reviews.llvm.org/D89978
2020-10-26 12:58:27 -07:00
Joe Ellis 0f83505593 [SVE][InstCombine] Fix TypeSize warning in canReplaceGEPIdxWithZero
The warning would fire when calling canReplaceGEPIdxWithZero on a GEP
whose source element type is a scalable vector. The size of scalable
vector types is not known, so this optimization cannot be performed.
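
For example, a GEP of this shape (a hypothetical sketch) must be rejected:

```
; The source element type is scalable, so its allocation size -- and hence
; whether %idx can be treated like index 0 -- is unknown at compile time.
%elt = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %base, i64 %idx
```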

This patch fixes the issue by:

- bailing out early in this routine if the GEP instruction's source
  element type is a scalable vector.

- making use of getFixedSize -- this removes the dependency on the
  deprecated interface.

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D89968
2020-10-26 17:40:26 +00:00
Joe Ellis 467e5cf40f [SVE][AArch64] Fix TypeSize warning in loop vectorization legality
The warning would fire when calling isDereferenceableAndAlignedInLoop
with a scalable load. Calling isDereferenceableAndAlignedInLoop with a
scalable load would result in the use of the now deprecated implicit
cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes this issue by:

- no longer considering vector loads as candidates in
  canVectorizeWithIfConvert. This doesn't make sense in the context of
  identifying scalar loads to vectorize.

- making use of getFixedSize inside isDereferenceableAndAlignedInLoop --
  this removes the dependency on the deprecated interface, and will
  trigger an assertion error if the function is ever called with a
  scalable type.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89798
2020-10-26 17:40:04 +00:00
Simon Pilgrim 532f3bec3e [InstCombine] collectBitParts - add bitreverse intrinsic support. 2020-10-26 14:36:36 +00:00
Simon Pilgrim 6b2eb31e1e [InstCombine] Add support for zext(and(neg(amt),width-1)) rotate shift amount patterns
Alive2: https://alive2.llvm.org/ce/z/bCvvHd
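
A sketch of the newly matched shift-amount pattern (assuming an i32 rotate
with an i8 amount):

```
  %neg    = sub i8 0, %amt
  %masked = and i8 %neg, 31          ; width-1 mask
  %shamt  = zext i8 %masked to i32   ; now recognized as a rotate amount
```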
2020-10-26 11:22:41 +00:00
Max Kazantsev bfabd7878b Fix broken build after previous commit 2020-10-26 14:55:46 +07:00
Max Kazantsev cdccc82f48 [NFC] Remove unused funciton param 2020-10-26 14:53:22 +07:00
Max Kazantsev 4b5e848bef [NFC] Factor out common code into lambda for further improvement 2020-10-26 14:50:45 +07:00
Max Kazantsev c019099053 [IndVars] Use contextual knowledge when proving trivial conds
No exact example where it would help, but it's generally a more
powerful way to prove predicates.
2020-10-26 13:48:32 +07:00
Simon Pilgrim 3052e474ec [InstCombine] matchBSwapOrBitReverse - recognise or(fshl(),fshl()) bswap patterns.
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
2020-10-25 10:17:45 +00:00
TaWeiTu 65a36bbc3d [NPM] Port -loop-versioning-licm to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89371
2020-10-24 21:51:18 +08:00
TaWeiTu 060a4fccf1 [LoopVersioning] Form dedicated exits for versioned loop to preserve simplify form
The exit blocks of the versioned and non-versioned loops are not dedicated, and thus the two loops are not in loop-simplify form.
Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve loop-simplify form for subsequent passes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89569
2020-10-24 21:40:46 +08:00
Simon Pilgrim 310f62b4ff [InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.

Unlike matchFunnelShift, we don't include the computeKnownBits limitation, as extracting the pattern from the zext/trunc layers should be an indicator of reasonable funnel shift codegen; in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.
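
A sketch of the basic pattern now narrowed (an i32 funnel shift performed in
i64; hypothetical IR):

```
  %za  = zext i32 %a to i64
  %zb  = zext i32 %b to i64
  %zx  = zext i32 %x to i64
  %shl = shl i64 %za, %zx
  %sub = sub i64 32, %zx
  %shr = lshr i64 %zb, %sub
  %or  = or i64 %shl, %shr
  %r   = trunc i64 %or to i32
; -> %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %x)
```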

Differential Revision: https://reviews.llvm.org/D89542
2020-10-24 12:42:43 +01:00
Hongtao Yu a16cbdd676 [AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of the callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B`, and the program may look like

```
A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}
```

For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into

```
A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}
```

Here is what happens next:

1. Failing to inline the callsite `C()` results in `C1`'s samples being returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite, which shares `C1` with the first callsite.
2. Failing to inline the middle callsite results in `B1` being returned to `B`'s base profile, which in turn causes `C1` to be merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base now has a non-zero head sample count. The value actually equals `C1`'s entry count.
3. Failing to inline the last callsite results in `B2` being returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
4. Compiling `B` using `B`'s base profile, failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However, `C2` is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work, where counts returning is based on context sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and the duplicated one have different inline behavior is a mystery. It has to do with imperfect counts in the profile and extra-complicated inlining that makes their hotness differ.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D90056
2020-10-23 17:42:21 -07:00
Arthur Eubanks baffd052b0 [StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.

The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.

This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89026
2020-10-23 15:54:03 -07:00
Arthur Eubanks ba22c403b2 [Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592
2020-10-23 15:37:18 -07:00
Artur Pilipenko 6ec2c5e402 GC-parseable element atomic memcpy/memmove
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way the runtime can provide an
implementation which can take a safepoint during the copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861
2020-10-23 14:06:09 -07:00
Nick Desaulniers b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack protector (in C, via the function attribute
no_stack_protector) or doesn't care, for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) to
want to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, that's generally not as ergonomic as a
function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
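
A minimal sketch of the new rule (hypothetical IR):

```
define void @callee() ssp {
  ret void
}

define void @caller() nossp {
  call void @callee()   ; inlining is blocked: nossp caller, ssp callee
  ret void
}
```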

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Chen Zheng 1e0b6c1df0 [LSR] ignore profitable chain when reg num is not major cost.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D89665
2020-10-23 09:35:48 -04:00
Simon Pilgrim 1cab3bf004 [InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matching flags.
matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.
2020-10-23 12:35:28 +01:00
Simon Pilgrim 19a13bf538 [InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitReverse. NFCI.
This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.
2020-10-23 12:35:27 +01:00
OCHyams fea067bdfd [mem2reg] Remove dbg.values describing contents of dead allocas
This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The
motivation and rationale are exactly the same: When mem2reg removes an alloca,
it erases the dbg.{addr,declare} instructions which refer to the alloca. It
would be better to instead remove all debug intrinsics which describe the
contents of the dead alloca, namely all dbg.value(<dead alloca>, ...,
DW_OP_deref)'s.

As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been
silently dropped instead of being made `undef`, so we're just returning to
previous behaviour with these patches.

Testing:
`llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added
3 tests, each of which covers a dbg.value deletion path in mem2reg:
  mem2reg-promote-alloca-1.ll
  mem2reg-promote-alloca-2.ll
  mem2reg-promote-alloca-3.ll
The first is based on the dexter test inlining.c from D89543. This patch also
improves the debugging experience for loop.c from D89543, which suffers
similarly after arg promotion instead of inlining.
2020-10-23 04:46:56 +00:00
Caroline Concatto 2415636475 [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms
Use isKnownXY comparators when one of the operands can be a scalable
vector, or getFixedSize() in all the other cases.

This patch also fixes bugs around getPrimitiveSizeInBits by using
getFixedSize() near the places with TypeSize comparisons.

Differential Revision: https://reviews.llvm.org/D89703
2020-10-23 09:15:17 +01:00
Max Kazantsev 6e574abf61 [SCEV][NFC] Cache symbolic max exit count
We want to have a caching version of the symbolic BE exit count
rather than recomputing it every time we need it.

Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
2020-10-23 12:29:37 +07:00
Arthur Eubanks 0291e2c933 [Inliner] Run always-inliner in inliner-wrapper
An alwaysinline function may not get inlined in inliner-wrapper due to
the inlining order.

Previously for the following, the inliner would first inline @a() into @b(),

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @a()
  br label %for.cond
}
```

making @b() recursive and unable to be inlined into @a(), ending at

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @b()
  br label %for.cond
}
```

Running always-inliner first makes sure that we respect alwaysinline in more cases.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D86988
2020-10-22 19:16:25 -07:00
Vedant Kumar 099bffe7f7 Revert "[CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)"
This reverts commit 26ee8aff2b.

It's necessary to insert a bitcast of the pointer operand of a lifetime
marker if it has an opaque pointer type.

rdar://70560161
2020-10-22 12:25:50 -07:00
Arthur Eubanks 92d9a3868a Port -instnamer to NPM
Some clang tests use this.

Reviewed By: akhuang

Differential Revision: https://reviews.llvm.org/D89931
2020-10-22 12:08:36 -07:00
Layton Kifer d49911c282 [InstCombine][NFC] Use ConstantExpr::getBinOpIdentity
Delete duplicate implementation getSelectFoldableConstant and
replace with ConstantExpr::getBinOpIdentity.

Differential Revision: https://reviews.llvm.org/D89839
2020-10-22 20:44:57 +02:00
Nikita Popov 3e37543111 [MemCpyOpt] Move GEP during call slot optimization
When performing a call slot optimization to a GEP destination, it
will currently usually fail, because the GEP is directly before the
memcpy and as such does not dominate the call. We should move it
above the call if that satisfies the domination requirement.
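
A sketch of the situation (hypothetical IR): hoisting the constant-index
GEP above the call lets it dominate the call, so @fill can then write
into %dst directly.

```
  call void @fill(i8* %src)      ; fills %src, which dies after the memcpy
  %dst = getelementptr inbounds i8, i8* %base, i64 8
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 8, i1 false)
```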

I think that a constant-index GEP is the only useful thing to move
here, as otherwise isDereferenceablePointer couldn't look through
it anyway. As such I'm not trying to generalize this further.

Differential Revision: https://reviews.llvm.org/D89623
2020-10-22 20:40:56 +02:00
Ettore Tiotto e6521ce064 [NFC][PartialInliner]: Clean up code
Make member functions const where possible, use LLVM_DEBUG to print debug traces
rather than a custom option, pass by reference to avoid null checking, ...

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89895
2020-10-22 14:40:15 -04:00
Vedant Kumar 3419252a79 [InstCombine] Remove dbg.values describing contents of dead allocas
When InstCombine removes an alloca, it erases the dbg.{addr,declare}
instructions which refer to the alloca. It would be better to instead
remove all debug intrinsics which describe the contents of the dead
alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.

This effectively undoes work performed in an InstCombine run earlier in
the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
before CallInst users of an alloca. The motivating example looks like:

```
  define void @foo(i32 %0) {
    %a = alloca i32              ; This alloca is erased.
    store i32 %0, i32* %a
    dbg.value(i32 %0, "arg0")    ; This dbg.value survives.
    dbg.value(i32* %a, "arg0", DW_OP_deref)
    call void @trivially_inlinable_no_op(i32* %a)
    ret void
  }
```

If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
after inlining, making "arg0" unavailable. But we already have dbg.value
descriptions of the alloca's value (from LowerDbgDeclare), so the
DW_OP_deref dbg.value cannot serve its purpose of describing an
initialization of the alloca by some callee. It invalidates other useful
dbg.values, causing large gaps in location coverage, so we should delete
it (even though doing so may cause stale dbg.values to appear, if
there's a dead store to `%a` in @trivially_inlinable_no_op).

OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
alloca. Note that it's possible to describe a variable that takes on
different pointer values, e.g.:

```
  void use(int *);
  void t(int a, int b) {
    int *local = &a;     // dbg.value(i32* %a.addr, "local")
    local = &b;          // dbg.value(i32* undef, "local")
    use(&a);             //           (note: %b.addr is optimized out)
    local = &a;          // dbg.value(i32* %a.addr, "local")
  }
```

In this example, the alloca for "b" is erased, but we need to describe
the value of "local" as <unavailable> before the call to "use". This
prevents "local" from appearing to be equal to "&a" at the callsite.

rdar://66592859

Differential Revision: https://reviews.llvm.org/D85555
2020-10-22 10:00:13 -07:00
Serguei Katkov 75d0e0cd5f [IRCE] consolidate profitability check
Use BFI if it is available and BPI otherwise.
This is a promised follow-up after D89541.

Reviewers: ebrevnov, mkazantsev
Reviewed By: ebrevnov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89773
2020-10-22 11:26:45 +07:00
Zequan Wu 2f29341114 Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation""
This reverts commit 716f7636e1.
2020-10-21 17:08:56 -07:00
Zequan Wu 716f7636e1 Revert "SimplifyCFG: Clean up optforfuzzing implementation"
See discussion: https://reviews.llvm.org/D89590
This reverts commit cdd006eec9.
2020-10-21 16:56:32 -07:00
Arthur Eubanks 8d9466a385 [BlockExtract][NewPM] Port -extract-blocks to NPM
Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89015
2020-10-21 12:51:11 -07:00
Arthur Eubanks aa6c305344 [LowerMatrixIntrinsics][NewPM] Fix PreservedAnalyses result
PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics
didn't properly preserve CFG even though it claimed to. The legacy pass
says it doesn't. Match the legacy pass's preserved analyses.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89175
2020-10-21 12:42:16 -07:00
Artur Pilipenko e8cce5ad89 [RS4GC] NFC. Preparatory refactoring to make GC parseable memcpy
For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.
2020-10-21 12:38:20 -07:00
Simon Pilgrim 7b4a828452 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-21 11:53:45 +01:00
Florian Hahn 88241ffb56 [Passes] Move ADCE before DSE & LICM.
The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset, and the size there actually decreases by 0.1%, with
no significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly fewer instructions. It
is also unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322
2020-10-21 10:30:56 +01:00
Martin Storsjö 4de215ff18 Revert "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
Also revert "[InstCombine] foldOrOfICmps - use m_Specific instead of
explicit comparisons. NFCI." to make the primarily intended revert
work.

This reverts commits ce13549761 and
e372a5f86f.

This commit caused failed asserts e.g. like this:

$ cat repro.cpp
bool a(char b) {
  return b >= '0' && b <= '9' || (b | 32) >= 'a' && (b | 32) <= 'z';
}
$ clang++ -target x86_64-linux-gnu -c -O2 repro.cpp
clang++: ../include/llvm/ADT/APInt.h:1151: bool llvm::APInt::operator==(const
llvm::APInt&) const: Assertion `BitWidth == RHS.BitWidth && "Comparison
requires equal bit widths"' failed.
2020-10-21 09:47:18 +03:00
Geoffrey Martin-Noble c17ae2916c Remove unnecessary header include which violates layering
This was introduced in https://reviews.llvm.org/D89774, but I don't
think it should be necessary.

Reviewed By: TaWeiTu, aeubanks

Differential Revision: https://reviews.llvm.org/D89843
2020-10-20 20:14:03 -07:00
Nicolai Hähnle 848a68a032 DomTree: Extract (mostly) read-only logic into type-erased base classes
Avoid having to instantiate and compile a subset of the dominator tree logic
separately for each node type. More importantly, this allows generic
algorithms to be built on top of dominator trees without writing them as
templates -- such algorithms can now use opaque CfgBlockRef and
CfgInterface instead.

A type-erased implementation of dominator trees could be written in
terms of CfgInterface as well, but doing so would change the current
trade-off: it would slightly reduce code size at the cost of a slight
runtime overhead.

This patch does not change the trade-off, as it only does type-erasure
where basic blocks can be treated in a fully opaque way, i.e. it only
moves methods that don't require iteration over CFG successors and
predecessors.

v5:
- rename generic_{begin,end,children} back without the generic_ prefix
  and refer explicitly to base class methods in NewGVN, which wants to
  mutate the order of dominator tree node children directly

v6:
- style change: iDom -> idom; it's arguable whether this is really
  invalid, since it is actually standard camelCase, but clang-tidy
  complains about it so... *shrug*
- rename {to,from}Generic -> {wrap,unwrap}Ref

Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672

Differential Revision: https://reviews.llvm.org/D83089
2020-10-20 19:53:07 +02:00
Ta-Wei Tu 529ecd19df [NPM] port -unify-loop-exits to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89774
2020-10-20 10:46:57 -07:00
Ta-Wei Tu 59286b36df [NPM] Port -mergereturn to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89781
2020-10-20 10:33:58 -07:00
Florian Hahn 2e58010208 [DSE] Do not scan users of memory terminators for further reads.
isMemTerminator checks if the current def is a memory terminator that
terminates the memory pointed to by DefLoc. We do not have to add any of
its users to the worklist, because the follow-on users cannot read the
memory in question.

This leads to more stores eliminated in the presence of lifetime calls.
Previously we added the users of those intrinsics to the worklist,
limiting elimination.

In terms of removed stores, this gives a nice boost on some benchmarks
(MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):

Same hash: 205 (filtered out)
Remaining: 32
Metric: dse.NumFastStores

Program                                          base   patch   diff
 test-suite...000/197.parser/197.parser.test     4.00    8.00  100.0%
 test-suite...rolangs-C++/family/family.test     4.00    7.00  75.0%
 test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00 27.1%
 test-suite...CFP2000/177.mesa/177.mesa.test    30.00   38.00  26.7%
 test-suite :: External/Nurbs/nurbs.test        44.00   49.00  11.4%
 test-suite...lications/sqlite3/sqlite3.test   115.00  128.00  11.3%
 test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00 11.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   237.00  261.00  10.1%
 test-suite...tions/lambda-0.1.3/lambda.test    40.00   44.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00  8.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test    13.00   14.00   7.7%
 test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   46.00   7.0%
 test-suite...lications/ClamAV/clamscan.test   230.00  246.00   7.0%
 test-suite...006/450.soplex/450.soplex.test   284.00  299.00   5.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    21.00   22.00   4.8%
2020-10-20 16:55:22 +01:00
Simon Pilgrim ec228fbfc0 [InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI. 2020-10-20 16:45:16 +01:00
Simon Pilgrim ce13549761 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-20 16:26:41 +01:00
Florian Hahn 6439fde6d4 [DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem.
This change should currently not have any impact, but guard against
further inconsistencies between MemoryLocation and function attributes.
2020-10-20 14:37:53 +01:00
Simon Pilgrim e372a5f86f [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support
Reapplied rGa704d8238c86 with a check for integer/integervector types to prevent matching with pointer types
2020-10-20 14:14:26 +01:00
Nicolai Hähnle c0cdd22c72 Introduce CfgTraits abstraction
The CfgTraits abstraction simplifies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.

Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.

CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.

Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in same cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.

This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.

v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
  the instructions in a bundle
- use MachineBasicBlock::printName

v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
  that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef

v7:
- std::forward fix in wrapping_iterator
- fix typos

v8:
- cleanup operators on CfgOpaqueType
- address other review comments

Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d

Differential Revision: https://reviews.llvm.org/D83088
2020-10-20 13:50:52 +02:00
Simon Pilgrim e346ea9905 [InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI. 2020-10-20 12:13:08 +01:00
Atmn Patel 595c615606 [IR] Adds mustprogress as a LLVM IR attribute
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.
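
For example (a minimal sketch):

```
; A function the frontend knows must make forward progress.
define void @f() mustprogress {
  ret void
}
```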

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85393
2020-10-20 03:09:57 -04:00
Serguei Katkov 38799975ce [IRCE] Do not transform if loop has small number of iterations
IRCE has some overhead for runtime checks, and in case the number of iterations is small
the overhead can kill the benefit of the optimization.

This CL uses the BlockFrequencyInfo of the pre-header and header to estimate the
number of loop iterations. If it is less than irce-min-estimated-iters, we do not transform the loop.

Probably it is better to build a more complex cost model, but for simplicity this seems to be enough.

The usage of BFI is added only for the new pass manager and tries to use it efficiently.

Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits, fhahn
Differential Revision: https://reviews.llvm.org/D89541
2020-10-20 10:33:59 +07:00
Jordan Rupprecht 8a377f1e3c [NFC] Inline assertion-only variable 2020-10-19 15:11:37 -07:00
Roman Lebedev e0567582b8
[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer
The main tricky thing here is forward-declaring the enum:
we have to specify its underlying data type.

In particular, this avoids the danger of switching over the SCEVTypes,
but actually switching over an integer, and not being notified
when some case is not handled.

I have updated most of such switches to be exhaustive and not have
a default case, where that is pretty obviously the intent,
however not all of them.
2020-10-20 00:10:22 +03:00
Roman Lebedev 3355284b2d
[NFC][SCEVExpander] isHighCostExpansionHelper(): rewrite as a switch
If we switch over an enum, the compiler can easily issue a diagnostic
if some case is not handled. However, with an if cascade that isn't so.
Experimental evidence suggests the new behavior is superior.
2020-10-20 00:10:22 +03:00
Simon Pilgrim adb52e5f9e [InstCombine] foldOrOfICmps - only fold (icmp_eq B, 0) | (icmp_ult/gt A, B) for integer types
Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types but m_Zero() does......
2020-10-19 17:05:38 +01:00
Simon Pilgrim 482e6f0041 Revert rGa704d8238c86bac: "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
This reverts commit a704d8238c.

Causing stage2 build failures on some bots.
2020-10-19 16:03:36 +01:00
Simon Pilgrim de885f1b2a [InstCombine] Add (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0) vector support
Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.
2020-10-19 15:41:21 +01:00
Simon Pilgrim ecd25086d1 [InstCombine] Add (icmp eq B, 0) | (icmp ult/gt A, B) -> (icmp ule A, B-1) vector support 2020-10-19 15:23:48 +01:00
Simon Pilgrim a704d8238c [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support 2020-10-19 14:55:18 +01:00
Simon Pilgrim 1d90e53044 [InstCombine] foldOrOfICmps - pull out repeated getOperand() calls. NFCI. 2020-10-19 14:28:08 +01:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Simon Pilgrim 0b7b446a40 [InstCombine] Support vectors-with-undef in and(logicalshift(1,X),1) --> zext(X == 0) fold 2020-10-19 11:10:32 +01:00
Roman Lebedev d083d55c2c
[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr
All existing SCEV cast types operate on integers.
D89456 will add SCEVPtrToIntExpr cast expression type.
I believe this is best for consistency.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89455
2020-10-19 10:59:53 +03:00
Florian Hahn f5cf7f544b [DSE] Do not consider 'noop' intrinsics as read-clobbers.
isNoopIntrinsic returns true for some intrinsics that are modeled in
MemorySSA but do not actually read or write any memory and do not block
DSE. Such intrinsics should not be considered as read-clobbers.
2020-10-18 15:51:05 +01:00
Dávid Bolvanský 65e94cc946 [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-18 01:33:26 +02:00
Dávid Bolvanský 2a75e956e5 Revert "[InferAttrs] Add argmemonly attribute to string libcalls"
This reverts commit b77dd32a6f. Sanitizer tests are broken.
2020-10-17 23:29:02 +02:00
Dávid Bolvanský b77dd32a6f [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-17 22:42:36 +02:00
Sanjay Patel 53e92b4c0e [InstCombine] (~A & B) ^ A -> A | B
Differential Revision: https://reviews.llvm.org/D86395
2020-10-17 12:20:18 -04:00
Nikita Popov 50cc9a0e61 [MemCpyOpt] Extract common function for unwinding check
These two cases should be using the same logic. Not NFC, as this
resolves the TODO regarding use of the underlying object.
2020-10-17 15:30:39 +02:00
Pedro Tammela 60b19424bb [NFC] fix some typos in LoopUnrollPass
This patch fixes a couple of typos in the LoopUnrollPass.cpp comments

Differential Revision: https://reviews.llvm.org/D89603
2020-10-17 14:20:55 +01:00
Juneyoung Lee 62a0ec1612 Add support for !noundef metadata on loads
This patch adds the metadata !noundef and allows load instructions to optionally carry it.
A load with !noundef always returns a well-defined value (a value with no undef bits that isn't poison).
If the loaded value isn't well-defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has a well-defined value, and showing that a load from an arbitrary location is well-defined is usually hard otherwise.

The same information can be encoded with llvm.assume with an operand bundle; using metadata was chosen because I wasn't sure whether code motion can be done freely when llvm.assume is inserted from clang instead.
The existing codebase already strips unknown metadata when doing code motion, so using metadata is UB-safe as well.
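
A minimal sketch of the new form:

```
define i32 @read(i32* %p) {
  %v = load i32, i32* %p, align 4, !noundef !0
  %f = freeze i32 %v   ; freeze of a well-defined value can fold to %v
  ret i32 %f
}

!0 = !{}
```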

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89050
2020-10-17 13:50:10 +09:00
Artem Belevich c36c0fabd1 [VectorCombine] Avoid crossing address space boundaries.
We cannot bitcast pointers across different address spaces, and VectorCombine
should be careful when it attempts to find the original source of the loaded
data.

Differential Revision: https://reviews.llvm.org/D89577
2020-10-16 13:19:31 -07:00
Benjamin Kramer b740899c50 [Indvars][NFCI] Simplify assertion.
This should be semantically identical. Also avoids unused variable
warnings in Release builds.
2020-10-16 19:58:55 +02:00
Matt Arsenault 0a7cd99a70 Reapply "OpaquePtr: Add type to sret attribute"
This reverts commit eb9f7c28e5.

Previously this was incorrectly handling linking of the contained
type, so this merges the fixes from D88973.
2020-10-16 11:05:02 -04:00
Simon Pilgrim 83ae625f0c [InstCombine] visitAnd - pull out repeated I.getType() calls. NFCI. 2020-10-16 15:43:11 +01:00
Simon Pilgrim 253f24cf4c [InstCombine] Remove custom and(trunc(and(x,c1)),c2) fold
This is more correctly handled by canEvaluateTruncated (one use checks etc.) and covers all the tests cases that were added for this fold.
2020-10-16 15:43:10 +01:00
Michael Liao 98f254960f [globalopt] Teach to look through `addrspacecast`.
- so that global variables in numbered address spaces could be properly
  analyzed.

Differential Revision: https://reviews.llvm.org/D89140
2020-10-16 08:43:09 -04:00
Max Kazantsev 0857029011 [Indvars][NFC] Merge two functions together
The logic of widenWithVariantUse is split into check and transform
parts, unlike any other transform in IndVars. We want to pass some
extra flags from the analysis to the transform part and standardize
the code at once, so merge them together.
2020-10-16 19:21:57 +07:00
Simon Pilgrim 981fdf01d5 [InstCombine] foldSelectRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-16 13:18:53 +01:00
Max Kazantsev bb39372e5e [Indvars][NFCI] Remove meaningless restrictive code in IndVars
Variable ExtendOperExpr only exists to check whether it is a SCEV ext.
We create it as SCEV ext right here, so semantically this check is
trivially true. In theory, it may fail if SCEV is smart enough and can
simplify the expression. However, no matter whether it is an ext or not,
we never use this fact for further reasoning. So this code is currently
useless and in theory may become harmful with SCEV's development.

We do not expect any behavior changes with removing it. If it caused
negative changes, the patch should be reverted.
2020-10-16 18:04:31 +07:00
Max Kazantsev 0ee0c7dcc3 [Indvars][NFC] Remove duplicating checks
Some facts have already been checked in widenWithVariantUse and then
checked again in widenWithVariantUseCodegen. The latter is redundant;
we can replace it with asserts.
2020-10-16 17:35:14 +07:00
Simon Pilgrim 1cf347e48b [InstCombine] narrowRotate - minor refactoring for funnel shift support. NFC.
Prep work for PR35155 - renamed narrowRotate to narrowFunnelShift, rewrote some comments and adjusted code to collect separate shift values, although we bail if they don't match (still only rotations are actually folded).

I'm trying to match matchFunnelShift as much as possible in case we finally get to merge these one day.
2020-10-16 11:27:28 +01:00
Simon Pilgrim 55991b44b7 [InstCombine] foldAndOrOfICmpsOfAndWithPow2 - add vector support
Support vector cases for folding:

 (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)
 (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)
2020-10-16 10:41:40 +01:00
Florian Hahn 51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14.
2020-10-16 09:02:53 +01:00
Vedant Kumar 273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Florian Hahn 89c0124273 [LoopVersion] Unify SCEVChecks and alias check handling (NFC).
This is an initial cleanup of the way LoopVersioning interacts with LAA.

Currently LoopVersioning has 2 ways of initializing things:

1. Passing LAI and passing UseLAIChecks = true
2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and
   setAliasChecks.

Both ways of initializing lead to the same result and the duplication
seems more complicated than necessary.

This patch removes the UseLAIChecks flag from the constructor and the
setSCEVChecks & setAliasChecks helpers and move initialization
exclusively to the constructor.

This simplifies things, by providing a single way to initialize
LoopVersioning and reducing duplication.

Reviewed By: Meinersbur, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84406
2020-10-15 22:02:17 +01:00
David Green 13ec3dd66f [LV] Add a getRecurrenceBinOp and make use of it. NFC 2020-10-15 18:21:41 +01:00
Hiroshi Yamauchi 1ebee7adf8 [PGO] Remove the old memop value profiling buckets.
Following up D81682 and D83903, remove the code for the old value profiling
buckets, which have been replaced with the new, extended buckets and disabled by
default.

Also syncing InstrProfData.inc between compiler-rt and llvm.

Differential Revision: https://reviews.llvm.org/D88838
2020-10-15 10:09:49 -07:00
Simon Pilgrim 23f1616626 [InstCombine] Use m_SpecificInt instead of m_APInt + comparison. NFCI. 2020-10-15 16:06:27 +01:00
Simon Pilgrim b3330ae42c [InstCombine] SimplifyDemandedUseBits - xor - refactor cast<ConstantInt> usage to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:23 +01:00
Simon Pilgrim 2b45639ea0 [InstCombine] InstCombineAndOrXor - refactor cast<ConstantInt> usages to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:17 +01:00
Simon Pilgrim 09be7623e4 [InstCombine] visitXor - refactor ((X^C1)>>C2)^C3 -> (X>>C2)^((C1>>C2)^C3) fold. NFCI.
This is still ConstantInt-only (scalar) but is refactored to use PatternMatch to make adding vector support in the future relatively trivial.
2020-10-15 14:38:15 +01:00
Simon Pilgrim fadd152317 [AggressiveInstCombine] foldAnyOrAllBitsSet - add uniform vector support
Replace m_ConstantInt with m_APInt to support uniform vectors (with no undef elements)

Adding non-undef support would involve some refactoring of the MaskOps struct but this might still be worth it.
2020-10-15 11:02:35 +01:00
Simon Pilgrim 60ba9233d1 Revert rG25a97c3a43d7 - "[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts"
This reverts commit 25a97c3a43.

We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent.

If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.
2020-10-14 18:14:37 +01:00
Matt Arsenault 6a9484f4bf InstCombine: Fix losing load properties in copy-constant-to-alloca
Preserve the alignment and metadata. Atomic loads are skipped for
this, but pass along the properties for consistency.
2020-10-14 12:55:25 -04:00
Matt Arsenault 6da31fa4a6 InstCombine: Fix infinite loop in copy-constant-to-alloca transform
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.

This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
2020-10-14 12:55:25 -04:00
Florian Hahn 93f6c6b79c Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
This reverts the revert commit 710aceb645
and includes a fix for a memsan failure.

Original message:

    This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
    during VPlan construction and code generation instead of the plain IR
    reference where possible.
2020-10-14 17:41:23 +01:00
Simon Pilgrim 89657b3a3b [InstCombine] narrowRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-14 16:45:00 +01:00
Simon Pilgrim 89a2a47870 [InstCombine] Add m_SpecificIntAllowUndef pattern matcher
m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants.

Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift
2020-10-14 16:15:53 +01:00
Simon Pilgrim 25a97c3a43 [InstCombine] visitCallInst - retain undefs in vector funnel shift amounts
By always performing a modulo on the shift amount constants, we were causing undef amounts to be replaced with zero, meaning we were losing funnel-shift-by-splat (with undef) patterns.

Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.
2020-10-14 14:38:21 +01:00
Roman Lebedev 7ee6c40247
Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.

Let's back this out, and re-attempt with some another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.

This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.

I've kept&improved the tests though.
2020-10-14 16:09:18 +03:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
2020-10-14 20:16:33 +09:00
Evgeniy Brevnov d0c95808e5 [LV] Unroll factor is expected to be > 0
LV fails with an assertion checking that UF > 0. We already set UF to 1 if it is 0, except in the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87679
2020-10-14 16:48:17 +07:00
Simon Pilgrim 1e4d882f9a [InstCombine] matchFunnelShift - add support for non-uniform vectors containing undefs.
Replace m_SpecificInt with m_APIntAllowUndef to matching splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.

The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.
2020-10-14 10:42:27 +01:00
sstefan1 ce16be253c [Attributor][NFC] Make `createShallowWrapper()` available outside of Attributor
D85703 will need to create shallow wrappers in order to track the spmd icv. We need to make it available.

Differential Revision: https://reviews.llvm.org/D89342
2020-10-14 10:08:59 +02:00
Arthur Eubanks 518ec05a10 [LoopExtract][NewPM] Port -loop-extract to NPM
-loop-extract-single is just -loop-extract on one loop.

-loop-extract depended on -break-crit-edges and -loop-simplify in the
legacy PM, but the NPM doesn't allow specifying pass dependencies like
that, so manually add those passes to the RUN lines where necessary.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89016
2020-10-13 22:55:42 -07:00
Nikita Popov 3b31f05372 [LICM] Don't require AST in LoopPromoter (NFC)
While promotion currently always has an AST available, it is only
relevant for invalidation purposes in LoopPromoter, so we do not
need to have it as a hard dependency.
2020-10-13 22:08:49 +02:00
Nikita Popov cd6f40f432 [MemCpyOpt] Add test scaffolding for MSSA based MemCpyOpt
This adds an -enable-memcpyopt-memoryssa option that currently does
nothing apart from requiring MSSA as a dependency. The tests are
split to run both with the option disabled and enabled. I went with
this rather than the separate directory DSE uses, as I found it
convenient to have a direct side-by-side comparison of differences.

Differential Revision: https://reviews.llvm.org/D89206
2020-10-13 21:45:05 +02:00
Nikita Popov e79ca751fc [MemCpyOpt] Fix MemorySSA preservation
moveUp() moves instructions, so we should move the corresponding
memory accesses as well. We should also move the store instruction
itself: Even though we'll end up removing it later, this gives us
a correct MemoryDef to replace.

The implementation is somewhat more complicated than it should be,
because we also handle the case where P does not have a memory
access due to a degenerate AA pipeline. Hopefully, the need for this
will go away in the future, when the rest of the pass is based on
MSSA.

Differential Revision: https://reviews.llvm.org/D88778
2020-10-13 21:39:09 +02:00
Nikita Popov baa3b87015 [MemCpyOpt] Don't shorten memset if memcpy operands may be the same
If the memcpy operands are the same (which is allowed since D86815)
then the memcpy is effectively a no-op and the partially overlapping
memset is not dead.

Differential Revision: https://reviews.llvm.org/D89192
2020-10-13 21:19:19 +02:00
Nikita Popov 39c39e8a7f [MemCpyOpt] Don't shorten memset if destination observable through unwinding
MemCpyOpt can shorten a memset if it is later partially overwritten
by a memcpy. It checks that the destination is not read in between,
but we also need to make sure that the destination cannot be observed
via unwinding.
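
A rough sketch of the situation being guarded against (hypothetical; @may_throw stands in for any call that may unwind to a caller that can read the destination):

```
declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)
declare void @may_throw()

define void @f(i8* %dst, i8* %src) {
  call void @llvm.memset.p0i8.i64(i8* %dst, i8 0, i64 16, i1 false)
  ; if this unwinds, the caller may observe all 16 zeroed bytes,
  ; so the memset must not be shortened to the tail range only
  call void @may_throw()
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 8, i1 false)
  ret void
}
```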

Differential Revision: https://reviews.llvm.org/D89190
2020-10-13 21:12:19 +02:00
Xun Li 0ccf9263cc [ASAN] Make sure we are only processing lifetime markers with offset 0 to alloca
This patch addresses https://bugs.llvm.org/show_bug.cgi?id=47787 (and hence https://bugs.llvm.org/show_bug.cgi?id=47767 as well).
In the instrumentation code further down, we always use the beginning of the alloca as the base for instrumentation, ignoring any offset into the alloca.
Because of that, we should only instrument a lifetime marker if it's actually pointing to the beginning of the alloca.
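
For illustration, a hypothetical marker that should now be skipped:

```
define void @f() {
  %buf = alloca [32 x i8]
  %mid = getelementptr inbounds [32 x i8], [32 x i8]* %buf, i64 0, i64 8
  ; points into the middle of %buf: instrumenting this as if it started
  ; at the beginning of the alloca would poison the wrong shadow bytes
  call void @llvm.lifetime.start.p0i8(i64 24, i8* %mid)
  ret void
}

declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)
```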

Differential Revision: https://reviews.llvm.org/D89191
2020-10-13 10:21:45 -07:00
Nikita Popov 6713332fdd [LoopVersioningLICM] Fix noalias metadata emission
The previous code added the scope on each iteration, so that the
same scope was represented many times in the same !noalias metadata.
That's legal, and semantically equivalent to only storing the scope
once, but it's also wasteful and may pessimize further optimization
if AATags get intersected naively, as done by the AliasSetTracker.
2020-10-13 18:58:05 +02:00
Simon Pilgrim 9c3138bd6d [InstCombine] visitTrunc - pass through undefs for trunc(shift(trunc/ext(x),c)) patterns
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.

I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.

Differential Revision: https://reviews.llvm.org/D88687
2020-10-13 14:35:18 +01:00
Vitaly Buka 710aceb645 Revert "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
It introduced a memory leak.

This reverts commit 525b085a65.
2020-10-13 03:14:08 -07:00
Simon Pilgrim 5df61724a1 [InstCombine] Support uniform vector splats in ((((X >> C) & CC) + Y) << C) folds.
Add support for uniform vector splats (no undefs).
2020-10-13 09:28:39 +01:00
Roman Lebedev 1fb6104293
Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end up there with a constant zext/sext of a pointer,
so I didn't handle those cases until there is a test case.

Original commit message:

While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 23:02:55 +03:00
Simon Pilgrim 4ff7136268 [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually - I missed this one in rG24dd0cd1edd5
2020-10-12 18:39:45 +01:00
Simon Pilgrim 24dd0cd1ed [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually.
2020-10-12 18:17:20 +01:00
Simon Pilgrim 2de368f6a7 [InstCombine] FoldShiftByConstant - merge equivalent types. NFCI.
Consistently use the original shift instruction's Type/BitWidth instead of the operands, casted values etc.
2020-10-12 18:17:19 +01:00
Florian Hahn 525b085a65 [VPlan] Use VPValue def for VPMemoryInstructionRecipe.
This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84680
2020-10-12 18:02:33 +01:00
Hans Wennborg 17cec6a11a Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
> While we indeed can't treat them as no-ops, i believe we can/should
> do better than just modelling them as `unknown`. `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straight-forward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64ca.
2020-10-12 18:39:35 +02:00
Florian Hahn ea058d289c [VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe.
Now that operands of the recipe are managed through VPUser, we can
simplify the printing by just using the operands.
2020-10-12 16:51:54 +01:00
Florian Hahn ad5541045a [LoopDeletion] Remove over-eager SCEV verification.
60b852092c introduced SCEV verification to
deleteDeadLoop, but it appears this check is currently a bit over-eager
and some users of deleteDeadLoop appear to only patch up SE after
calling it (e.g. PR47753).

Remove the extra check for now. We can consider adding it back after we
tracked down the source of the inconsistency for PR47753.
2020-10-12 16:18:30 +01:00
Simon Pilgrim bbf3925879 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED)
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).
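
For example (an illustrative sketch, with the bound established by a mask; names are hypothetical):

```
define i32 @fshl_pattern(i32 %a, i32 %b, i32 %xin) {
  %x = and i32 %xin, 15            ; value tracking proves %x < 32
  %sub = sub i32 32, %x
  %shl = shl i32 %a, %x
  %lshr = lshr i32 %b, %sub
  ; folds to: call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %x)
  %or = or i32 %shl, %lshr
  ret i32 %or
}
```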

Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.

Differential Revision: https://reviews.llvm.org/D88783
2020-10-12 16:06:41 +01:00
Simon Pilgrim fa56623370 [InstCombine] matchFunnelShift - remove shift value commutation. NFCI.
After rG02295e6d1a15 we no longer need to invert the shift values for fshr - this is just hidden at the moment as funnel shifts only ever match for constant values so never use the fshr "Sub on SHL" path.
2020-10-12 15:55:18 +01:00
Simon Pilgrim 02295e6d1a [InstCombine] matchFunnelShift - canonicalize to OR(SHL,LSHR). NFCI.
Simplify the shift amount matching code by canonicalizing the shift ops first.
2020-10-12 15:10:59 +01:00
Simon Pilgrim 45d785e22b Revert rGb97093e520036f8 - "[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw"
This reverts commit b97093e520.

Funnel shift argument commutation isn't working correctly
2020-10-12 11:38:52 +01:00
Roman Lebedev 1c021c64ca
[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown
While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 11:04:03 +03:00
David Sherwood c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of its own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
Roman Lebedev 544a6aa267
[InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

As we've been establishing (see D88788/D88789), if there is a int<->ptr cast,
it basically must stay as-is, we can't do much with it.

I've looked, and as far as I can tell, the biggest source of new
such casts being introduced is this transform, which, ironically,
tries to reduce the count of casts...

On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.

However, just on RawSpeed, where I know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% fewer `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255),
which is again an increase of 14.34% in total.

To me this does seem like a step in the right direction:
we end up with strictly fewer `IntToPtr`s, but strictly more `PtrToInt`s,
which seems like a reasonable trade-off.

See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.

(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)

Reviewed By: nlopes, nikic

Differential Revision: https://reviews.llvm.org/D88979
2020-10-11 20:24:28 +03:00
David Green be6e8e50f4 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail-folded loops. Reductions are
generated in the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.

Most of the code is fairly straightforward, except for the creation of
block masks, which need to be created in dominance order. The
order in which they are added is altered to be after any phis, keeping to the
requirements of the underlying IR.

Differential Revision: https://reviews.llvm.org/D84451
2020-10-11 16:58:34 +01:00
Sanjay Patel 3f3356bdd9 [InstCombine] allow vector splats for add+xor --> shifts 2020-10-11 09:04:24 -04:00
Sanjay Patel f81200ae99 [InstCombine] add one-use check to add+xor transform
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with extra
use that shows we still convert directly to a real "sext" if
possible.
2020-10-11 09:04:24 -04:00
Simon Pilgrim b97093e520 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Differential Revision: https://reviews.llvm.org/D88783
2020-10-11 10:37:20 +01:00
Simon Pilgrim b752daa26b [InstCombine] Replace getLogBase2 internal helper with ConstantExpr::getExactLogBase2. NFCI.
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.

The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
2020-10-11 10:31:17 +01:00
Xun Li 667dfe39ca [Coroutines] Refactor/Rewrite Spill and Alloca processing
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.

This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always processed differently than spills, hence it's quite confusing to put them together. It's much cleaner to separate them and process them separately.
Doing so simplifies lots of code and makes the logic clearer and easier to reason about.

Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by an AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across a suspension point (when this happens, the existing implementation will spill the PHINode, a.k.a. a stack address, to the heap!).
All these issues suggest that we need to separate spill and alloca in order to properly implement this.
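
A fragment sketching the PHI case (hypothetical IR, not from the patch):

```
  %a = alloca i64
  %b = alloca i64
  ...
merge:
  ; %p is a stack address; if %p is live across a suspend, the old
  ; use-def check would spill the PHI itself to the frame, which is
  ; wrong: it is %a and %b that must live on the frame
  %p = phi i64* [ %a, %then ], [ %b, %else ]
```
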
This patch does not yet fix these bugs, however it sets up the code in a better shape so that we can start fixing them in the next patch.

The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index into their final layout index.

In doing so, I also cleaned up a few things and also discovered a few other bugs.

Cleanups:
1. Found out that PromiseFieldId is not used, delete it.
2. Previously, SpillInfo was a vector, which is strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contained a list of Spills that the field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.

Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer have them. This means we are dropping some debug information in the ramp function.

The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.

Differential Revision: https://reviews.llvm.org/D88872
2020-10-10 22:21:34 -07:00
Simon Pilgrim 702ccb40e2 [InstCombine] getLogBase2(undef) -> 0.
Move the undef element handling into the getLogBase2 helper instead of pre-empting with replaceUndefsWith.
2020-10-10 20:29:03 +01:00
Simon Pilgrim 3aab3cbd4a [InstCombine] getLogBase2 - no need to specify Type. NFCI.
In all the getLogBase2 uses, the specified Type is always the same as the constant being folded.
2020-10-10 20:09:55 +01:00
Nikita Popov 5e855f1e80 [MemCpyOpt] Don't hoist store that's not guaranteed to execute
MemCpyOpt can hoist stores while merging load+store pairs into memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original program.
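
A small sketch of the problem (assuming @may_throw is an arbitrary call that may not return normally; names are hypothetical):

```
declare void @may_throw()

define void @f(i32* %p, i32* %q) {
  %v = load i32, i32* %p
  call void @may_throw()
  ; hoisting this store above the call would execute a store that the
  ; original program only performs when the call returns normally
  store i32 %v, i32* %q
  ret void
}
```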

Differential Revision: https://reviews.llvm.org/D89154
2020-10-10 10:26:28 +02:00
Changpeng Fang f192a27ed3 Sink: Handle instruction sink when a user is dead
Summary:
  The current instruction sink pass uses findNearestCommonDominator of all users to find the block to sink the instruction to.
However, a user may be in a dead block, which will result in unexpected behavior.

This patch handles such cases by skipping dead blocks. This patch fixes:
https://bugs.llvm.org/show_bug.cgi?id=47415

Reviewers:
  MaskRay, arsenm

Differential Revision:
  https://reviews.llvm.org/D89166
2020-10-09 16:20:26 -07:00
Eli Friedman 278299b0f0 [SCCP] Reduce the number of times ResolvedUndefsIn is called for large modules.
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.

To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.

We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)

The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.

Fixes an "infinite" compile issue my team found on an internal workload.

Differential Revision: https://reviews.llvm.org/D89080
2020-10-09 15:24:16 -07:00
Giorgis Georgakoudis 3a6bfcf2f9 [OpenMPOpt] Merge parallel regions
There are cases that generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way.  This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.

This implementation defensively merges parallel regions only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83635
2020-10-09 09:59:04 -07:00
Arthur Eubanks 0689dab844 [FixIrreducible][NewPM] Port -fix-irreducible to NPM
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that tests that -lowerswitch is run automatically to legacy PM.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D89051
2020-10-09 09:22:09 -07:00
Arthur Eubanks 9c21c6c966 [LoopInterchange][NewPM] Port -loop-interchange to NPM
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89058
2020-10-09 09:21:31 -07:00
Simon Pilgrim 8a836daaa9 [InstCombine] Support lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) uniform vector tests
FoldShiftByConstant is hardcoded for scalar/uniform outer shift amounts at the moment, so that needs to be fixed first to support non-uniform cases
2020-10-09 16:54:46 +01:00
Simon Pilgrim 1c040a3e56 [InstCombine] commonShiftTransforms - add support for pow2 nonuniform constant vectors in srem fold
Note: we already fold srem to undef if any denominator vector element is undef.
2020-10-09 15:59:33 +01:00
Sanjay Patel 080e6bc205 [InstCombine] allow vector splats for add+and with high-mask
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3

  Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
  %a = and %x, C2
  %r = add %a, C1
  =>
  %a2 = add %x, C1
  %r = and %a2, C2
2020-10-09 10:39:11 -04:00
Simon Pilgrim 9e796d5e71 [InstCombine] foldShiftOfShiftedLogic - add support for nonuniform constant vectors 2020-10-09 14:25:12 +01:00
Simon Pilgrim 556316cf72 [InstCombine] foldShiftOfShiftedLogic - replace cast<BinaryOperator> with m_BinOp matcher. NFCI.
Allows us to drop the !isa<ConstantExpr> check.
2020-10-09 14:10:12 +01:00
Max Kazantsev 225df71951 [NFC] Add option to disable IV widening if needed
IV widening is sometimes a strictly harmful transform (some examples
of this are shown in tests 11 and 12 in widen-loop-comp.ll). One of the
reasons for this is that SCEV sometimes fails to prove some facts after
part of the guards has been widened.

Though each single such case looks like a bug that can be addressed,
it seems that disabling IV widening may be profitable in some cases.
We want to have an option to do so. By default, existing behavior is
preserved and IV widening is on.
2020-10-09 18:32:03 +07:00
Simon Pilgrim d9f064dc0b [InstCombine] visitTrunc - trunc(shl(X, C)) --> shl(trunc(X),trunc(C)) vector support
Annoyingly, vectors aren't supported by shouldChangeType(), but we have precedents for always performing this on vector types (e.g. narrowBinOp).
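
For instance (illustrative):

```
define <2 x i16> @f(<2 x i32> %x) {
  %shl = shl <2 x i32> %x, <i32 4, i32 4>
  ; becomes: shl <2 x i16> (trunc %x), <i16 4, i16 4>
  %t = trunc <2 x i32> %shl to <2 x i16>
  ret <2 x i16> %t
}
```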

Differential Revision: https://reviews.llvm.org/D89067
2020-10-08 22:07:51 +01:00
Sanjay Patel f688ae7a0e [InstCombine] allow vector splats for add+xor with low-mask
This can be allowed with undef elements too, but that can be another step:
https://alive2.llvm.org/ce/z/hnC4Z-
2020-10-08 15:53:38 -04:00
Simon Pilgrim 6aa10ae5bf [Transforms] visitCmpBlock - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-08 20:18:32 +01:00
Sanjay Patel 5ac89add1e [InstCombine] remove unnecessary one-use check from add-xor transform
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.

https://rise4fun.com/Alive/hzN

  Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
  %m = and i8 %x, C1
  %f = xor i8 %m, C2
  %r = add i8 %f, C3
  =>
  %r = sub i8 C2 + C3, %m
2020-10-08 15:08:51 -04:00
Simon Pilgrim 0716805c02 [SLP] optimizeGatherSequence - assert every Instruction in the worklist is non-null.
Fixes clang static analyzer warning.
2020-10-08 20:02:18 +01:00
Simon Pilgrim 8f0658ae67 [Transforms] CodeExtractor::verifyAssumptionCache - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-08 19:04:30 +01:00
Sanjay Patel b57451b011 [InstCombine] allow vector splats for add+xor with signmask 2020-10-08 10:46:34 -04:00
Simon Pilgrim 5415fef3ab [InstCombine] matchFunnelShift - support non-uniform constant vector shift amounts (PR46895)
Complete basic PR46895 fixes by refactoring D87452/D88402 to allow us to match non-uniform constant values.

We still don't handle non-uniform vectors that contain undef elements, but that can wait until we have a decent generic mechanism for this.

Differential Revision: https://reviews.llvm.org/D88420
2020-10-08 12:56:27 +01:00
Markus Lavin 06758c6a61 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.

Includes fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-08 13:16:43 +02:00
Simon Pilgrim e1d4ca0009 [InstCombine] matchRotate - add support for matching general funnel shifts with constant shift amounts (PR46896)
First step towards extending the existing rotation support to full funnel shift handling now that the backend legalization support has improved.

This enables us to match the shift by constant cases, which are pretty trivial to expand again if necessary.
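
A minimal example of the shift-by-constant case (illustrative):

```
define i32 @fsh_const(i32 %a, i32 %b) {
  %shl = shl i32 %a, 5
  %lshr = lshr i32 %b, 27
  ; matches: call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 5)
  %or = or i32 %shl, %lshr
  ret i32 %or
}
```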

D88420 will add non-uniform support for funnel shifts as well once its been finalized.

Differential Revision: https://reviews.llvm.org/D88834
2020-10-08 11:05:14 +01:00
Simon Pilgrim aa47962cc9 [InstCombine] canNarrowShiftAmt - replace custom Constant matching with m_SpecificInt_ICMP
The existing code ignores undef values, which matches m_SpecificInt_ICMP. Although m_SpecificInt_ICMP returns false for an all-undef constant, I've added test coverage at rGfe0197e194a64f9 to show that undef folding should already have dealt with that case.
2020-10-08 10:53:32 +01:00
David Green 498f89d188 [LV] Collect dead induction truncates
We currently collect the ICmp and Add from an induction variable,
marking them as dead so that vplan values are not created for them. This
extends that to include any single-use trunc from the ICmp, which allows
the Add to more readily be removed too.

This can help with costing vplan nodes, as the ICmp and Add are more
reliably removed and are not double-counted.

Differential Revision: https://reviews.llvm.org/D88873
2020-10-08 08:28:58 +01:00
Reid Kleckner 940d7aaea9 Port StripGCRelocates pass to NPM
Fixes one test under NPM

Differential Revision: https://reviews.llvm.org/D88766
2020-10-07 14:41:29 -07:00
Reid Kleckner da48fe1732 [NPM] Port strip nonlinetable debuginfo pass to the new pass manager
Fixes a few tests in llvm/test/Transforms/Utils.

Differential Revision: https://reviews.llvm.org/D88762
2020-10-07 14:35:36 -07:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Roman Lebedev fed0f890e5
InstCombine: Negator: don't rely on complexity sorting already being performed (PR47752)
In some cases, we can negate an instruction if only one of its operands
negates. Previously, we assumed that constants would have been
canonicalized to RHS already, but that isn't guaranteed to happen,
because of InstCombine worklist visitation order,
as the added test (previously-hanging) shows.

So if we only need to negate a single operand,
we should make sure that we try the constant operand first.
Do that by re-doing the complexity sorting ourselves,
when we actually care about it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47752
2020-10-07 15:09:50 +03:00
Max Kazantsev fba42aea43 [NFC] Use getZero instead of getConstant(0) 2020-10-07 13:53:36 +07:00
Roman Lebedev 7fa503ef4a
[SROA] rewritePartition()/findCommonType(): if uses have conflicting type, try getTypePartition() before falling back to largest integral use type (PR47592)
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

In this case, when load/store uses have conflicting types,
instead of falling back to iN, we can try to use the allocated sub-type.
As discussed, this isn't the best idea overall (we shouldn't rely on
the allocated type), but it works fine as a temporary measure.
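
Roughly the kind of partition in question (a hypothetical sketch):

```
define i32* @f(i32* %v) {
  %a = alloca i32*
  %a.i64 = bitcast i32** %a to i64*
  %v.i64 = ptrtoint i32* %v to i64
  store i64 %v.i64, i64* %a.i64    ; integer-typed use
  %r = load i32*, i32** %a         ; pointer-typed use
  ; conflicting use types: preferring the allocated type (i32*) over
  ; the largest integral use type (i64) avoids an inttoptr in the rewrite
  ret i32* %r
}
```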

I've measured, and @ `-O3` as of vanilla llvm test-suite + RawSpeed,
this results in +0.05% more bitcasts, -5.51% less inttoptr
and -1.05% less ptrtoint (at the end of middle-end opt pipeline)

See https://bugs.llvm.org/show_bug.cgi?id=47592

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D88788
2020-10-07 09:20:19 +03:00
Johannes Doerfert 7993d61177 [Attributor] Use smarter way to determine alignment of GEPs
Use the same logic that exists in other places to deal with base-case GEPs.

Add the original Attributor talk example.
2020-10-06 19:31:08 -05:00
Johannes Doerfert c4cfe7a435 [Attributor] Ignore read accesses to constant memory
The old function attribute deduction pass ignores reads of constant
memory and we need to copy this behavior to replace the pass completely.
First step are constant globals. TBAA can also describe constant
accesses and there are other possibilities. We might want to consider
asking the alias analyses that are available but for now this is simpler
and cheaper.
2020-10-06 19:31:07 -05:00
Johannes Doerfert 3f540c05df [Attributor] Give up early on AANoReturn::initialize
If the function is not assumed `noreturn` we should not wait for an
update to mark the call site as "may-return".

This has two kinds of consequences:
  - We have fewer iterations in many tests.
  - We have fewer deductions based on "known information" (since we ask
    earlier, point 1, and therefore assumed information is not "known"
    yet).
The latter is an artifact that we might want to tackle properly at some
point but which is not easily fixable right now.
2020-10-06 19:31:07 -05:00
Nikita Popov 616f545048 [MemCpyOpt] Use dereferenceable pointer helper
The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.

I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.

This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.

Differential Revision: https://reviews.llvm.org/D88805
2020-10-06 18:41:19 +02:00
Nikita Popov 6b441ca523 [MemCpyOpt] Check for throwing calls during call slot optimization
When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.

This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.

As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.

Differential Revision: https://reviews.llvm.org/D88799
2020-10-06 18:24:40 +02:00
Nikita Popov 80cde02e85 [MemCpyOpt] Add separate statistic for call slot optimization (NFC) 2020-10-06 18:14:10 +02:00
Dávid Bolvanský 86429c4eaf [SimplifyLibCalls] Optimize mempcpy_chk to mempcpy 2020-10-06 17:08:46 +02:00
Johannes Doerfert 4a7a988442 [Attributor][FIX] Move assertion to make it not trivially fail
The idea of this assertion was to check the simplified value before we
assign it, not after, which caused this to trivially fail all the time.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 04f6951397 [Attributor][FIX] Dead return values are not `noundef`
When we assume a return value is dead we might still visit return
instructions via `Attributor::checkForAllReturnedValuesAndReturnInsts(..)`.
When we do so the "returned value" is potentially simplified to `undef`
as it is the assumed "returned value". This is a problem if there was a
preexisting `noundef` attribute that will only be removed as we manifest
the `undef` return value. We should not use this combination to derive
`unreachable` though. Two test cases fixed.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 957094e31b [Attributor][NFC] Ignore benign uses in AAMemoryBehaviorFloating
In AAMemoryBehaviorFloating we used to track benign uses in a SetVector.
With this change we look through benign uses eagerly to reduce the
number of elements (=Uses) we look at during an update.

The test does not actually fail prior to this commit, but I had already
written it, so I kept it.
2020-10-06 09:32:18 -05:00
Simon Pilgrim 17b9a91ec2 [InstCombine] canRewriteGEPAsOffset - don't dereference a dyn_cast<>. NFCI.
We know V is a IntToPtrInst or PtrToIntInst type so we know its a CastInst - so use cast<> directly.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 75d33a3a97 [InstCombine] FoldShiftByConstant - consistently use ConstantExpr in logicalshift(trunc(shift(x,c1)),c2) fold. NFCI.
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 21100f885d [InstCombine] FoldShiftByConstant - use PatternMatch for logicalshift(trunc(shift(x,c1)),c2) fold. NFCI. 2020-10-06 13:13:08 +01:00
Simon Pilgrim 0b402e985e [InstCombine] FoldShiftByConstant - remove unnecessary cast<>. NFC.
Op1 is already a Constant*
2020-10-06 13:13:08 +01:00
Serguei Katkov b988898013 [GVN LoadPRE] Extend the scope of optimization by using context to prove safety of speculation
Use context to prove that a load can be safely executed at the point where it is being hoisted.

Postpone the decision about the safety of speculative load execution till the moment we know
where we hoist the load, and check safety in that context.

Reviewers: nikic, fhahn, mkazantsev, lebedev.ri, efriedma, reames
Reviewed By: reames, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88725
2020-10-06 09:25:16 +07:00
Vedant Kumar 9afb1c566e Revert "Outline non returning functions unless a longjmp"
This reverts commit 20797989ea.

This patch (https://reviews.llvm.org/D69257) cannot complete a stage2
build due to the change:

```
CI->getCalledFunction()->getName().contains("longjmp")
```

There are several concrete issues here:

  - The callee may not be a function, so `getCalledFunction` can assert.
  - The called value may not have a name, so `getName` can assert.
  - There's no distinction made between "my_longjmp_test_helper" and the
    actual longjmp libcall.

At a higher level, there's a serious layering problem here. The
splitting pass makes policy decisions in a general way (e.g. based on
attributes or profile data). Special-casing certain names breaks the
layering. It subverts the work of library maintainers (who may now need
to opt-out of unexpected optimization behavior for any affected
functions) and can lead to inconsistent optimization behavior (as not
all llvm passes special-case ".*longjmp.*" in the same way).

The patch may need significant revision to address these issues.

But the immediate issue is that this crashes while compiling llvm's unit
tests in a stage2 build (due to the `getName` problem).
2020-10-05 14:10:25 -07:00
Roman Lebedev e00f189d39
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)

This canonicalization seems dubious.

Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.

I think it's pretty obvious that it is an undesirable outcome,
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-op, and are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.

As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without the https://bugs.llvm.org/show_bug.cgi?id=47592 at least)
And we can't recover the situation post-inlining in instcombine.

So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means, this fold isn't helpful in exposing the passes
that are otherwise unaware of these patterns it produces.

Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so i'm not sure
how this plays out in the larger picture.

On vanilla llvm test-suite + RawSpeed, this results in an
increase of asm instructions and final object size by ~+0.05%,
while decreasing the final count of bitcasts by -4.79% (-28990),
of ptrtoint casts by -15.41% (-3423),
and of inttoptr casts by -25.59% (-6919, *sic*).
Overall, there are -0.04% fewer IR blocks and -0.39% fewer instructions.

See https://bugs.llvm.org/show_bug.cgi?id=47592

Differential Revision: https://reviews.llvm.org/D88789
2020-10-06 00:00:30 +03:00
Dávid Bolvanský a4bae56ab8 Revert "[SLC] Optimize mempcpy_chk to mempcpy"
This reverts commit 3f1fd59de3.
2020-10-05 22:27:14 +02:00
Dávid Bolvanský 3f1fd59de3 [SLC] Optimize mempcpy_chk to mempcpy
As reported in PR46735:

void* f(void *d, const void *s, size_t l)
{
    return __builtin___mempcpy_chk(d, s, l, __builtin_object_size(d, 0));
}

This can be optimized to `return mempcpy(d, s, l);`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86019
2020-10-05 22:18:36 +02:00
Roman Lebedev 59127de243
[NFC][GCOV] Fix build: there's `llvm::stable_partition()` wrapper 2020-10-05 22:52:32 +03:00
Fangrui Song e338f8fe69 [gcov] Fix non-determinism (DenseMap iteration order) of checksum computation
... by using MapVector. The issue was caused by 63182c2ac0.

Also use stable_partition instead of partition to get stable results
across different STL implementations.
2020-10-05 12:39:36 -07:00
Nikita Popov 3641d375f6 [InstCombine] Handle GEP inbounds in select op replacement (PR47730)
When retrying the "simplify with operand replaced" select
optimization without poison flags, also handle inbounds on GEPs.

Of course, this particular example would also be safe to transform
while keeping inbounds, but the underlying machinery does not
know this (yet).
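
A sketch of the pattern (hypothetical example):

```
define i32* @f(i32* %p, i64 %i, i64 %j, i32* %z) {
  %cmp = icmp eq i64 %i, %j
  %gep = getelementptr inbounds i32, i32* %p, i64 %i
  ; substituting %j for %i in %gep is only safe once inbounds is
  ; dropped (or proven to still hold)
  %sel = select i1 %cmp, i32* %gep, i32* %z
  ret i32* %sel
}
```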
2020-10-05 21:13:02 +02:00
Simon Pilgrim 8fb4645321 [InstCombine] FoldShiftByConstant - use m_Specific. NFCI.
Use m_Specific instead of m_Value followed by an equality check - we already do this for the similar folds above; it looks like an oversight in rG2b459fe7e1e, where the original pattern-match code looked a little different.
2020-10-05 18:56:10 +01:00
Simon Pilgrim 4ce61144cb [InstCombine] canEvaluateShifted - remove dead (and never used code). NFC.
This was already #if'd out when it was added back in 2010 at rG18d7fc8fc6767 and has never been touched since.
2020-10-05 18:05:13 +01:00
Nikita Popov 9d63029770 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit a3caf7f610.

The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
2020-10-05 19:02:30 +02:00
Florian Hahn 348d85a6c7 [VPlan] Clean up uses/operands on VPBB deletion.
Update the code responsible for deleting VPBBs and recipes to properly
update users and release operands.

This is another preparation for D84680 & following patches towards
enabling modeling def-use chains in VPlan.
2020-10-05 14:43:52 +01:00
Markus Lavin a3caf7f610 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-05 09:55:16 +02:00
Florian Hahn 357bbaab66 [VPlan] Add VPRecipeBase::toVPUser helper (NFC).
This adds a helper to convert a VPRecipeBase pointer to a VPUser, for
recipes that inherit from VPUser. Once VPRecipeBase directly inherits
from VPUser this helper can be removed.
2020-10-04 19:43:27 +01:00
Florian Hahn f5fe7abe8a [VPlan] Account for removed users in replaceAllUsesWith.
Make sure we do not iterate using an invalid iterator.

Another small fix/step towards traversing the def-use chains in VPlan.
2020-10-04 18:18:58 +01:00
Anatoly Parshintsev a566f0525a [RISCV][ASAN] instrumentation pass now uses proper shadow offset
[10/11] patch series to port ASAN for riscv64

Depends On D87580

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D87581
2020-10-04 16:30:38 +03:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I stumbled into this pretty much accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about a `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has a geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this change for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Florian Hahn 82dcd383c4 [VPlan] Properly update users when updating operands.
When updating operands of a VPUser, we also have to adjust the list of
users for the new and old VPValues. This is required once we start
transitioning recipes to become VPValues.
2020-10-03 20:54:58 +01:00
Michał Górny 66e493f81e [asan] Stop instrumenting user-defined ELF sections
Do not instrument user-defined ELF sections (whose names resemble valid
C identifiers).  They may have special use semantics and modifying them
may break programs.  This is e.g. the case with NetBSD __link_set API
that expects these sections to store consecutive array elements.

Differential Revision: https://reviews.llvm.org/D76665
2020-10-03 19:54:38 +02:00
Simon Pilgrim aacfe2be53 [InstCombine] recognizeBSwapOrBitReverseIdiom - add vector support
Add basic vector handling to recognizeBSwapOrBitReverseIdiom/collectBitParts - this works at the element level: all vector element operations must match (splat constants etc.), and there is no cross-element support (insert/extract/shuffle etc.).
2020-10-03 16:26:46 +01:00
Simon Pilgrim 347fd9955a [InstCombine] recognizeBSwapOrBitReverseIdiom - use generic CreateIntegerCast
Try to appease buildbots breakages due to D88578
2020-10-03 15:29:22 +01:00
Simon Pilgrim 3aa93f690b [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191) (Reapplied)
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Reapplied with early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-03 14:52:42 +01:00
Nikita Popov fbf818724f [MemCpyOpt] Make moveUp() a member method (NFC)
So we don't have to pass through more parameters in the future.
2020-10-03 11:28:49 +02:00
Arthur Eubanks 321986fe68 [MetaRenamer][NewPM] Port metarenamer to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88690
2020-10-02 15:42:25 -07:00
Vitaly Buka aff896dea1 [NFC][MSAN] Extract llvm.abs handling into a function
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D88519
2020-10-02 15:01:25 -07:00
Nikita Popov 94704ed008 [MemCpyOpt] Add helper to erase instructions (NFC)
Besides erasing the instruction, we also always want to remove
it from MSSA and MD. Use a common function to do so.

This is a refactoring split out from D26739.
2020-10-02 21:52:10 +02:00
Nikita Popov 87b63c1726 [MemCpyOpt] Avoid double invalidation (NFCI)
The removal of the cpy instruction is left to the caller of
performCallSlotOptzn(), including the invalidation of MD. Both
call-sites already do this.

Also handle incrementation of NumMemCpyInstr consistently at the
call-site. One of the call-sites was already doing this, which
ended up incrementing the statistic twice.

This fix was part of D26739.
2020-10-02 21:50:46 +02:00
Arthur Eubanks 7468afe9ca [DAE] MarkLive in MarkValue(MaybeLive) if any use is live
While looping through all args or all return values, we may mark a use
of a later iteration as live. Previously when we got to that later value
it would ignore that and continue adding to Uses instead of marking it
live. For example, when looping through arg#0 and arg#1,
MarkValue(arg#0, Live) may cause some use of arg#1 to be live, but
MarkValue(arg#1, MaybeLive) will not notice that and continue adding
into Uses.

Now MarkValue(RA, MaybeLive) will MarkLive(RA) if any use is live.

Fixes PR47444.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88529
2020-10-02 10:55:08 -07:00
Arthur Eubanks eb55735073 Reland [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:46:57 -07:00
Arthur Eubanks 9b8c0b8b46 Revert "[AlwaysInliner] Update BFI when inlining"
This reverts commit b1bf24667f.
2020-10-02 10:34:51 -07:00
Arthur Eubanks b1bf24667f [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:26:34 -07:00
Simon Pilgrim 0364721e3e Revert rG3d14a1e982ad27 - "[InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)"
This reverts commit 3d14a1e982.

This is breaking on some 2-stage clang buildbots
2020-10-02 18:17:14 +01:00
Florian Hahn 0867a9e85a [VPlan] Use isa<> instead of directly checking VPRecipeID (NFC).
getVPRecipeID is intended to be only used in `classof` helpers. Instead
of checking it directly, use isa<> with the correct recipe type.
2020-10-02 17:47:35 +01:00
Simon Pilgrim 3d14a1e982 [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-02 17:25:12 +01:00
Simon Pilgrim 0347f3ea72 TruncInstCombine.cpp - fix header include ordering to fix llvm-include-order clang-tidy warning. NFCI. 2020-10-02 17:25:12 +01:00
Simon Pilgrim 5e8e89d814 TruncInstCombine.cpp - use auto * to fix llvm-qualified-auto clang-tidy warning. NFCI. 2020-10-02 17:25:11 +01:00
Philip Reames f29645e7af [gvn] Handle a corner case w/vectors of non-integral pointers
If we try to coerce a vector of non-integral pointers to a narrower type (either a narrower vector or a single pointer), we use inttoptr and violate the semantics of non-integral pointers.  In theory, we can handle many of these cases; we just need to use a different code idiom to convert without going through inttoptr and back.

This shows up as wrong code bugs, and in some cases, crashes due to failed asserts.  Modeled after a change which has lived downstream for a couple years, though completely rewritten to be more idiomatic.
2020-10-01 19:20:21 -07:00
Joseph Huber 82453e759c [OpenMP] Add Missing Runtime Call for Globalization Remarks
Summary:
Add a missing runtime call to perform data globalization checks.

Reviewers: jdoerfert

Subscribers: guansong hiraditya llvm-commits sstefan1 yaxunl

Tags: #LLVM #OpenMP

Differential Revision: https://reviews.llvm.org/D88621
2020-10-01 21:19:53 -04:00
Philip Reames bb0344644a [memcpyopt] Conservatively handle non-integral pointers
If we allow the non-integral pointers to become memset and memcpy, we lose the ability to reason about pointer propagation.  This patch is modeled on changes we've carried downstream for a long time; figured it was worth being equally conservative for other users.  There is room to refine the semantics and handling here if anyone is motivated.
2020-10-01 16:46:56 -07:00
Philip Reames de3cb9548d Fix a bug in memset formation with vectors of non-integral pointers
We were converting the non-integral store into an integer store, which is not legal.
2020-10-01 16:11:11 -07:00
Nikita Popov 9d1c8c0ba9 [InstCombine] Fix select operand simplification with undef (PR47696)
When replacing X == Y ? f(X) : Z with X == Y ? f(Y) : Z, make sure
that Y cannot be undef. If it may be undef, we might end up picking
a different value for undef in the comparison and the select
operand.
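
For instance (illustrative):

```
define i8 @f(i8 %x) {
  %cmp = icmp eq i8 %x, undef
  %add = add i8 %x, 1
  ; rewriting %add as "add i8 undef, 1" could pick a different value
  ; for undef than the compare did, so the substitution is unsafe
  %sel = select i1 %cmp, i8 %add, i8 42
  ret i8 %sel
}
```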
2020-10-01 21:15:48 +02:00
zoecarver 6c25816d7b [DSE] Look through memory PHI arguments when removing noop stores in MSSA.
Summary:
Adds support for "following" memory through MSSA PHI arguments. This will help catch more noop stores that exist between blocks.

Originally part of D79391.

Reviewers: fhahn, jfb, asbirlea

Differential Revision: https://reviews.llvm.org/D82588
2020-10-01 10:42:02 -07:00
Simon Pilgrim 29ac9fae54 [InstCombine] collectBitParts - convert to use PatternMatch matchers and avoid IntegerType casts.
Make sure we're using getScalarSizeInBits instead of cast<IntegerType> to get Type bit widths.

This is preliminary cleanup before we can start adding vector support to the bswap/bitreverse (element level) matching.
2020-10-01 16:44:14 +01:00
Simon Pilgrim 567049f892 [InstCombine] Use m_FAbs matcher helper. NFCI. 2020-10-01 14:42:34 +01:00
Sjoerd Meijer d53b4bee0c [LoopFlatten] Add a loop-flattening pass
This is a simple pass that flattens nested loops.  The intention is to optimise
loop nests like this, which together access an array linearly:

  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);

into one loop:

  for (int i = 0; i < (N*M); ++i)
    f(A[i]);

It can also flatten loops where the induction variables are not used in the
loop. This can help with code size and runtime, especially on simple CPUs
without advanced branch prediction.

This is only worth flattening if the induction variables are only used in an
expression like i*M+j. If they had any other uses, we would have to insert a
div/mod to reconstruct the original values, so this wouldn't be profitable.

This partially fixes PR40581 as this pass triggers on one of the two cases. I
will follow up on this to learn LoopFlatten a few more (small) tricks. Please
note that LoopFlatten is not yet enabled by default.

Patch by Oliver Stannard, with minor tweaks from Dave Green and myself.

Differential Revision: https://reviews.llvm.org/D42365
2020-10-01 13:54:45 +01:00
Simon Pilgrim bc730b5e43 [InstCombine] collectBitParts - use APInt directly to check for out of range bit shifts. NFCI. 2020-10-01 12:50:36 +01:00
Arthur Eubanks 460dda071e [WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass
The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.

This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88588
2020-09-30 17:27:37 -07:00
Simon Pilgrim 323d08e50a [InstCombine] Fix bswap(trunc(bswap(x))) -> trunc(lshr(x, c)) vector support
Use getScalarSizeInBits not getPrimitiveSizeInBits to determine the shift value at the element level.
2020-09-30 16:01:08 +01:00
Simon Pilgrim c722b32596 [InstCombine] recognizeBSwapOrBitReverseIdiom - merge the regular/trunc+zext paths. NFCI.
There doesn't seem to be any good reason for having a separate path for when we bswap/bitreverse at a smaller size than the destination size - so merge these to make the instruction generation a lot clearer.
2020-09-30 14:54:04 +01:00
Simon Pilgrim d5545a8993 [InstCombine] recognizeBSwapOrBitReverseIdiom - remove unnecessary cast. NFCI. 2020-09-30 14:44:15 +01:00
Florian Hahn d856365470 [VPlan] Change recipes to inherit from VPUser instead of a member var.
Now that VPUser is not inheriting from VPValue, we can take the next
step and turn the recipes that already manage their operands via VPUser
into VPUsers directly. This is another small step towards traversing
def-use chains in VPlan.

This is NFC with respect to the generated code, but makes the interface
more powerful.
2020-09-30 14:39:00 +01:00
Simon Pilgrim 621c6c8962 [InstCombine] recognizeBSwapOrBitReverseIdiom - cleanup bswap/bitreverse detection loop. NFCI.
Early out if both pattern matches have failed (or we don't want them). Fix case of bit index iterator (and avoid Wshadow issue).
2020-09-30 14:19:18 +01:00
Simon Pilgrim 413b4998bd [InstCombine] recognizeBSwapOrBitReverseIdiom - use ArrayRef::back() helper. NFCI.
Post-commit feedback on D88316
2020-09-30 13:39:18 +01:00
Simon Pilgrim 05290eead3 InstCombine] collectBitParts - cleanup variable names. NFCI.
Fix a number of WShadow warnings (I was used as the instruction and index......) and fix cases to match style.

Also, replaced the Bit APInt mask check in AND instructions with a direct APInt[] bit check.
2020-09-30 13:25:32 +01:00
Simon Pilgrim af47d40b9c [InstCombine] recognizeBSwapOrBitReverseIdiom - recognise zext(bswap(trunc(x))) patterns (PR39793)
PR39793 demonstrated an issue where we fail to recognize 'partial' bswap patterns of the lower bytes of an integer source.

In fact, most of this is already in place: collectBitParts suitably tags zero bits, so we just need to correctly handle this case by finding the zero'd upper bits and reducing the bswap pattern just to the active demanded bits.
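
For example (illustrative), with i32 -> i16 -> i32:

```
declare i16 @llvm.bswap.i16(i16)

define i32 @partial_bswap(i32 %x) {
  %t = trunc i32 %x to i16
  %b = call i16 @llvm.bswap.i16(i16 %t)
  ; equivalent to lshr(bswap(%x), 16): only the low two bytes of %x
  ; survive, swapped, with the upper bytes zeroed
  %z = zext i16 %b to i32
  ret i32 %z
}
```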

Differential Revision: https://reviews.llvm.org/D88316
2020-09-30 12:07:19 +01:00
Simon Pilgrim ec3f24d453 [InstCombine] recognizeBSwapOrBitReverseIdiom - assert for correct bit provenance indices. NFCI.
As suggested by @spatel on D88316
2020-09-30 11:16:33 +01:00
Jeremy Morse 05659606a2 Revert "[gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC"
Some of the buildbots have croaked with this patch, for example failures
that begin in this build:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/29933

This reverts commit 674f57870f.
2020-09-30 09:52:12 +01:00
Vedant Kumar 674f57870f [gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC 2020-09-29 17:39:07 -07:00
Vedant Kumar 26ee8aff2b [CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)
Lifetime marker intrinsics support any pointer type, so CodeExtractor
does not need to bitcast to `i8*` in order to use these markers.
2020-09-29 16:34:36 -07:00
Sanjay Patel 0527c8749b [InstCombine] ease alignment restriction for converting masked load to normal load
I think we initially made this fold conservative to be safer, but we do not
need the alignment attribute/metadata limitation because the masked load
intrinsic itself specifies the alignment. A normal vector load is better for
IR transforms and should be no worse in codegen than the masked alternative.
If it is worse for some target, the backend can reverse this transform.
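
For the all-ones-mask case this means, roughly (a sketch, not a test from the patch):

```
declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @f(<4 x i32>* %p) {
  ; folds to: load <4 x i32>, <4 x i32>* %p, align 4
  ; using the alignment carried by the intrinsic itself
  %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
  ret <4 x i32> %v
}
```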

Differential Revision: https://reviews.llvm.org/D88505
2020-09-29 15:26:22 -04:00
Simon Pilgrim 0cf48a7065 [InstCombine] visitTrunc - trunc (*shr (trunc A), C) --> trunc(*shr A, C)
Attempt to fold trunc (*shr (trunc A), C) --> trunc(*shr A, C) iff the shift amount is small enough that all zero/sign bits created by the shift are removed by the last trunc.
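
A small sketch, with widths chosen so the condition holds:

```
; Before: bits 4..19 of %a survive, so the inner trunc is unnecessary.
%t = trunc i64 %a to i32
%s = lshr i32 %t, 4
%r = trunc i32 %s to i16
; After: shift the wide value directly, then truncate once.
%s2 = lshr i64 %a, 4
%r2 = trunc i64 %s2 to i16
```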

Helps fix the regressions encountered in D88316.

I've tweaked a couple of shift values as suggested by @lebedev.ri to ensure we have coverage of shift values close (above/below) to the max limit.

Differential Revision: https://reviews.llvm.org/D88429
2020-09-29 18:27:42 +01:00
Juneyoung Lee 67aac915ba [BuildLibCalls] Add noundef to the returned pointers of allocators and argument of free
This patch adds noundef to the returned pointers of allocators (malloc, calloc, ...)
and the pointer argument of free.
The returned pointer of allocators cannot be poison or (partially) undef.
Since the pointer that is given to free must have exactly zero offset,
it cannot be poison or (partially) undef either.
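
A sketch of the resulting declarations (other inferred attributes omitted):

```
declare noundef i8* @malloc(i64)
declare void @free(i8* noundef)
```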

For the size arguments of allocators, noundef wasn't attached simply because
I wasn't sure whether attaching it is okay or not.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87984
2020-09-30 02:13:48 +09:00
Simon Pilgrim b610d73b3f [InstCombine] visitTrunc - remove dead trunc(lshr (zext A), C) combine. NFCI.
I added additional test coverage at rG7a55989dc4305 - but all cases are handled independently of this combine, and http://lab.llvm.org:8080/coverage/coverage-reports/ indicates the code is never used.

Differential revision: https://reviews.llvm.org/D88492
2020-09-29 17:15:16 +01:00
Simon Pilgrim 89a8a0c910 [InstCombine] Inherit exact flags on extended shifts in trunc (lshr (sext A), C) --> (ashr A, C)
This was missed in D88475
2020-09-29 15:32:09 +01:00
Simon Pilgrim 14ff38e235 [InstCombine] visitTrunc - trunc (lshr (sext A), C) --> (ashr A, C) non-uniform support
This came from @lebedev.ri's suggestion to use m_SpecificInt_ICMP for D88429 - since I was going to change the m_APInt to m_Constant for that patch I thought I would do it for the only other user of the APInt first.
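
A sketch of the now-supported non-uniform vector case:

```
%s = sext <2 x i8> %a to <2 x i32>
%l = lshr <2 x i32> %s, <i32 1, i32 2>
%r = trunc <2 x i32> %l to <2 x i8>
; --> %r = ashr <2 x i8> %a, <i8 1, i8 2>
```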

I've added a ConstantExpr::getUMin helper - it's trivial to add UMAX/SMIN/SMAX but I thought I'd wait until we have use cases.

Differential Revision: https://reviews.llvm.org/D88475
2020-09-29 15:01:16 +01:00
Florian Hahn 7bae2bc5a8 [LoopUtils] Only verify SE in builds with assertions.
Follow up to 60b852092c.
2020-09-29 13:39:23 +01:00
Daniel Kiss c5a4900e1a [AArch64] Add BTI to CFI jumptables.
With branch protection the jump to the jump table entries requires a landing pad.

Reviewed By: eugenis, tamas.petz

Differential Revision: https://reviews.llvm.org/D81251
2020-09-29 13:50:23 +02:00
David Stenberg e6f332ef1e [IndVarSimplify] Fix Modified status for removal of overflow intrinsics
When removing an overflow intrinsic the Changed status in SimplifyIndvar
was not set, leading to the IndVarSimplify pass returning an incorrect
status.

This was caught using the check introduced by D80916.

As pointed out in the code review, a similar bug may exist for
eliminateTrunc().

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D85971
2020-09-29 13:20:59 +02:00
Vitaly Buka 4aa6abe4ef [msan] Fix llvm.abs.v intrinsic
The last argument of the intrinsic is a boolean
flag to control INT_MIN handling and does
not affect msan metadata.
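
A brief sketch: in the call below, the trailing i1 operand only states whether an INT_MIN input is poison, so the shadow of %r should depend only on %x.

```
%r = call i32 @llvm.abs.i32(i32 %x, i1 false)
```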
2020-09-29 03:52:27 -07:00
sstefan1 cb9cfa0d2f [OpenMPOpt][Fix] Only initialize ICV initial values once.
Reviewers: jdoerfert, ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D88441
2020-09-29 12:22:58 +02:00
Florian Hahn 60b852092c [LoopDeletion] Forget loop before setting values to undef
After D71539, we need to forget the loop before setting the incoming
values of phi nodes in exit blocks, because we are looking through those
phi nodes now and the SCEV expression could depend on the loop phi. If
we update the phi nodes before forgetting the loop, we miss those users
during invalidation.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88167
2020-09-29 10:38:44 +01:00
Florian Hahn b76df593eb Revert "Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one.""
Looks like there is still another remaining issue:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/22273/steps/build%20libcxx%2Fmsan/logs/stdio

This reverts commit 86a20d9e34.
2020-09-29 09:18:19 +01:00
Florian Hahn 86a20d9e34 Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one."
This version includes a small fix allowing function pointers to be
unconditionally replaced for now.

This reverts commit 4c5e4aa89b.
2020-09-29 09:10:27 +01:00
Max Kazantsev e862e78b63 [NFC] Use assert instead of checking the guaranteed condition
From the preconditions it is known that either A dominates B or
B dominates A. If A does not dominate B, we do not really need
to check it; an assert should be enough. This should save some
compile time.
2020-09-29 11:38:45 +07:00
Max Kazantsev d266fd960e [IndVars] Remove exiting conditions that are trivially true/false
When removing exiting loop conditions, we only consider checks for
which we know the exact exit count. We could also eliminate checks for
which the condition is always true/false.
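
A minimal sketch, assuming the loop has another exit elsewhere:

```
loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
  ; %iv starts at 0 and only grows, so this exiting condition is
  ; trivially false and the branch can be folded to 'br label %latch'.
  %cmp = icmp slt i32 %iv, 0
  br i1 %cmp, label %exit, label %latch
```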

Differential Revision: https://reviews.llvm.org/D87344
Reviewed By: lebedev.ri, reames
2020-09-29 11:35:32 +07:00
Philip Reames e46d74b589 [CVP] Allow two transforms in one invocation
For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the latter by bailing out early.

This is somewhat of a speculative fix. Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding. I never received a test case. However, the only possibility I see is that after that change CVP missed the nonnull fold, and we ended up with a pass ordering/missed simplification issue. So, since it's a real issue, fix it and hope.
2020-09-28 15:11:42 -07:00
Dominic Chen 06e68f05da
[AddressSanitizer] Copy type metadata to prevent miscompilation
When ASan and e.g. Dead Virtual Function Elimination are enabled, the
latter will rely on type metadata to determine if certain virtual calls can be
removed. However, ASan currently does not copy type metadata, which can cause
virtual function calls to be incorrectly removed.

Differential Revision: https://reviews.llvm.org/D88368
2020-09-28 13:56:05 -04:00
Simon Pilgrim 63ee42a06b [InstCombine] matchRotate - force splat of uniform constant rotation amounts (PR46895)
Fixes minor bug in D88402 where we were using the original shift constant (with undefs) instead of one with the splat values (re)splatted to all elements.
2020-09-28 15:12:41 +01:00
Simon Pilgrim dabb14cadd [InstCombine] matchRotate - allow undef in uniform constant rotation amounts (PR46895)
As an extension to D87452, we can safely permit undefs in the uniform/splat detection.
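
A sketch of a splat-with-undef rotate amount that is now matched:

```
%shl = shl <2 x i32> %x, <i32 8, i32 undef>
%shr = lshr <2 x i32> %x, <i32 24, i32 24>
%r   = or <2 x i32> %shl, %shr
; --> %r = call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %x, <2 x i32> %x,
;                                          <2 x i32> <i32 8, i32 8>)
```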

https://alive2.llvm.org/ce/z/nT-ptN

Differential Revision: https://reviews.llvm.org/D88402
2020-09-28 13:36:13 +01:00
Benjamin Kramer 7e5a356d2b [Coroutines] Remove unused includes. NFC. 2020-09-28 10:27:23 +02:00
Chuanqi Xu b3a722e66b [Coroutines] Reuse storage for local variables with non-overlapping lifetimes
Bug 45566 shows that the process of building the coroutine frame does not
consider that the lifetimes of different local variables may not overlap,
which means the compiler could generate a smaller frame.

This patch calculates the lifetime range of each alloca using the
StackLifetime class. The patch then builds non-overlapping sets of
allocas whose lifetime ranges do not overlap. We use the largest type in
a non-overlapping set as the field type in the frame. In the insertSpills
process, if we find that the type of a field is not the same as the
alloca's type, we cast the pointer to the field type to a pointer to the
alloca type. Since the lifetime ranges of the allocas in one
non-overlapping set do not overlap with each other, it should be safe to
reuse the storage space in the frame.
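
A minimal sketch of the lifetime shape this exploits (coroutine boilerplate and the i8* bitcasts %a.i8/%b.i8 elided):

```
%a = alloca i64
%b = alloca i64
; %a's lifetime ends before %b's begins, so even with a suspension
; point in between they can share a single i64 field in the frame.
call void @llvm.lifetime.start.p0i8(i64 8, i8* %a.i8)
store i64 1, i64* %a
call void @llvm.lifetime.end.p0i8(i64 8, i8* %a.i8)
call void @llvm.lifetime.start.p0i8(i64 8, i8* %b.i8)
store i64 2, i64* %b
call void @llvm.lifetime.end.p0i8(i64 8, i8* %b.i8)
```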

Test plan: check-llvm, check-clang, cppcoro, folly

Reviewers: junparser, lxfind, modocache

Differential Revision: https://reviews.llvm.org/D87596
2020-09-28 15:48:00 +08:00
Dávid Bolvanský 155ac33394 [BuildLibCalls] Add noalias for strcat and stpcpy
strcat:
destination and source shall not overlap. (http://www.cplusplus.com/reference/cstring/strcat/)

stpcpy:
The strings may not overlap, and the destination string dest must be  large enough to receive the copy. (https://man7.org/linux/man-pages/man3/stpcpy.3.html)
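
A sketch of the inferred declarations (other attributes omitted):

```
declare i8* @strcat(i8* noalias, i8* noalias)
declare i8* @stpcpy(i8* noalias, i8* noalias)
```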

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88335
2020-09-27 21:37:09 +02:00
Nikita Popov fe79061be2 [LVI][CVP] Use block value when simplifying icmps
Add a flag to getPredicateAt() that allows making use of the block
value. This allows us to take into account range information from
the current block, rather than only information that is threaded
over edges, making the icmp simplification in CVP a lot more
powerful.
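
A minimal sketch of the in-block reasoning this enables (names hypothetical):

```
define i1 @f(i32 %y) {
  %x = and i32 %y, 7       ; block value: %x is in [0, 8)
  %c = icmp ult i32 %x, 8  ; can now fold to true via the block value
  ret i1 %c
}
```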

I'm not changing getPredicateAt() to use the block value
unconditionally to avoid any impact on the JumpThreading pass,
which is somewhat picky about LVI query order.

Most test changes here are just icmps that now get dropped (while
previously only a result used in a return was replaced). The three
tests in icmp.ll show some representative improvements. Some of
the folds this enables have been covered by IPSCCP in the meantime,
but LVI can reason about some cases which are hard to support in
IPSCCP, such as in test_br_cmp_with_offset.

The compile-time time cost of doing this is fairly minimal, with
a ~0.05% CTMark regression for ReleaseThinLTO:
https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions

This is because the block values will typically already be queried
and cached by other CVP optimizations anyway.

Differential Revision: https://reviews.llvm.org/D69686
2020-09-27 20:25:16 +02:00
Fangrui Song 50bd71e1d7 [NewPM] Port ConstraintElimination to the new pass manager
If -enable-constraint-elimination is specified, add it to the -O2/-O3 pipeline.
(-O1 uses a separate function now.)

Reviewed By: fhahn, aeubanks

Differential Revision: https://reviews.llvm.org/D88365
2020-09-27 11:12:26 -07:00
Benjamin Kramer 7b782062b4 [InstCombine] Simplify code. NFCI. 2020-09-27 19:11:07 +02:00
Nikita Popov 9b959b59df [LVI] Require context instruction in external API (NFCI)
Require CxtI in getConstant() and getConstantRange() APIs.
Accordingly drop the BB parameter, as it is implied by
CxtI->getParent().

This makes sure we don't forget to pass the context instruction,
and makes the API contract clearer (also clean up the comments to
that effect -- the value holds at the context instruction, not
the end of the block).
2020-09-27 18:07:24 +02:00
Nikita Popov c8abf1c12d [CVP] Pass context instruction when narrowing div/rem
This fold was the only place not passing the context instruction.
The tests worked around that fact by introducing a basic block split,
which is now no longer necessary.
2020-09-27 17:51:30 +02:00
Fangrui Song 82420b4e49 [DivRemPairs] Use DenseMapBase::find instead of operator[]. NFC 2020-09-27 01:13:14 -07:00
Fangrui Song 400bdbc422 [ConstraintElimination] Internalize function/class and delete an implied condition. NFC
Delete an implied condition (E.NumIn <= CB.NumIn)
2020-09-26 15:04:39 -07:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Florian Hahn 8f0466edc0 [DSE] Unify & fix mem terminator location checks.
When looking for memory defs killed by memory terminators the code
currently incorrectly ignores the size argument of llvm.lifetime.end.

This patch updates the code to use isMemTerminator and updates
isMemTerminator to use isOverwrite() to make sure locations that are
outside the range marked as dead by llvm.lifetime.end are not
considered. Note that isOverwrite is only used for llvm.lifetime.end,
because free-like functions make the whole underlying object dead.
2020-09-26 13:47:50 +01:00
Tyker 8d5b289a46 [LoopDelete][Assume] Allow deleting loops with assumes
This prevents very poor optimization caused by a single assume, like https://godbolt.org/z/EK3oMh

baseline flags: -O3
patched flags: -O3 -mllvm --enable-knowledge-retention

Before the patch
```
Metric: compile_time
Program                                                      baseline patched diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test  20.72    29.74  43.5%
                     test-suite :: CTMark/Bullet/bullet.test  24.39    24.91   2.2%
               test-suite :: CTMark/7zip/7zip-benchmark.test  37.39    38.06   1.8%
                      test-suite :: CTMark/kimwitu++/kc.test  11.76    11.94   1.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  12.94    12.91  -0.3%
                       test-suite :: CTMark/SPASS/SPASS.test  11.72    11.70  -0.2%
                     test-suite :: CTMark/lencod/lencod.test  16.12    16.10  -0.1%
                   test-suite :: CTMark/ClamAV/clamscan.test  13.31    13.30  -0.1%
              test-suite :: CTMark/mafft/pairlocalalign.test   9.12     9.12  -0.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test   9.34     9.34  -0.1%
                                          Geomean difference                   4.2%

Metric: compiler_Kinst_count
Program                                                      baseline     patched      diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107576069.87 172886418.90 60.7%
                     test-suite :: CTMark/Bullet/bullet.test 123291865.66 125457117.96  1.8%
                      test-suite :: CTMark/kimwitu++/kc.test  56347884.64  57298544.14  1.7%
               test-suite :: CTMark/7zip/7zip-benchmark.test 180637699.58 183341656.57  1.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  66723788.85  66664692.80 -0.1%
                   test-suite :: CTMark/ClamAV/clamscan.test  69581500.56  69597863.92  0.0%
                     test-suite :: CTMark/lencod/lencod.test  94236501.48  94216545.32 -0.0%
                       test-suite :: CTMark/SPASS/SPASS.test  58516756.95  58505089.07 -0.0%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  48832815.53  48841989.39  0.0%
              test-suite :: CTMark/mafft/pairlocalalign.test  49682720.53  49686324.34  0.0%
                                          Geomean difference                            5.4%
```

After the patch
```
Metric: compile_time
Program                                                      baseline patched diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test  20.70    22.40   8.2%
               test-suite :: CTMark/7zip/7zip-benchmark.test  37.13    38.05   2.5%
                     test-suite :: CTMark/Bullet/bullet.test  24.25    24.83   2.4%
                      test-suite :: CTMark/kimwitu++/kc.test  11.69    11.94   2.2%
                   test-suite :: CTMark/ClamAV/clamscan.test  13.19    13.36   1.3%
                     test-suite :: CTMark/lencod/lencod.test  16.02    16.19   1.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test   9.29     9.36   0.7%
                       test-suite :: CTMark/SPASS/SPASS.test  11.64    11.73   0.7%
              test-suite :: CTMark/mafft/pairlocalalign.test   9.10     9.15   0.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  12.95    12.96   0.0%
                                          Geomean difference                   1.9%

Metric: compiler_Kinst_count
Program                                                      baseline     patched      diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107590933.61 114044834.72  6.0%
                      test-suite :: CTMark/kimwitu++/kc.test  56344526.77  57235806.29  1.6%
                     test-suite :: CTMark/Bullet/bullet.test 123291285.10 125128334.97  1.5%
               test-suite :: CTMark/7zip/7zip-benchmark.test 180641540.10 183155706.39  1.4%
                   test-suite :: CTMark/sqlite3/sqlite3.test  66725619.22  66668713.92 -0.1%
                       test-suite :: CTMark/SPASS/SPASS.test  58509029.85  58478704.75 -0.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  48843711.23  48826894.68 -0.0%
                     test-suite :: CTMark/lencod/lencod.test  94233305.79  94207544.23 -0.0%
                   test-suite :: CTMark/ClamAV/clamscan.test  69587887.66  69603549.90  0.0%
              test-suite :: CTMark/mafft/pairlocalalign.test  49686968.65  49689291.04  0.0%
                                          Geomean difference                            1.0%
```

Reviewed By: jdoerfert, efriedma

Differential Revision: https://reviews.llvm.org/D86816
2020-09-26 12:32:44 +02:00
Arthur Eubanks 83e3ea2cfc [LowerTypeTests][NewPM] Add constructor that uses command line flags
This matches the legacy PM pass by having one constructor use command
line flags, and the other use parameters to the pass.

This fixes all tests under Transforms/LowerTypeTests using NPM.

Reviewed By: ychen, pcc

Differential Revision: https://reviews.llvm.org/D87845
2020-09-25 17:39:59 -07:00
Layton Kifer 48961ba0de [TRE][NFC] Refactor Basic Block Processing
Simplify and improve readability.

Differential Revision: https://reviews.llvm.org/D82269
2020-09-25 16:01:05 -07:00
Simon Pilgrim 9ff9c1d8ee [InstCombine] matchRotate - support (uniform) constant rotation amounts (PR46895)
This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.

Differential Revision: https://reviews.llvm.org/D87452
2020-09-25 22:03:10 +01:00
Simon Pilgrim 2a0ca17f66 [InstCombine] collectBitParts - add fshl/fshr handling
Pulled from D87452, this is a fixed version of the collectBitParts fshl/fshr handling which, as @nikic noticed, wasn't checking for different providers and didn't have correct bit ordering (which was hidden by only testing shift amounts of bitwidth/2).

Differential Revision: https://reviews.llvm.org/D88292
2020-09-25 20:34:59 +01:00
Arthur Eubanks d3f6972abb [LoopReroll][NewPM] Port -loop-reroll to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87957
2020-09-25 12:09:06 -07:00
Daniel Paoliello d2166076b8 [Coroutine] Split PHI Nodes in `cleanuppad` blocks in a way that obeys EH pad rules
Issue Details:
In order to support coroutine splitting, any multi-value PHI node in a coroutine is split into multiple blocks with single-value PHI Nodes, which then allows a subsequent transform to generate `reload` instructions as required (i.e., to reload the value if required if the coroutine has been resumed). This causes issues with EH pads (`catchswitch` and `catchpad`) as all pads within a `catchswitch` must have the same unwind destination, but the coroutine splitting logic may modify them to each have a unique unwind destination if there is a PHI node in the unwind `cleanuppad` that is set from values in the `catchswitch` and `cleanuppad` blocks.

Fix Details:
During splitting, if such a PHI node is detected, then create a "dispatcher" `cleanuppad` as well as the blocks with single-value PHI Nodes: thus the "dispatcher" is the unwind destination and it will detect which predecessor called it and then branch to the appropriate single-value PHI node block, which will then branch back to the original `cleanuppad` block.

Reviewed By: GorNishanov, lxfind

Differential Revision: https://reviews.llvm.org/D88059
2020-09-25 11:30:38 -07:00
Joseph Huber a22814194e [OpenMP] OpenMPOpt Support for Globalization Remarks
Summary:
This patch adds support for printing analysis messages relating to data
globalization on the GPU. This occurs when data is shared between the
threads in a GPU context and must be pushed to global or shared memory.

Reviewers: jdoerfert

Subscribers: guansong hiraditya llvm-commits ormris sstefan1 yaxunl

Tags: #OpenMP #LLVM

Differential Revision: https://reviews.llvm.org/D88243
2020-09-24 18:23:12 -04:00
Vedant Kumar dfc5a9eb57 [Instruction] Add dropLocation and updateLocationAfterHoist helpers
Introduce a helper which can be used to update the debug location of an
Instruction after the instruction is hoisted. This can be used to safely
drop a source location as recommended by the docs.

For more context, see the discussion in https://reviews.llvm.org/D60913.

Differential Revision: https://reviews.llvm.org/D85670
2020-09-24 15:00:04 -07:00
Sanjay Patel 0a349d5827 [SLP] clean up - use 'const' and ArrayRef constructor; NFC
Follow-on tidying suggested in the post-commit review of 6a23668.
2020-09-24 15:31:07 -04:00
Craig Topper 03f22b08e2 [SLP] Remove LHS and RHS from OperationData.
These were only really used for 2 things. One was to check if the operand matches the phi if it exists. The other was for the createOp method to build the reduction.

For the first case we still have the operation; we just need to know how to index its operands. So I've modified getLHS/getRHS to just use the opcode/kind to know how to find the right operands on an instruction that is now passed in.

For the other case we had to create an OperationData object to set the LHS/RHS values and copy the opcode/kind from another object. We would then just call createOp on that temporary object. Instead I've made LHS/RHS arguments to createOp and removed all these temporary objects.

Differential Revision: https://reviews.llvm.org/D88193
2020-09-24 10:57:11 -07:00
Simon Pilgrim 81a408808f [Scalar] ConstantHoistingPass - iterate with const references. NFCI.
Fix some clang-tidy warnings.
2020-09-24 18:40:50 +01:00
Matt Arsenault d65a7003c4 OpaquePtr: Add helpers for sret to mirror byval
Sret should really have a type parameter like byval does.
2020-09-24 09:57:28 -04:00
Zequan Wu f5435399e8 [CGProfile] don't emit cgprofile entry if called function is dllimport
Differential Revision: https://reviews.llvm.org/D88127
2020-09-23 16:56:54 -07:00
Arthur Eubanks 6b1ce83a12 [NewPM][CGSCC] Handle newly added functions in updateCGAndAnalysisManagerForPass
This seems to fit the CGSCC updates model better than calling
addNewFunctionInto{Ref,}SCC() on newly created/outlined functions.
Now addNewFunctionInto{Ref,}SCC() are no longer necessary.

However, this doesn't work on newly outlined functions that aren't
referenced by the original function. e.g. if a() was outlined into b()
and c(), but c() is only referenced by b() and not by a(), this will
trigger an assert.

This also fixes an issue I was seeing with newly created functions not
having passes run on them.

Ran check-llvm with expensive checks.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87798
2020-09-23 15:22:18 -07:00
Craig Topper 7a3c643c35 [SLP] Make HorizontalReduction::getOperationData take an Instruction* instead of a Value*. NFCI
All of the callers already have an Instruction *. Many of them
from a dyn_cast.

Also update the OperationData constructor to use an Instruction&
to remove a dyn_cast and make it clear that the pointer is non-null.

Differential Revision: https://reviews.llvm.org/D88132
2020-09-23 10:51:03 -07:00
Krzysztof Parzyszek 76e8c1899e Break long line accidentally left in the previous commit 2020-09-23 12:24:45 -05:00
Krzysztof Parzyszek e976fb1e54 [EarlyCSE] Fix crash with expensive checks after D87691
D87691 reordered some checks, which turned out to be unsafe. More
specifically, when examining a store instruction, the check against
getOrCreateResult should be done before attempting to call
isSameMemGeneration. Otherwise a crash in MSSA walker can occur.

This patch restores the order of these calls to what it was originally.
2020-09-23 12:21:34 -05:00
Simon Pilgrim 91589cf679 Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-23 16:19:25 +01:00
Simon Pilgrim 474dc33d07 Add missing namespace closure comment. NFCI.
Fixes clang-tidy llvm-namespace-comment warning.
2020-09-23 16:19:25 +01:00
Florian Hahn 31923f6b36 [VPlan] Disconnect VPValue and VPUser.
This refactors VPUser to not inherit from VPValue to facilitate
introducing operations that introduce multiple VPValues (e.g.
VPInterleaveRecipe).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84679
2020-09-23 14:44:31 +01:00
David Sherwood 59c4d5aad0 [SVE] Fix InstCombinerImpl::PromoteCastOfAllocation for scalable vectors
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:

1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This is because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.

For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.

I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:

  Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll

Differential revision: https://reviews.llvm.org/D87378
2020-09-23 08:43:05 +01:00
Martin Storsjö b90132399a [CVP] Remove a redundant trailing semicolon, fixing GCC warnings. NFC. 2020-09-23 09:03:01 +03:00
Martin Storsjö 2c4c659666 [InstCombine] Add parentheses in assert to silence GCC warning. NFC. 2020-09-23 09:03:01 +03:00
Hubert Tong 32c9991dab [InstCombine] Fix errno bug in pow expansion to sqrt
A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -infinity: the `sqrt` will set `EDOM` where the `pow`
call need not.

This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.

The existing tests are updated (depending on emphasis in the checks for
library calls, avoidance of overlap, and overall coverage):
  - to add `ninf`, retaining the intended library call,
  - to use the intrinsic, retaining the use of `select`, or
  - to expect the replacement to not occur.

The following is tested:
  - The pow intrinsic folds to a `select` instruction to
    handle -infinity.
  - The pow library call folds, with `ninf`, to `sqrt` without the
    `select` instruction associated with handling -infinity.
  - The pow library call does not fold to `sqrt` without `ninf`.
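
For illustration, a rough sketch of the ninf library-call case (assuming the usual fabs for -0.0 is kept):

```
%r = call ninf double @pow(double %x, double 5.000000e-01)
; --> %s = call ninf double @sqrt(double %x)
;     %r = call ninf double @llvm.fabs.f64(double %s)
```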

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87877
2020-09-22 18:58:05 -04:00
Alexey Bataev d6ac649ccd [SLP]Fix coding style, NFC. 2020-09-22 17:44:29 -04:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Roman Lebedev b289dc5306
[CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands
This is practically identical to what we already do for UDiv/URem:
  https://rise4fun.com/Alive/04K

Name: narrow udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow exact udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv exact i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow urem
Pre: C0 u<= 255 && C1 u<= 255
%r = urem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = urem i8 %t0, %t1
%r = zext i8 %t2 to i16

... only here we need to look for 'min signed bits', not 'active bits',
and there's UB to be aware of:
  https://rise4fun.com/Alive/KG86
  https://rise4fun.com/Alive/LwR

Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv exact i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = srem i9 %t0, %t1
%r = sext i9 %t2 to i16


Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv exact i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = srem i8 %t0, %t1
%r = sext i8 %t2 to i16


The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks
the logic: we can losslessly truncate a ConstantRange to
`getMinSignedBits()` bits and sign-extend it back, and the result will be
identical to the original CR.

On vanilla llvm test-suite + RawSpeed, this fires 1262 times,
while the same fold for UDiv/URem only fires 384 times. Sic!

Additionally, this causes +606.18% (+1079) extra cases of
aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145)
of aggressive-instcombine.NumInstrsReduced folds.
2020-09-22 21:37:30 +03:00
Roman Lebedev 4977eadee5
[NFC][CVP] Give a better name STATISTIC() counting udiv i16 -> udiv i8 xforms 2020-09-22 21:37:30 +03:00
Roman Lebedev ba5afe5588
[NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits()
As an exhaustive test shows, this logic is fully identical to the old
implementation, with the exception of the case where both of the operands
had empty ranges:

```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if(CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if(CR1.isEmptySet())
        return;

      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());

      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();

      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
2020-09-22 21:37:29 +03:00
Roman Lebedev 4eeeb356fc
[CVP] Enhance SRem -> URem fold to work not just on non-negative operands
This is a continuation of 8d487668d0;
the logic is pretty much identical for SRem:

Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, C1

Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, -C1

Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0

Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0

https://rise4fun.com/Alive/Vd6

Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed,
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
2020-09-22 21:37:28 +03:00
Hubert Tong 6801950192 [InstCombine] For pow(x, +/-0.5), stop falling into pow(x, 1.5), etc. case
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.

The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.

This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88066
2020-09-22 14:23:32 -04:00
Hamilton Tobon Mosquera bd31abc1d0 [OpenMPOpt] Refactored "issue" and "wait" declarations for data map runtime call.
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This addresses the compiler warning about returning the handle by copy.

Differential Revision: https://reviews.llvm.org/D88029
2020-09-22 10:50:17 -05:00
Florian Hahn c671e34bf2 [VPlan] Add dump() helper to VPValue & VPRecipeBase.
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
2020-09-22 15:55:16 +01:00
Sanjay Patel 0c3bfbe4bc [SLP] reduce code duplication for checking parent block; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel bbd49a0266 [SLP] move misplaced code comments; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel 062276c691 [SLP] clean up code in gather(); NFC
1. Use range for-loop to avoid repeatedly accessing end index.
2. Better variable names.
2020-09-22 09:21:20 -04:00
Simon Pilgrim d682a36ef9 [SLP] Merge null and dyn_cast<> checks into dyn_cast_or_null<>. NFCI. 2020-09-22 14:01:47 +01:00
Meera Nakrani a3d0dce260 [ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop
Changes the TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to check whether the immediate is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
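
A sketch of the clamp pattern in IR terms (an 8-bit signed saturate):

```
%lo  = icmp slt i32 %x, 127
%min = select i1 %lo, i32 %x, i32 127
%hi  = icmp sgt i32 %min, -128
%r   = select i1 %hi, i32 %min, i32 -128
; The 127/-128 immediates should stay in the loop body so the
; min(max()) pattern can still be matched to SSAT.
```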

Differential Revision: https://reviews.llvm.org/D87457
2020-09-22 11:54:10 +00:00
Arthur Eubanks 3bf703fb6d [AlwaysInliner] Emit optimization remarks
To match the normal inliner in preparation for https://reviews.llvm.org/D86988.

Also change a FIXME to an assert.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88067
2020-09-21 22:09:28 -07:00
Serguei Katkov 5502cfa091 [LoopUnswitch] Trivial simplification: remove trivial dead condition after unswitch
Non-trivial loop unswitching can keep the dead condition instruction.
This CL adds trivial dead-code elimination for the unused condition.

Reviewers: asbirlea, aqjune, fhahn, DaniilSuchkov, reames
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88014
2020-09-22 09:04:59 +07:00
Krzysztof Parzyszek ae3f54c1e9 [EarlyCSE] Handle masked loads and stores
Extend the handling of memory intrinsics to also include non-
target-specific intrinsics, in particular masked loads and stores.

Invent "isHandledNonTargetIntrinsic" to distinguish between intrin-
sics that should be handled natively from intrinsics that can be
passed to TTI.

Add code that handles masked loads and stores and update the
testcase to reflect the results.
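
A sketch of the kind of redundancy this catches:

```
%v1 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4,
        <4 x i1> %m, <4 x i32> undef)
%v2 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4,
        <4 x i1> %m, <4 x i32> undef)
; %v2 can now be CSE'd to %v1, given no intervening clobber.
```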

Differential Revision: https://reviews.llvm.org/D87340
2020-09-21 18:47:10 -05:00
Arthur Eubanks 1747f77764 [SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87718
2020-09-21 16:33:01 -07:00
Krzysztof Parzyszek 2c768c7d6c [EarlyCSE] Small refactoring changes, NFC
1. Store intrinsic ID in ParseMemoryInst instead of a boolean flag
   "IsTargetMemInst". This will make it easier to add support for
   target-independent intrinsics.
2. Extract the complex multiline conditions from EarlyCSE::processNode
   into a new function "getMatchingValue".

Differential Revision: https://reviews.llvm.org/D87691
2020-09-21 16:11:06 -05:00
Sanjay Patel 7451bf0b0b [SLP] use std::distance/find to reduce code; NFC
We were already using this code pattern right after
the loop, so this makes it consistent.
2020-09-21 16:22:55 -04:00
Sanjay Patel 6bad3caeb0 [InstCombine] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Sanjay Patel be93505986 [LoopVectorize] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Arthur Eubanks f4f7df037e [DIE] Remove DeadInstEliminationPass
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.

DeadCodeEliminationPass should be used in all cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87933
2020-09-21 12:12:25 -07:00
Arthur Eubanks 746a2c3775 [ObjCARC] Initialize return value
Mistakenly removed initialization of `Changed` in https://reviews.llvm.org/D87806.
2020-09-21 11:03:44 -07:00
Sanjay Patel a44238cb44 [SLP] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 13:54:06 -04:00
Sanjay Patel 1e6b240d7d [IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC
This reduces code duplication for common construct.
Follow-ups can use this in SLP, LoopVectorizer, and other passes.
2020-09-21 13:47:01 -04:00
Simon Pilgrim 005f826a05 [SLP] Use for-range loops across ValueLists. NFCI.
Also rename some existing loops that used a 'j' iterator to consistently use 'V'.
2020-09-21 18:24:23 +01:00
Sanjay Patel 46075e0b78 [SLP] simplify interface for gather(); NFC
The implementation of gather() should be reduced too,
but this change by itself makes things a little clearer:
we don't try to gather to a different type or
number-of-values than whatever is passed in as the value
list itself.
2020-09-21 12:57:28 -04:00
Arthur Eubanks 024979b7b6 [ObjCARC][NewPM] Port objc-arc-contract to NPM
Similar to https://reviews.llvm.org/D86178.

This is a module pass instead of a function pass since
ARCRuntimeEntryPoints can lazily add function declarations.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D87806
2020-09-21 09:40:14 -07:00
Simon Pilgrim 3ddecfd220 SLPVectorizer.cpp - fix include ordering. NFCI. 2020-09-21 17:17:11 +01:00
Alexey Bataev 3ff07fcd54 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has fewer
elements than the root node). In this case we just cannot try to reorder
the tree + we may calculate the wrong number of nodes that require the
same reordering.
For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves
are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first
leaf, it will be shrunk to <a, b>. If instructions in this leaf should
be reordered, the best order will be <1, 0>. We need to extend this
order for the root node. For the root node this order should look like
<3, 0, 1, 2>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-21 10:51:03 -04:00
Florian Hahn 57ae9bb932 [LSR] Preserve MSSA when using SplitCriticalEdge.
LSR claims to preserve MemorySSA, but we also have to make sure it is preserved
when splitting critical edges. This can be done by passing MSSAU to
SplitCriticalEdge.

Fixes PR47557.
2020-09-21 09:51:26 +01:00
Sanjay Patel 7903ae4720 [InstCombine] factorize left shifts of add/sub
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430

The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU
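
A sketch of the all-nsw case:

```
%xs = shl nsw i32 %x, 4
%ys = shl nsw i32 %y, 4
%r  = add nsw i32 %xs, %ys
; --> %a = add nsw i32 %x, %y
;     %r = shl nsw i32 %a, 4
```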

This also solves:
https://llvm.org/PR47584
2020-09-20 12:55:24 -04:00
Sanjay Patel cf75e83275 [InstCombine] replace zombie unreachable values with 'undef' before erasing
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.

Differential Revision: https://reviews.llvm.org/D87965
2020-09-20 12:25:08 -04:00
Fangrui Song 871d03a675 [FunctionAttrs] Inline setDoesNotRecurse() and delete it. NFC
It always returns true, which may lead to confusion. Inline it because it is
trivial and only called twice.
2020-09-19 22:24:52 -07:00
Fangrui Song 0526713aa8 [FunctionAttrs] Remove redundant check. NFC 2020-09-19 20:46:18 -07:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Nikita Popov f4e5541809 [Local] Clean up enforceKnownAlignment() (NFC)
I want to export this function, and the current API was a bit
weird: It took an additional Alignment argument that didn't really
have anything to do with what the function does. Drop it, and
perform a max at the callsite.

Also rename it to tryEnforceAlignment().
2020-09-19 22:29:40 +02:00
Florian Hahn 1d8f2e5292 [SCEVExpander] Support expanding nonintegral pointers with constant base.
Currently SCEVExpander creates an inttoptr for non-integral pointers if,
for example, the base is a null constant. This results in invalid IR.

This patch changes InsertNoopCastOfTo to emit a GEP & bitcast to convert
to a non-integral pointer. First, a GEP of i8* null is generated and the
integral value is used as index. The GEP is then bitcasted to the target
type.

This was exposed by D71539.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87827
2020-09-19 17:19:53 +01:00
Xun Li 11453740bc [ASAN] Properly deal with musttail calls in ASAN
When address sanitizing a function, stack unpoisoning code is inserted before each ret instruction. However, if the ret instruction is preceded by a musttail call, such a transformation breaks the musttail call contract and generates invalid IR.
This patch fixes the issue by moving the insertion point prior to the musttail call if there is one.
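
A sketch of the required shape:

```
; The unpoisoning call is now inserted here, before the musttail call...
%r = musttail call i32 @callee(i32 %x)
; ...because nothing may be placed between it and the ret.
ret i32 %r
```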

Differential Revision: https://reviews.llvm.org/D87777
2020-09-18 23:10:34 -07:00
Fangrui Song 76eec6c95b [SCEV] Fix an unused variable in -DLLVM_ENABLE_ASSERTIONS=off build 2020-09-18 16:19:05 -07:00
Eric Christopher ecfd8161bf Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions."
as it's infinite looping on occasion.

This reverts commit 455ca0ebb6.
2020-09-18 12:50:04 -07:00
Huihui Zhang 9ad6049736 [InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant.
We cannot iterate over a scalable vector; the number of elements is unknown at compile time.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87918
2020-09-18 11:26:36 -07:00
Alexey Bataev 455ca0ebb6 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has fewer
elements than the root node). In this case we just cannot try to reorder
the tree + we may calculate the wrong number of nodes that require the
same reordering.
For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves
are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first
leaf, it will be shrunk to <a, b>. If instructions in this leaf should
be reordered, the best order will be <1, 0>. We need to extend this
order for the root node. For the root node this order should look like
<3, 0, 1, 2>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-18 09:34:59 -04:00
Florian Hahn 9d172c8e9c Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.

Notable fixes fc82006331, a0017c2bc2.

This reverts commit 3a59628f3c.
2020-09-18 11:05:00 +01:00
Nikita Popov 13e19d2e7c Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinc"
This reverts commit 05d4c4ebc2.

mstorsjo reports a miscompile after this change in
https://reviews.llvm.org/D87188#2281093. Reverting until I can
investigate this.
2020-09-18 09:38:26 +02:00
Nikita Popov 05d4c4ebc2 [InstCombine] Canonicalize SPF_ABS to abs intrinc
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.

To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.

It's pretty likely that this will uncover places that are missing
handling for the abs() intrinsic. Please report any performance
regressions you see.

Differential Revision: https://reviews.llvm.org/D87188
2020-09-17 22:28:34 +02:00
Whitney Tsang 1cee33e9db [LoopUnrollAndJam] Allow unroll and jam loops forced by user.
Summary: Allow unroll and jam loops forced by user.
LoopUnrollAndJamPass is still disabled by default in the NPM pipeline,
and can be controlled by -enable-npm-unroll-and-jam.

Reviewed By: Meinersbur, dmgreen

Differential Revision: https://reviews.llvm.org/D87786
2020-09-17 19:40:14 +00:00
Nikita Popov 91ce8e121b [GVN] Use that assume(!X) implies X==false (PR47496)
We already use the fact that assume(X) implies X==true; do the same for
assume(!X) implying X==false. This fixes PR47496.
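
A minimal sketch:

```
define i1 @f(i1 %x) {
  %not = xor i1 %x, true
  call void @llvm.assume(i1 %not)
  ret i1 %x   ; GVN can now replace this use of %x with false
}
declare void @llvm.assume(i1)
```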
2020-09-17 21:34:44 +02:00
Sanjay Patel 48a23bccf3 [VectorCombine] limit load+insert transform to one-use
As discussed in:
https://llvm.org/PR47558
...there are several potential fixes/follow-ups visible
in the test case, but this is the quickest and safest
fix of the perf regression.
2020-09-17 14:29:15 -04:00
Sanjay Patel ddd9575d15 [VectorCombine] rearrange bailouts for load insert for efficiency; NFC 2020-09-17 13:50:37 -04:00
Xun Li 5b533d6cde [Coroutine] Fix a bug where Coroutine incorrectly spills phi and invoke defs before CoroBegin
When a spill definition is before CoroBegin, we cannot spill it to the frame immediately after the definition. We have to spill it after the frame is ready.
The current implementation handles this properly for all kinds of instructions except PHINode and InvokeInst, which can also be defined before CoroBegin.
This patch fixes it by moving the CoroBegin dominance check earlier, so that it covers all cases.
Added a test.

Differential Revision: https://reviews.llvm.org/D87810
2020-09-17 08:13:07 -07:00
Sanjay Patel 03783f19dc [SLP] sort candidates to increase chance of optimal compare reduction
This is one (small) part of improving PR41312:
https://llvm.org/PR41312

As shown there and in the smaller tests here, if we have some member of the
reduction values that does not match the others, we want to push it to the
end (bring the matching members forward and together).

In the regression tests, we have 5 candidates for the 4 slots of the reduction.
If the one "wrong" compare is grouped with the others, it prevents forming the
ideal v4i1 compare reduction.

Differential Revision: https://reviews.llvm.org/D87772
2020-09-17 08:49:27 -04:00
Roman Lebedev aadf55d1ce
[NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%)
This is functionally equivalent to the old implementation.

As per https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=4739e6e4eb54d3736e6457249c0919b30f6c855a&stat=instructions
this is a clear geomean compile-time regression-free win with overall geomean of `-0.08%`

32 PHI's appears to be the sweet spot; both the 16 and 64 performed worse:
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=c4efe1fbbfdf0305ac26cd19eacb0c7774cdf60e&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=e4989d1c67010d3339d1a40ff5286a31f10cfe82&stat=instructions

If we have more PHI's than that, we fall-back to the original DenseSet-based implementation,
so the not-so-fast cases will still be handled.

However compile-time isn't the main motivation here.
I can name at least 3 limitations of this CSE:
1. Assumes that all PHI nodes have incoming basic blocks in the same order (can be fixed while keeping the DenseMap)
2. Does not special-handle `undef` incoming values (i don't see how we can do this with hashing)
3. Does not special-handle backedge incoming values (maybe can be fixed by hashing backedge as some magical value)

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87408
2020-09-17 11:29:03 +03:00
Michael Liao 4e4c89b22c [EarlyCSE] Simplify max/min pattern matching. NFC. 2020-09-16 18:34:46 -04:00
Nikita Popov 222bf3ffbc Reapply [InstCombine] Simplify select operand based on equality condition
Reapply after fixing SimplifyWithOpReplaced() to never return
the original value, which would lead to an infinite loop in this
transform.

-----

For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.
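
A small sketch:

```
%c = icmp eq i32 %x, 0
%a = add i32 %x, 1
%s = select i1 %c, i32 %a, i32 %b
; On the true arm %x is known to be 0, so %a simplifies:
; --> %s = select i1 %c, i32 1, i32 %b
```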

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-16 20:53:58 +02:00
Arthur Eubanks c27b64bbe1 [Coro][NewPM] Handle llvm.coro.prepare.retcon in NPM coro-split pass
Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D87731
2020-09-16 09:09:10 -07:00
Dangeti Tharun kumar 01e2b394ee [Partial Inliner] Compute intrinsic cost through TTI
https://bugs.llvm.org/show_bug.cgi?id=45932

The assertion assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") was getting triggered in computeBBInlineCost.

Intrinsics like "assume" are considered regular function calls while computing costs.
This patch enables computeBBInlineCost to query TTI for the intrinsic call cost.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87132
2020-09-16 15:12:31 +01:00
Sanjay Patel 24238f09ed [SLP] fix formatting; NFC
Also move variable declarations closer to usage and add code comments.
2020-09-16 08:50:27 -04:00
Sanjay Patel 6a23668e78 [SLP] remove uses of 'auto' that obscure functionality; NFC 2020-09-16 08:26:21 -04:00
Sanjay Patel 0cee1bf5d1 [SLP] remove redundant size check; NFC
We bail out on small array size anyway.
2020-09-16 08:11:19 -04:00
Sanjay Patel bbad998bab [SLP] move loop index variable declaration to its use; NFC 2020-09-16 07:59:31 -04:00
Sanjay Patel 158989184e [SLP] change poorly named variable; NFC
'V' shadows a function argument.
2020-09-16 07:59:31 -04:00
Arthur Eubanks ba12e77ec1 [NewPM] Port strip* passes to NPM
strip-nondebug and strip-debug-declare have no existing associated tests

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87639
2020-09-15 18:25:12 -07:00
Arthur Eubanks f7aa1563eb [LowerSwitch][NewPM] Port lowerswitch to NPM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87726
2020-09-15 18:18:31 -07:00
Wenlei He 2c391a5a14 [LICM] Make Loop ICM profile aware again
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.

Internally at Facebook this change results in a ~7% win in a CPU related metric in one of our big services by preventing hoisting cold code into a hot pre-header like the added test case demonstrates.

Testing:
ninja check

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87551
2020-09-15 17:21:58 -07:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄 .

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
Xun Li 7b4cc0961b [TSAN] Handle musttail call properly in EscapeEnumerator (and TSAN)
Call instructions with the musttail tag must be optimized as tail calls; otherwise this could lead to incorrect program behavior.
When TSAN instruments functions, it breaks that contract by adding a call to the tsan exit function in between the musttail call and the return instruction, and also inserts exception handling code.
This happens through EscapeEnumerator, which adds exception handling code and returns ret instructions as the places to insert instrumentation calls.
This becomes especially problematic for coroutines, because coroutines rely on tail calls to do symmetric transfers properly.
To fix this, this patch moves the location to insert instrumentation calls prior to the musttail call for ret instructions that follow musttail calls, and also does not insert exception handling code for musttail calls.

Differential Revision: https://reviews.llvm.org/D87620
2020-09-15 15:20:05 -07:00
Huihui Zhang 3b7f5166bd [SLPVectorizer][SVE] Skip scalable-vector instructions before vectorizeSimpleInstructions.
For scalable types, the aggregated size is unknown at compile time.
Skip instructions with scalable types to ensure the list of instructions
for vectorizeSimpleInstructions does not contain any scalable-vector instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87550
2020-09-15 13:10:15 -07:00
Matt Arsenault 7d6ca2ec57 InferAddressSpaces: Fix assert with unreachable code
Invalid IR in unreachable code is technically valid IR. In this case,
the address space of the value was never inferred, and we tried to
rewrite it with an invalid address space value, which would assert.
2020-09-15 15:48:43 -04:00
Florian Hahn 3d42d54955 [ConstraintElimination] Add constraint elimination pass.
This patch is a first draft of a new pass that adds a more flexible way
to eliminate compares based on more complex constraints collected from
dominating conditions.

In particular, it aims at simplifying conditions of the forms below
using a forward propagation approach, rather than instcombine-style
ad-hoc backwards walking of def-use chains.

    if (x < y)
      if (y < z)
        if (x < z) <- simplify

or

    if (x + 2 < y)
        if (x + 1 < y) <- simplify assuming no wraps

The general approach is to collect conditions and blocks, sort them by
dominance and then iterate over the sorted list. Each condition is turned
into a linear inequality and added to a system containing the linear
inequalities that hold on entry to its block. For each block, we check each
compare against the system and see if it is implied by the constraints
in the system.

We also keep a stack of processed conditions and remove conditions from
the stack and the constraint system once they go out-of-scope (= do not
dominate the current block any longer).
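
For illustration, a minimal IR sketch of the first pattern above (hand-written, not from the patch):

  define i1 @f(i32 %x, i32 %y, i32 %z) {
  entry:
    %c1 = icmp ult i32 %x, %y
    br i1 %c1, label %bb1, label %exit
  bb1:
    %c2 = icmp ult i32 %y, %z
    br i1 %c2, label %bb2, label %exit
  bb2:
    %c3 = icmp ult i32 %x, %z     ; implied by %c1 and %c2, folds to true
    ret i1 %c3
  exit:
    ret i1 false
  }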

Currently there are still at least the following areas for improvement:

* Currently large unsigned constants cannot be added to the system
  (coefficients must be represented as integers)
* The way constraints are managed currently is not very optimized.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D84547
2020-09-15 19:31:11 +01:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Fangrui Song 4452cc4086 [VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan
Similar to the tsan suppression in
`Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN),
the D81766 optimization should be suppressed under tsan due to potential
spurious data race reports:

  struct A {
    int i;
    const short s; // the load cannot be vectorized because
    int modify;    // it overlaps with bytes being concurrently modified
    long pad1, pad2;
  };
  // __tsan_read16 does not know that some bytes are undef and that the access is safe

Similarly, under asan, users can mark memory regions with
`__asan_poison_memory_region`. A widened load can lead to a spurious
use-after-poison error. hwasan/memtag should be similarly suppressed.

`mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so
we need to exclude memtag in `vectorizeLoadInsert`.

Note, memtag suppression can be relaxed if the load is aligned to
its granule (usually 16), but that is out of the scope of this patch.

Reviewed By: spatel, vitalybuka

Differential Revision: https://reviews.llvm.org/D87538
2020-09-15 09:47:21 -07:00
Simon Pilgrim 2b42d53e5e SLPVectorizer.h - remove unnecessary AliasAnalysis.h include. NFCI.
Forward declare AAResults instead of the (old) AliasAnalysis type.

Remove includes from SLPVectorizer.cpp that are already included in SLPVectorizer.h.
2020-09-15 16:24:05 +01:00
Simon Pilgrim 65c6ae3b6a [Utils] isLegalToPromote - Fix missing null check before writing to FailureReason.
The FailureReason input parameter may be null; we check this in all other cases in the method, but this one was somehow missed.

Fixes clang-tidy warning.
2020-09-15 14:49:04 +01:00
Sanjay Patel aa57c1c967 [InstCombine] fix bug in pow expansion
There is at least one other bug related to pow -> sqrt transforms:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html
...but we probably can't solve that without fixing this first.
2020-09-15 09:29:48 -04:00
Simon Pilgrim 796c805269 ProvenanceAnalysis.h - remove unnecessary AliasAnalysis.h include. NFCI.
Forward declare AAResults instead of the (old) AliasAnalysis type.
2020-09-15 13:34:35 +01:00
Bjorn Pettersson aa8be5aeea [Scalarizer] Avoid changing name of non-instructions
The "takeName" logic in ScalarizerVisitor::gather did not consider
that the value vector could refer to non-instructions, such as
global variables. This patch makes sure that we avoid changing the
name of a value if it isn't an instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D87685
2020-09-15 14:15:50 +02:00
Benjamin Kramer b768546fe0 Revert "[InstCombine] Simplify select operand based on equality condition"
This reverts commit cfff88c03c. Sends
instcombine into an infinite loop.

```
define i1 @foo(i32 %arg, i32 %arg1) {
bb:
  %tmp = udiv i32 %arg, %arg1
  %tmp2 = mul nsw i32 %tmp, %arg1
  %tmp3 = icmp eq i32 %tmp2, %arg
  %tmp4 = select i1 %tmp3, i32 %tmp, i32 undef
  %tmp5 = icmp sgt i32 %tmp4, 255
  ret i1 %tmp5
}
```
2020-09-15 12:22:47 +02:00
Simon Pilgrim 5f13d6c1ee [Transforms][Coroutines] Add missing header path to CMakeLists.txt
Helps Visual Studio check include dependencies.
2020-09-15 10:37:25 +01:00
David Sherwood 69cccb3189 [SVE] Fix isLoadInvariantInLoop for scalable vectors
I've amended the isLoadInvariantInLoop function to bail out for
scalable vectors for now since the invariant.start intrinsic is only
ever generated by the clang frontend for thread locals or struct
and class constructors, neither of which support sizeless types.
In addition, the intrinsic itself does not currently support the
concept of a scaled size, which makes it impossible to compare
the sizes of different scalable objects, e.g. <vscale x 32 x i8>
and <vscale x 16 x i8>.

Added new tests here:

  Transforms/LICM/AArch64/sve-load-hoist.ll
  Transforms/LICM/hoisting.ll

Differential Revision: https://reviews.llvm.org/D87227
2020-09-15 08:30:19 +01:00
Arthur Eubanks 10b12d4035 Reland [docs][NewPM] Add docs for writing NPM passes
As to not conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritingAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Relanded with missing "Support" dependency in LLVMBuild.txt.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 16:06:19 -07:00
Arthur Eubanks 39ec36415d Revert "[docs][NewPM] Add docs for writing NPM passes"
This reverts commit c2590de30d.

Breaks shared libs build
2020-09-14 15:55:17 -07:00
Arthur Eubanks f3d8344854 [PruneEH][NFC] Use CallGraphUpdater in PruneEH
In preparation for porting the pass to NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87632
2020-09-14 14:43:19 -07:00
Arthur Eubanks c2590de30d [docs][NewPM] Add docs for writing NPM passes
As to not conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritingAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 13:26:03 -07:00
Teresa Johnson 226d80ebe2 [MemProf] Rename HeapProfiler to MemProfiler for consistency
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.

Differential Revision: https://reviews.llvm.org/D87622
2020-09-14 13:14:57 -07:00
Nikita Popov cfff88c03c [InstCombine] Simplify select operand based on equality condition
For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.
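
A small hand-written example of the intended fold (not from the patch):

  ; before
  %cmp = icmp eq i32 %x, 0
  %add = add i32 %x, %y
  %sel = select i1 %cmp, i32 %add, i32 %z
  ; after: in the true arm %x is known to equal 0, so %add simplifies to %y
  %sel = select i1 %cmp, i32 %y, i32 %z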

Differential Revision: https://reviews.llvm.org/D87480
2020-09-14 20:07:06 +02:00
Simon Pilgrim 4ff4708d39 collectBitParts - use const references. NFCI.
Fixes clang-tidy warnings first noticed on D87452.
2020-09-14 18:23:00 +01:00
Florian Hahn f715d81c9d [DSE] Only eliminate candidates that always store the same loc.
AliasAnalysis/MemoryLocation does not account for loops. Two
MemoryLocations can be must-overwrite, even if the first one writes
multiple locations in a loop.

This patch prevents removing such stores, by only considering candidates
that are known to be loop invariant, or executed in the same BB.

Currently the invariant check is quite conservative and only considers
Alloca and Alloca-like instructions and arguments as invariant base pointers.
It also considers GEPs with all constant indices and invariant bases as
invariant.

This can be improved in the future, but the current implementation has
only minor impact on the total number of stores eliminated (25903 vs
26047 for the baseline). There are some 2-10% swings for some individual
benchmarks. In roughly half of the cases, the number of stores removed
actually increases, because we skip candidates that are unlikely to be
valid candidates early.
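
An illustrative hand-written sketch of a store that must not be eliminated (not the actual test):

  define void @f(i32* %p) {
  entry:
    br label %loop
  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %gep = getelementptr i32, i32* %p, i64 %i
    store i32 1, i32* %gep        ; writes a different location each iteration
    %i.next = add i64 %i, 1
    %c = icmp eq i64 %i.next, 16
    br i1 %c, label %exit, label %loop
  exit:
    store i32 0, i32* %p          ; overwrites only %p[0], not all the loop's stores
    ret void
  }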
2020-09-14 12:06:58 +01:00
David Sherwood 816663adb5 [SVE] In LoopIdiomRecognize::isLegalStore bail out for scalable vectors
The function LoopIdiomRecognize::isLegalStore looks for stores in loops
that could be transformed into memset or memcpy. However, the algorithm
currently requires that we know how big the store is at runtime, i.e.
that the store size will not overflow an unsigned integer. For scalable
vectors we cannot guarantee this so I have changed the code to bail out
for now. In addition, even if we add a way to query the maximum value of
vscale in future we will still need to update the algorithm to cope with
non-constant strides. The additional cost associated with calculating
the memset and memcpy arguments will need to be taken into account as
well.

This patch also fixes up an implicit TypeSize -> uint64_t cast,
thereby removing a warning. I've added tests here showing a fixed
width vector loop being transformed into memcpy, and a scalable
vector loop remaining unchanged:

  Transforms/LoopIdiom/memcpy-vectors.ll

Differential Revision: https://reviews.llvm.org/D87439
2020-09-14 11:28:31 +01:00
David Stenberg bfcb824ba5 [JumpThreading] Fix an incorrect Modified status
This fixes PR47297.

When ProcessBlock() was able to constant fold the terminator's
condition, but not do any more transformations, the function would
return false, which would lead to the JumpThreading pass returning an
incorrect modified status. This patch makes it so that ProcessBlock()
returns true in such cases. This will trigger an unnecessary extra
invocation of ProcessBlock(), but such cases should be rare.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87392
2020-09-14 10:36:13 +02:00
Jay Foad 9a4476072e [UnifyLoopExits] Fix non-deterministic iteration order
This was causing random minor codegen differences in shaders compiled
with the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D87548
2020-09-14 09:09:58 +01:00
David Blaikie 6e06f1cd08 GCOVProfiling: Avoid use-after-move
Turns out this was use-after-move of function_ref, which is trivially
copyable and movable, so the move did nothing and use after move was
safe.

But since this function_ref is being copied into a std::function, change
the function_ref to be std::function to avoid extra layers of type
erasure indirection - and then it's a real use after move, and fix that
by referring to the moved-to member variable rather than the moved-from
parameter.
2020-09-13 12:54:36 -07:00
Fangrui Song 5f4e9bf641 [gcov] Fix memory leak due to BranchProbabilityInfoWrapperPass
This is weird.
2020-09-13 00:44:32 -07:00
Fangrui Song 63182c2ac0 [gcov] Add spanning tree optimization
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).

The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
2020-09-13 00:07:31 -07:00
Fangrui Song f086e85eea [gcov] Assign names to some types and loaded values used in @__llvm_internal*
This makes the generated IR much more readable.
2020-09-12 22:42:37 -07:00
Fangrui Song d6fadc49e3 [gcov] Process .gcda immediately after the accompanying .gcno instead of doing all .gcda after all .gcno
i.e. change the workflow from

* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C

to

* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C

Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.

Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
2020-09-12 13:53:03 -07:00
Fangrui Song 7d3825ed95 Revert "[gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering"
This reverts commit 412c9c0bf2.
2020-09-12 12:34:43 -07:00
Fangrui Song 412c9c0bf2 [gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering 2020-09-12 12:21:32 -07:00
Fangrui Song c55c14837e [gcov] Clean up by getting llvm.dbg.cu earlier 2020-09-12 12:21:32 -07:00
Florian Hahn e082dee2b5 [DSE] Bail out on MemoryPhis when deleting stores at end of function.
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.

This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.

The issue was reported post-commit for fb109c42d9.
2020-09-12 19:05:59 +01:00
David Green 74760bb00f [LV][ARM] Add preferInloopReduction target hook.
This allows the backend to tell the vectorizer to produce inloop
reductions through a TTI hook.

For the moment on ARM under MVE this means allowing integer add
reductions of the correct size. In the future this can include integer
min/max too, under -Os.

Differential Revision: https://reviews.llvm.org/D75512
2020-09-12 17:47:04 +01:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
2020-09-12 15:36:06 +02:00
Nikita Popov 36e2e2e12e [InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322)
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.

The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:

 * Export the SimplifyWithOpReplaced API from InstSimplify, with an
   added AllowRefinement flag. For replacements inside the TrueVal
   we don't actually care whether refinement occurs or not, the
   replacement is always legal. This part of the transform is now
   done in InstSimplify only. (It should be noted that the current
   AllowRefinement check is not sufficient -- that's an issue we
   need to address separately.)
 * Change the InstCombine fold to work by temporarily dropping
   poison generating flags, running the fold and then restoring the
   flags if it didn't work out. This will ensure that the InstCombine
   fold is correct as long as the InstSimplify fold is correct.

Differential Revision: https://reviews.llvm.org/D87445
2020-09-12 14:45:06 +02:00
Sanjay Patel 40f12ef621 [SLP] further limit bailout for load combine candidate (PR47450)
The test example based on PR47450 shows that we can
match non-byte-sized shifts, but those won't ever be
bswap opportunities. This isn't a full fix (we'd still
match if the shifts were by 8-bits for example), but
this should be enough until there's evidence that we
need to do more (this is a borderline case for
vectorization in the first place).
2020-09-11 11:56:11 -04:00
Krzysztof Parzyszek f92908cc74 [DSE] Make sure that DSE+MSSA can handle masked stores
Differential Revision: https://reviews.llvm.org/D87414
2020-09-11 10:00:21 -05:00
Sanjay Patel 6aa3fc4a5b Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)"
This reverts commit 324a53205a.

On closer examination of at least one of the test diffs,
this does not appear to be correct in all cases. Even the
existing 'nsw' creation may be wrong based on this example:
https://alive2.llvm.org/ce/z/uL4Hw9
https://alive2.llvm.org/ce/z/fJMKQS
2020-09-11 10:54:48 -04:00
Sanjay Patel 324a53205a [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)
There's no signed wrap if both geps have 'inbounds':
https://alive2.llvm.org/ce/z/nZkQTg
https://alive2.llvm.org/ce/z/7qFauh
2020-09-11 10:39:09 -04:00
Simon Pilgrim 48b510c4bc [NFC] Fix compiler warnings due to integer comparison of different signedness
Fix by directly using INT_MAX and INT32_MAX.

Patch by: @nullptr.cpp (Yang Fan)

Differential Revision: https://reviews.llvm.org/D87347
2020-09-11 15:32:03 +01:00
David Sherwood 1e1770a07e [SVE][CodeGen] Fix InlineFunction for scalable vectors
When inlining functions containing allocas of scalable vectors we
cannot specify the size in the lifetime markers, since we don't
know this at compile time.

Added new test here:

  test/Transforms/Inline/AArch64/sve-alloca-merge.ll

Differential Revision: https://reviews.llvm.org/D87139
2020-09-11 08:34:51 +01:00
Michael Liao f787fe15d8 [EarlyCSE] Remove unnecessary operand swap.
- As min/max are commutative operators, there is no need to swap
  operands. That breaks the convention calculating the hash value.
2020-09-11 02:14:04 -04:00
Michael Liao 41e68f7ee7 [EarlyCSE] Fix and recommit the revised c9826829d7
In addition to calculating the hash consistently by swapping SELECT's
operands, we also need to invert the select pattern preference to match
the original logic.

[EarlyCSE] Equivalent SELECTs should hash equally

DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 23:30:56 -04:00
Michael Liao 39dc75f66c Revert "[EarlyCSE] Equivalent SELECTs should hash equally"
This reverts commit c9826829d7 as it
breaks regression tests.
2020-09-10 22:37:35 -04:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat its return value as
belonging to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For MultiSource/SPEC2000/SPEC2006, the number of eliminated stores
goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g             + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLTO-g (link only)     + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs    size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g             + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLTO-g (link only)     + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
Bryan Chan c9826829d7 [EarlyCSE] Equivalent SELECTs should hash equally
DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 16:59:24 -04:00
Christopher Tetreault 7ddfd9b3eb [SVE] Bail from VectorUtils heuristics for scalable vectors
Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of
elements for scalable vectors.

Assert that the mask type is not scalable in possiblyDemandedEltsInMask.

Assert that the types are correct in all three functions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87424
2020-09-10 12:29:37 -07:00
Craig Topper c195ae2f00 [SLPVectorizer][X86][AMDGPU] Remove fcmp+select to fmin/fmax reduction support.
Previously we could match fcmp+select to a reduction if the fcmp had
the nonans fast math flag. But if the select had the nonans fast
math flag, InstCombine would turn it into a fminnum/fmaxnum intrinsic
before SLP gets to it. It seems fairly likely that if one of the
fcmp+select pair has the fast math flag, they both would.

My plan is to start vectorizing the fmaxnum/fminnum version soon,
but I wanted to get this code out as it had some of the strangest
fast math flag behaviors.
2020-09-10 11:49:19 -07:00
Fangrui Song a0ffe2b21a [PGO] Skip if an IndirectBrInst critical edge cannot be split
PGOInstrumentation runs `SplitIndirectBrCriticalEdges`, but some IndirectBrInst
critical edges cannot be split. `getInstrBB` will crash when calling `SplitCriticalEdge`, e.g.

  int foo(char *p) {
    void *targets[2];
    targets[0] = &&indirect;
    targets[1] = &&end;
    for (;; p++)
      if (*p == 7) {
  indirect:
        goto *targets[p[1]]; // the self loop is critical in -O
      }
  end:
    return 0;
  }

Skip such critical edges to prevent a crash.

Reviewed By: davidxl, lebedev.ri

Differential Revision: https://reviews.llvm.org/D87435
2020-09-10 11:04:14 -07:00
Ettore Tiotto 6b13cfe739 [ArgumentPromotion]: Copy function metadata after promoting arguments
The argument promotion pass currently fails to copy function annotations
over to the modified function after promoting arguments.
This patch copies the original function annotation to the new function.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86630
2020-09-10 13:08:57 -04:00
Krzysztof Parzyszek 8a08740db6 [GVN] Account for masked loads/stores depending on load/store instructions
This is a case where an intrinsic depends on a non-call instruction.

Differential Revision: https://reviews.llvm.org/D87423
2020-09-10 10:57:33 -05:00
Nikita Popov 4e413e1621 [InstCombine] Temporarily do not drop volatile stores before unreachable
See discussion in D87149. Dropping volatile stores here is legal
per LLVM semantics, but causes issues for real code and may result
in a change to LLVM volatile semantics. Temporarily treat volatile
stores as "not guaranteed to transfer execution" in just this place,
until this issue has been resolved.
2020-09-10 16:16:44 +02:00
Florian Hahn a5ec99da6e [DSE] Support eliminating memcpy.inline.
MemoryLocation has been taught about memcpy.inline, which means we can
get the memory locations read and written by it. This means DSE can
handle memcpy.inline.
2020-09-10 13:19:25 +01:00
Juneyoung Lee 1b9884df8d Enable InsertFreeze flag of JumpThreading when used in LTO
This patch enables inserting freeze when JumpThreading converts a select to
a conditional branch when it is run in LTO.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85534
2020-09-10 19:05:49 +09:00
Sam Parker 0bdf8c9127 [SCEV] Constant expansion cost at minsize
As code size is the only thing we care about at minsize, query the
cost of materialising immediates when calculating the cost of a SCEV
expansion. We also modify the CostKind to TCK_CodeSize for minsize,
instead of RecipThroughput.

Differential Revision: https://reviews.llvm.org/D76434
2020-09-10 08:21:11 +01:00
Juneyoung Lee 39c1653b3d [JumpThreading] Conditionally freeze its condition when unfolding select
This patch fixes PR45956 (https://bugs.llvm.org/show_bug.cgi?id=45956).
To minimize its impact on the quality of generated code, I suggest enabling
this only for LTO as a start (it has two JumpThreading passes registered).
This patch contains a flag that lets JumpThreading enable it.
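
A rough hand-written sketch of the idea (assuming the select feeds a branch):

  ; before: unfolding would make the branch use %cond directly,
  ; and branching on poison is immediate UB
  %sel = select i1 %cond, i1 true, i1 %other
  br i1 %sel, label %then, label %else
  ; after (sketch): the unfolded branch uses a frozen condition
  %cond.fr = freeze i1 %cond
  br i1 %cond.fr, label %then, label %check.other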

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84940
2020-09-10 15:49:40 +09:00
Max Kazantsev c413a8a8ec [LoopLoadElim] Filter away candidates that stop being AddRecs after loop versioning. PR47457
The test in PR47457 demonstrates a situation where the candidate load's pointer's SCEV
is no longer a SCEVAddRec after loop versioning. The code there assumes that it is
always a SCEVAddRec and crashes otherwise.

This patch makes sure that we do not consider candidates for which this requirement
is broken after the versioning.

Differential Revision: https://reviews.llvm.org/D87355
Reviewed By: asbirlea
2020-09-10 13:30:31 +07:00
Florian Hahn 9969c317ff [DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber.
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.

Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.

Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87386
2020-09-09 23:01:58 +01:00
Fangrui Song ad61e346d3 [gcov] Give the __llvm_gcov_ctr load instruction a name for more readable output 2020-09-09 12:34:43 -07:00
Fangrui Song dbac20bb6b [gcov] Don't split entry block; add a synthetic entry block instead
The entry block is split at the first instruction where `shouldKeepInEntry`
returns false. The created basic block has a br jumping to the original entry
block. The new basic block causes the function label line and the other entry
block lines to be covered by different basic blocks, which can affect line
counts with special control flows (fork/exec in the entry block requires
heuristics in llvm-cov gcov to get consistent line counts).

  int main() { // BB0
    return 0;  // BB2 (due to entry block splitting)
  }
  // BB1 is the exit block (since gcov 4.8)

This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and
inserts an edge from the synthetic entry block to the original entry block. We
can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The
number of basic blocks does not change, but the emitted .gcno files will be
smaller because we can save one GCOV_TAG_LINES tag.

  // BB0 is the synthetic entry block with a single edge to BB2
  int main() { // BB2
    return 0;  // BB2
  }
  // BB1 is the exit block (since gcov 4.8)
2020-09-09 12:25:24 -07:00
Mark de Wever 08196e0b2e Implements [[likely]] and [[unlikely]] in IfStmt.
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.

Differential Revision: https://reviews.llvm.org/D85091
2020-09-09 20:48:37 +02:00
Krzysztof Parzyszek 81ff2d30a9 [DSE] Handle masked stores 2020-09-09 13:31:31 -05:00
David Stenberg 48fc781438 [UnifyFunctionExitNodes] Fix Modified status for unreachable blocks
If a function had at most one return block, the pass would return false
regardless if an unified unreachable block was created.

This patch fixes that by refactoring runOnFunction into two separate
helper functions for handling the unreachable blocks respectively the
return blocks, as suggested by @bjope in a review comment.

This was caught using the check introduced by D80916.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D85818
2020-09-09 13:36:03 +02:00
Juneyoung Lee 36c8621638 [BuildLibCalls] Add more noundef to library functions
This patch follows D85345 and adds more noundef attributes to return values/arguments of library functions
that are mostly about accessing the file system or processes.

A few functions like `chmod` or `times` use typedef `mode_t` and `clock_t`.
They are neither struct nor union, so they cannot contain undef even if they're lowered to iN in IR. So, it is fine to add noundef to them.

- clock_t's actual type is size_t (C17, 7.27.1.3), so it isn't struct or union.

- For mode_t, either int or long is used in practice because programmers use bit manipulation. So, I think it is okay that it's never aggregate in practice.

After this patch, the remaining library functions are those that eagerly participate in optimizations: they can be removed, reordered, or
introduced by a transformation from primitive IR operations.
For them, some testing is needed, since it may not be valid to add noundef anymore, even if the C standard says it's okay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85894
2020-09-09 20:33:35 +09:00
Juneyoung Lee 25ce1e0497 [ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.

isGuaranteedNotToBePoison will be used in D75808. The latter function is used by isGuaranteedNotToBePoison.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84242
2020-09-09 20:00:26 +09:00
Florian Hahn 2bcc4db761 [EarlyCSE] Explicitly require AAResultsWrapperPass.
The MemorySSAWrapperPass depends on AAResultsWrapperPass and if
MemorySSA is preserved but AAResultsWrapperPass is not, this could lead
to a crash when updating the last user of the MemorySSAWrapperPass.

Alternatively AAResultsWrapperPass could be marked preserved by GVN, but
I am not sure if that would be safe. I am not sure what is required in
order to preserve AAResultsWrapperPass. At the moment, it seems like a
couple of passes that do similar transforms to GVN are preserving it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87137
2020-09-09 09:14:50 +01:00
Johannes Doerfert d445b6dfec [Attributor] Cleanup `::initialize` of various AAs
This commit cleans up the ::initialize method of various AAs in the
following ways:
  - If an associated function is required, give up on declarations.
    This was discovered as a real problem when lots of llvm.dbg.XXX
    call sites were assumed `noreturn` until proven otherwise. That
    does not make any sense and caused huge regressions and missed
    deductions.
  - Require more associated declarations for function interface AAs.
  - Use the IRAttribute::initialize to determine if function interface
    AAs can be used in IPO, don't replicate the checks (especially
    isFunctionIPOAmendable) all over the place. Arguably the function
    declaration check should be moved to some central place too.
2020-09-09 01:38:25 -05:00
Johannes Doerfert 849146ba93 [Attributor] Associate the callback callee with a call site argument (if any)
If we have a callback, call site arguments were already associated with
the callback callee. Now we also associate the function with the
callback callee, thus we know ensure that the following holds true (if
all return nonnull):
   `getAssociatedArgument()->getParent() == getAssociatedFunction()`

To test this, an early exit from
  `AAMemoryBehaviorCallSiteArgument::initialize`
is included as well. Without the change to getAssociatedFunction() this
kind of early exit for declarations would cause callback call site
arguments to miss out.
2020-09-09 00:52:17 -05:00
Johannes Doerfert cefd2a2c70 [Attributor] Cleanup `IRPosition::getArgNo` usages
As we handle callback calls we need to disambiguate the call site
argument number from the callee argument number. While always equal in
non-callback calls, a callback comes with a partial parameter-argument
mapping so there is no implicit correspondence. Here we split
`IRPosition::getArgNo()` into two public functions, `getCallSiteArgNo()`
and `getCalleeArgNo()`. Usages are adjusted to pick the right one for
their purpose. This fixed some problems that would have been exposed as
we more aggressively optimize callbacks.
2020-09-09 00:52:17 -05:00
Johannes Doerfert c0ab901bdd [Attributor] Selectively look at the callee even when there are operand bundles
While operand bundles carry unpredictable semantics, we know some of
them and can therefore "ignore" them. In this case we allow to look at
the declaration of `llvm.assume` when asked for the attributes at a call
site. The assume operand bundles we have do not invalidate the
declaration attributes.

We cannot test this in isolation because the llvm.assume attributes are
determined by the parser. However, a follow up patch will provide test
coverage.
2020-09-09 00:52:17 -05:00
Johannes Doerfert d5d75f61e5 [Attributor] Provide a command line option that limits recursion depth
In `MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.cpp` we initialized
attributes until the recursion reached stack frame ~35k and ran out of
stack space. The initial size of 1024 is pretty much random.
2020-09-09 00:47:02 -05:00
Max Kazantsev 795e4ee9d2 [NFC] Move function from IndVarSimplify to SCEV
This function can be reused in other places.

Differential Revision: https://reviews.llvm.org/D87274
Reviewed By: fhahn, lebedev.ri
2020-09-09 11:20:59 +07:00
David Stenberg 17dce2fe43 [UnifyFunctionExitNodes] Remove unused getters, NFC
The get{Return,Unwind,Unreachable}Block functions in
UnifyFunctionExitNodes have not been used for many years,
so just remove them.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D87078
2020-09-08 20:42:28 +02:00
Nikita Popov f6b87da0c7 [InstCombine] Fold comparison of abs with int min
If the abs is poisoning, this is already folded to true/false.
For non-poisoning abs, we can convert this to a comparison with
the operand.
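
A hand-written sketch of the non-poisoning case:

  %a = call i8 @llvm.abs.i8(i8 %x, i1 false)
  %c = icmp eq i8 %a, -128
  ; INT_MIN is the only input whose non-poisoning abs is INT_MIN, so:
  %c = icmp eq i8 %x, -128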
2020-09-08 20:23:03 +02:00
Nikita Popov e97f3b1b43 [InstCombine] Fold abs of known negative operand
If we know that the abs operand is known negative, we can replace
it with a neg.

To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.
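
A hand-written sketch (the `or` is just one way to make the sign bit known):

  %x = or i8 %v, -128              ; sign bit known set, so %x is negative
  %a = call i8 @llvm.abs.i8(i8 %x, i1 false)
  ; folds to a negation:
  %a = sub i8 0, %x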

Differential Revision: https://reviews.llvm.org/D87196
2020-09-08 20:14:35 +02:00
Xun Li 59a467ee4f [Coroutine] Make dealing with alloca spills more robust
D66230 attempted to fix a problem where there are allocas used before CoroBegin.
It keeps allocas and their uses in place if there are no escapes/changes to the data before CoroBegin.
Unfortunately that's incorrect.
Consider this code:

%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin
store ... %1
After that fix, %1 will now stay put; however, if a store happens after coro.begin and hence modifies the content, the change will not be reflected in the coroutine frame (and will eventually be DCEed).
To generalize the problem: if any alias ptr is created before coro.begin for an Alloca and that alias ptr is later written into after coro.begin, it will lead to incorrect behavior.

There are also a few other minor issues, such as an incorrect dominance check in the ptr visitor, unhandled memory intrinsics, etc.
This patch attempts to fix some of these issues, and makes the handling of aliases more robust.

While visiting through the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then recreate these aliases after CoroBegin using these offsets.
It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design.
This patch makes it more robust and should be a pure win.
In the meantime, we need to think about how to completely eliminate these issues, likely through the route @rjmccall mentioned in D66230.

Differential Revision: https://reviews.llvm.org/D86859
2020-09-08 10:59:13 -07:00
Florian Hahn c7b7c32f4a [DSE,MemorySSA] Increase walker limit a bit.
This slightly bumps the walker limit so that it covers more cases while
not increasing compile-time too much:
http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions
2020-09-08 14:55:46 +01:00
Andrew Wei 78071fb524 [LSR] Canonicalize a formula before insert it into the list
In GenerateConstantOffsetsImpl, we may generate a non-canonical Formula
if the BaseRegs of that Formula are updated and include a recurrent
expression register related to the current loop while its ScaledReg is not.

Patched by: mdchen
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86939
2020-09-08 13:14:53 +08:00
Johannes Doerfert 711bf7dcf9 [Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions
CloneFunctionInto has implicit requirements with regard to the
linkage and visibility of the function. We now update these on the copy
after CloneFunctionInto, giving it the same linkage and visibility
as the original.
2020-09-07 23:38:09 -05:00
Johannes Doerfert e6208849c8 [Attributor][NFC] Change variable spelling 2020-09-07 23:38:09 -05:00
Johannes Doerfert 8637acac5a [Attributor][NFC] Clang tidy: no else after continue 2020-09-07 23:38:08 -05:00
Johannes Doerfert ff70c25d76 [Attributor][NFC] Expand `auto` types (clang-fix-it) 2020-09-07 23:38:08 -05:00
Johannes Doerfert 79651265b2 [Attributor][FIX] Properly return changed if the IR was modified
Deleting or replacing anything is certainly a modification. This caused
a later assertion in IPSCCP when compiling 400.perlbench with the new PM.
I'm not sure how to test this.
2020-09-07 23:38:08 -05:00
Florian Hahn efb8e156da [DSE,MemorySSA] Add an early check for read clobbers to traversal.
Depending on the benchmark, this early exit can save a substantial
amount of compile-time:

http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions
2020-09-07 23:22:10 +01:00
Roman Lebedev bb7d3af113 Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach the vectorizer in a rather mangled form, with weird PHIs,
and some of the loops aren't even in rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default and is always run, unlike its friend, the common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once, very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in the pipeline. The definition of //late// may vary;
here I've currently picked the same point as for code sinking,
but I suppose we could enable it as early as right after
loop rotation happens.

Experimentation shows that this does indeed, unsurprisingly, help:
more loops get rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run-time performance and code size. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, so some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprisingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal when optimizing for code size,
so one might want to tune the pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Nikita Popov 9fb46a452d [SCCP] Compute ranges for supported intrinsics
For intrinsics supported by ConstantRange, compute the result range
based on the argument ranges. We do this independently of whether
some or all of the input ranges are full, as we can often still
constrain the result in some way.
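
An illustrative hand-written sketch, assuming saturating adds are among the supported intrinsics:

  ; suppose %y was inferred to lie in [1, 10] while %x is unknown (full range)
  %r = call i8 @llvm.uadd.sat.i8(i8 %x, i8 %y)
  ; the result range is still [1, 255], so this compare folds to false:
  %c = icmp eq i8 %r, 0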

Differential Revision: https://reviews.llvm.org/D87183
2020-09-07 22:16:06 +02:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.
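
A hand-written sketch of the 2-GEP case:

  %gep1 = getelementptr inbounds i32, i32* %p, i64 %i
  %gep2 = getelementptr inbounds i32, i32* %p, i64 %j
  %pi1 = ptrtoint i32* %gep1 to i64
  %pi2 = ptrtoint i32* %gep2 to i64
  %d = sub i64 %pi1, %pi2
  ; reduces to the index difference scaled by the element size,
  ; with no operand swap or negation needed: (%i - %j) * 4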

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Simon Pilgrim 5ea9e655ef VPlan.h - remove unnecessary forward declarations. NFCI.
Already defined in includes.
2020-09-07 18:35:06 +01:00
Sanjay Patel 7a6d6f0f70 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Sanjay Patel b22910daab [InstCombine] erase instructions leading up to unreachable
Normal dead code elimination ignores assume intrinsics, so we fail to
delete assumes that are not meaningful (and potentially worse if they
cause conflicts with other assumptions).

The motivating example in https://llvm.org/PR47416 suggests that we
might have problems upstream from here (difference between C and C++),
but this should be a cheap way to make sure we remove more dead code.

Differential Revision: https://reviews.llvm.org/D87149
2020-09-07 10:44:08 -04:00
Sanjay Patel 3ca8b9a560 [InstCombine] give a name to an intermediate value for easier tracking; NFC
As noted in PR47430, we probably want to conditionally include 'nsw'
here anyway, so we are going to need to fill out the optional args.
2020-09-07 08:19:42 -04:00
Sam Parker 928c4b4b49 [SCEV] Refactor isHighCostExpansionHelper
To enable costing of constants, the helper function has been
reorganised:
- A struct has been introduced to hold SCEV operand information so
  that we know the user of the operand, as well as the operand index.
  The Worklist now uses this struct instead of a bare SCEV.
- The costing of each SCEV, and collection of its operands, is now
  performed in a helper function.

Differential Revision: https://reviews.llvm.org/D86050
2020-09-07 11:57:46 +01:00
Sam Parker 65f78e73ad [SimplifyCFG] Consider cost of combining predicates.
Modify FoldBranchToCommonDest to consider the cost of inserting
instructions when attempting to combine predicates to fold blocks.
The threshold can be controlled via a new option:
-simplifycfg-branch-fold-threshold which defaults to '2' to allow
the insertion of a not and another logical operator.

Differential Revision: https://reviews.llvm.org/D86526
2020-09-07 10:04:50 +01:00
Florian Hahn 16bb71fd4f [DSE,MemorySSA] Add a few additional debug messages. 2020-09-06 20:31:00 +01:00
Nikita Popov 4892d3a198 [InstCombine] Fold abs with dominating condition
Similar to D87168, but for abs. If we have a dominating x >= 0
condition, then we know that abs(x) is x. This fold is in
InstCombine, because we need to create a sub instruction for
the x < 0 case.
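
A hand-written sketch of the dominating-condition case:

  define i32 @f(i32 %x) {
  entry:
    %nonneg = icmp sge i32 %x, 0
    br i1 %nonneg, label %pos, label %neg
  pos:
    ; %x >= 0 dominates here, so abs(%x) folds to %x
    %a = call i32 @llvm.abs.i32(i32 %x, i1 false)
    ret i32 %a
  neg:
    ret i32 0
  }

  declare i32 @llvm.abs.i32(i32, i1)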

Differential Revision: https://reviews.llvm.org/D87184
2020-09-05 16:18:35 +02:00
Nikita Popov ada8a17d94 [InstCombine] Fold abs intrinsic eq zero
Following the same transform for the select version of abs.
2020-09-05 15:11:38 +02:00
Nikita Popov 58b28fa7a2 [InstCombine] Fold mul of abs intrinsic
Same as the existing SPF_ABS fold. We don't need to explicitly
handle NABS, as the negs will get folded away first.
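
A hand-written sketch (presumably the abs(x) * abs(x) case, as in the SPF fold):

  %a = call i32 @llvm.abs.i32(i32 %x, i1 false)
  %m = mul i32 %a, %a
  ; |x| * |x| == x * x, so this becomes:
  %m = mul i32 %x, %x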
2020-09-05 12:37:45 +02:00
Nikita Popov 10cb23c6ca [InstCombine] Fold cttz of abs intrinsic
Same as the existing fold for SPF_ABS. We don't need to explicitly
handle the NABS variant, as we'll first fold away the neg in that
case.
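
A hand-written sketch:

  %a = call i32 @llvm.abs.i32(i32 %x, i1 false)
  %t = call i32 @llvm.cttz.i32(i32 %a, i1 false)
  ; negation preserves the number of trailing zeros, so this becomes:
  %t = call i32 @llvm.cttz.i32(i32 %x, i1 false)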
2020-09-05 12:25:41 +02:00
serge-sans-paille 3a6f3fc160 Fix return status of SimplifyCFG
When a switch case is folded into the default case, that's an IR change that
should be reported; update ConstantFoldTerminator accordingly.

Differential Revision: https://reviews.llvm.org/D87142
2020-09-05 07:54:15 +02:00
Florian Hahn 00eb6fef08 [DSE,MemorySSA] Check for throwing instrs between killing/killed def.
We also have to check all uses between the killing & killed def and
check if any of them is throwing.
2020-09-04 18:54:59 +01:00
Wei Wang 4eef14f978 [OpenMPOpt] Assume indirect call always changes ICV
When checking call sites, give special handling to indirect calls, as the
callee may be unknown and this can lead to a nullptr dereference later.
Conservatively assume that the ICV always changes in such cases.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D87104
2020-09-04 09:05:32 -07:00
Teresa Johnson 45c3560384 [HeapProf] Address post-review comments in instrumentation code
Addresses post-review comments from D85948, which can be found here:
https://reviews.llvm.org/rG7ed8124d46f9.
2020-09-04 08:59:00 -07:00
Florian Hahn 6bc5e866bd [MemCpyOpt] Account for case that MemInsertPoint == BI.
In that case, the new MemoryDef needs to be inserted *before*
MemInsertPoint.
2020-09-04 14:04:08 +01:00
Florian Hahn e2fc6a31d3 [MemCpyOpt] Preserve MemorySSA.
This patch updates MemCpyOpt to preserve MemorySSA. It uses the
MemoryDef at the insertion point of the builder and inserts the new def
after that def.

In some cases, we just modify a memory instruction. In that case, get
the defining access, then remove the memory access and add a new one.
If the defining access is in a different block, insert a new def at the
beginning of the current block, otherwise after the defining access.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86651
2020-09-04 09:05:33 +01:00
Sanjay Patel 2391a34f9f [InstCombine] canonicalize all commutative intrinsics with constant arg 2020-09-03 12:42:04 -04:00
Sanjay Patel bdd5bfd0e4 [IR][GVN] add/allow commutative intrinsics with >2 args
Follow-up to D86798 and rGe25449f.
2020-09-03 10:14:53 -04:00
Florian Hahn 6de51189b0 [PassManager] Move load/store motion pass after DSE in LTO pipeline.
As far as I am aware, the placement of MergedLoadStoreMotion in the
pipeline is not heavily tuned currently. It seems to not matter much if
we do it after DSE in the LTO pipeline (no binary changes for -O3 -flto
on MultiSource/SPEC2000/SPEC2006). Moving it after DSE however has a
major benefit: MemorySSA is constructed by LICM and is consumed by DSE,
so if MergedLoadStoreMotion happens after DSE, we do not need to
preserve MemorySSA in it.

If there are any concerns with this move, I can also update
MergedLoadStoreMotion to preserve MemorySSA.

This patch together with D86651 (preserve MemSSA in MemCpyOpt) and
D86534 (preserve MemSSA in GVN) are the remaining patches to bring down
compile-time for DSE + MemorySSA to the levels outlined in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Once they land, we should be able to flip the switch on
enabling DSE + MemorySSA.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86967
2020-09-03 13:47:50 +01:00
Florian Hahn a344b382a0 [GVN] Preserve MemorySSA if it is available.
Preserve MemorySSA if it is available before running GVN.

DSE with MemorySSA will run closely after GVN. If GVN and 2 other
passes preserve MemorySSA, DSE can re-use MemorySSA used by LICM
when doing LTO.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86534
2020-09-03 12:28:13 +01:00
David Green 245f846c4e [MemCpyOptimizer] Change required analysis order for BasicAA/PhiValuesAnalysis
This is a followup to 1ccfb52a61, which made a number of changes
including the apparently innocuous reordering of required passes in
MemCpyOptimizer. This however altered the creation order of BasicAA vs
Phi Values analysis, meaning BasicAA did not pick up PhiValues as a
cached result. Instead if we require MemoryDependence first it will
require PhiValuesAnalysis allowing BasicAA to use it for better results.

I don't claim this is an excellent design, but it fixes a nasty little
regression where a query later in JumpThreading was getting worse
results.

Differential Revision: https://reviews.llvm.org/D87027
2020-09-03 12:01:51 +01:00
Florian Hahn 4c5e4aa89b Revert "[SCCP] Do not replace deref'able ptr with un-deref'able one."
This reverts commit 3542feeb20.

This seems to be causing issues with a sanitizer build
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/21677
2020-09-03 10:28:42 +01:00
Florian Hahn 3542feeb20 [SCCP] Do not replace deref'able ptr with un-deref'able one.
Currently IPSCCP (and others like CVP/GVN) blindly propagate pointer
equalities. In certain cases, that leads to dereferenceable pointers
being replaced, as in the example test case.

I think this is not allowed, as it introduces an access of an
un-dereferenceable pointer. Note that the pointer is inbounds, but one
past the last element, so it is valid, but not dereferenceable.

This patch is mostly to highlight the issue and start a discussion.
Currently it only specifically checks for
one-past-the-last-element pointers with array-typed bases.

This causes the mis-compile outlined in
https://stackoverflow.com/questions/55754313/is-this-gcc-clang-past-one-pointer-comparison-behavior-conforming-or-non-standar

In the test case, if we replace %p with the GEP for the store, we
subsequently determine that the store and the load cannot alias, because
they are to different underlying objects.

Note that Alive2 seems to think that the replacement is valid:
https://alive2.llvm.org/ce/z/2rorhk

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85332
2020-09-03 10:22:21 +01:00
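A hypothetical, source-level C++ analogue of the replacement hazard (the actual test case is in IR; this only illustrates why the equality must not be propagated):

```
#include <cstdio>

int main() {
  int A[1] = {1};
  int B = 2;
  int *P = A + 1; // one past the end of A: valid, but not dereferenceable
  if (P == &B) {  // may hold if B happens to be allocated right after A
    // If an optimizer propagated the equality and replaced &B with P below,
    // it could later conclude that the store and a load of B access
    // different underlying objects and cannot alias -- the miscompile above.
    B = 3;
  }
  std::printf("%d\n", B);
}
```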
Eli Friedman 96ef6998df [InstCombine] Fix a couple crashes with extractelement on a scalable vector.
Differential Revision: https://reviews.llvm.org/D86989
2020-09-02 18:02:07 -07:00
Huihui Zhang b4f04d7135 [VectorCombine][SVE] Do not fold bitcast shuffle for scalable type.
First, the shuffle cost for a scalable type is not known;
Second, we cannot reason about whether the narrowed shuffle mask for a
scalable type is a splat or not.

E.g., bitcasting a splat vector from type <vscale x 4 x i32> to <vscale x 8 x i16>
involves narrowing the shuffle mask <vscale x 4 x i32> zeroinitializer to
<vscale x 8 x i32> with the element sequence <0, 1, 0, 1, ...>, for which we
cannot reason whether it is a valid splat or not.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86995
2020-09-02 15:02:16 -07:00
Arthur Eubanks 352cf57cfb [Bindings] Move LLVMAddInstructionSimplifyPass to Scalar.cpp
Should not be with the pass, but alongside all the other C bindings.

Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D87041
2020-09-02 10:35:39 -07:00
Congzhe Cao ec489ae048 [IPSCCP] Fix a bug that the "returned" attribute is not cleared when function is optimized to return undef
In IPSCCP when a function is optimized to return undef, it should clear the returned attribute for all its input arguments
and its corresponding call sites.

The bug is exposed when the value of an input argument of the function is assigned to a physical register and
because of the argument having a returned attribute, the value of this physical register will continue to be used
as the function return value right after the call instruction returns, even if the value that this register holds may
be clobbered during the function call. This potentially results in incorrect values being used afterwards.

Reviewed By: jdoerfert, fhahn

Differential Revision: https://reviews.llvm.org/D84220
2020-09-02 11:21:48 -04:00
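A hedged sketch of the required cleanup using LLVM's C++ attribute API (the helper is illustrative, not the patch's actual IPSCCP code):

```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Once F is known to return undef, a `returned` argument attribute no longer
// holds: drop it from every argument of F and from every direct call site.
static void clearReturnedAttrs(Function &F) {
  for (unsigned I = 0, E = F.arg_size(); I != E; ++I)
    F.removeParamAttr(I, Attribute::Returned);
  for (User *U : F.users())
    if (auto *CB = dyn_cast<CallBase>(U))
      if (CB->getCalledFunction() == &F)
        for (unsigned I = 0, E = CB->arg_size(); I != E; ++I)
          CB->removeParamAttr(I, Attribute::Returned);
}
```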
David Stenberg 6d36b22b21 [GlobalOpt] Fix an incorrect Modified status
When marking a global variable constant, and simplifying users using
CleanupConstantGlobalUsers(), the pass could incorrectly return false if
there were still some uses left and no further optimization was done.

This was caught using the check introduced by D80916.

This fixes PR46749.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D85837
2020-09-02 15:00:45 +02:00
Venkataramanan Kumar 626c3738cd [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
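A standalone C++ check of the arithmetic identity the fold relies on (a sketch assuming reassociation is acceptable, per the fast-math-flags requirement above):

```
#include <cmath>
#include <cstdio>

int main() {
  for (double X : {1.0, 2.0, 4.0, 9.0, 100.0}) {
    double A = (1.0 / std::sqrt(X)) * X; // original form
    double B = X / std::sqrt(X);         // canonical form after the fold
    double C = std::sqrt(X);             // what the backend reduces B to
    std::printf("X=%g  %g %g %g\n", X, A, B, C);
  }
}
```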
Sanjay Patel 8fb055932c [VectorCombine] allow vector loads with mismatched insert type
This is an enhancement to D81766 to allow loading the minimum target
vector type into an IR vector with a different number of elements.

In one of the motivating tests from PR16739, SLP creates <2 x float>
load ops mixed with <4 x float> insert ops, so we want to handle that
pattern in addition to potential oversized vectors created by the
vectorizers.

For now, we are assuming the insert/extract subvector with undef is
free because there is no exact corresponding TTI modeling for that.

Differential Revision: https://reviews.llvm.org/D86160
2020-09-02 08:11:36 -04:00
Shinji Okumura 5d13479574 [Attributor] Make use of AANoUndef in AAUndefinedBehavior
This patch makes it possible for AAUB to use information from AANoUndef.
This is the next patch of D86983

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86984
2020-09-02 16:08:03 +09:00
Shinji Okumura 7558e9e5a2 [Attributor] Fix AANoUndef initialization
So far, when the associated value was undef, we immediately indicated a pessimistic fixpoint.
This patch changes the initialization to first check the attribute given in the IR and to indicate an optimistic fixpoint when it is given.
This change will enable us to catch, for example, the following case in AAUB.
```
call void @foo(i32 noundef undef)
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86983
2020-09-02 15:40:43 +09:00
Alina Sbirlea 1ccfb52a61 [MemCpyOptimizer] Preserve analyses and replace use of lambdas to get them.
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.

Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74494
2020-09-01 17:35:40 -07:00
Aaron Liu d7e16ca28f [LV] Interleave to expose ILP for small loops with scalar reductions.
Interleave small loops that have reductions inside, which breaks the
loop-carried dependencies and exposes ILP.

This gives very significant performance improvements for some benchmarks,
because small loops can sit in very hot functions in real applications.

Differential Revision: https://reviews.llvm.org/D81416
2020-09-01 19:47:32 +00:00
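A source-level illustration of what interleaving by two does to such a reduction (a hand-written sketch; for floating point this relies on reassociation being allowed):

```
#include <cstdio>

// The loop-carried dependence on Sum serializes the adds.
float sumSerial(const float *A, int N) {
  float Sum = 0.f;
  for (int I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// Two independent partial sums break that dependence chain and expose ILP.
float sumInterleaved(const float *A, int N) {
  float S0 = 0.f, S1 = 0.f;
  int I = 0;
  for (; I + 1 < N; I += 2) { // interleaved by a factor of two
    S0 += A[I];
    S1 += A[I + 1];
  }
  if (I < N) // scalar remainder
    S0 += A[I];
  return S0 + S1;
}

int main() {
  float A[7] = {1, 2, 3, 4, 5, 6, 7};
  std::printf("%g %g\n", sumSerial(A, 7), sumInterleaved(A, 7)); // 28 28
}
```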
Arthur Eubanks 96f0b57568 [Bindings] Add LLVMAddInstructionSimplifyPass
Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D86764
2020-09-01 12:38:49 -07:00
Anh Tuyen Tran 68717acb24 [LoopIdiomRecognizePass] Options to disable part or the entire Loop Idiom Recognize Pass
Loop Idiom Recognize Pass (LIRP) attempts to transform loops with subscripted arrays
into memcpy/memset function calls. In some particular situations, this transformation
has a negative impact. For example: https://bugs.llvm.org/show_bug.cgi?id=47300

This patch enables users to disable a particular part of the transformation,
while still enjoying the benefit brought about by the rest of LIRP. The default
behavior stays unchanged: no part of LIRP is disabled by default.

Reviewed By: etiotto (Ettore Tiotto)

Differential Revision: https://reviews.llvm.org/D86262
2020-09-01 13:59:24 +00:00
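For reference, the kind of subscripted-array loop LIRP rewrites (a minimal sketch; the new disable options themselves are not spelled out here):

```
#include <cstdio>

// Loop-idiom recognition turns this store loop into a single
// memset(P, 0, N) call.
static void zeroLoop(char *P, unsigned N) {
  for (unsigned I = 0; I < N; ++I)
    P[I] = 0;
}

int main() {
  char Buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  zeroLoop(Buf, sizeof(Buf));
  std::printf("%d %d\n", Buf[0], Buf[7]); // 0 0
}
```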
Hamilton Tobon Mosquera 1d3d9b9cd8 [OpenMPOpt][NFC] Moving constants as struct static attributes 2020-08-31 19:05:00 -05:00
Hamilton Tobon Mosquera 8931add617 [OpenMPOpt][HideMemTransfersLatency] Get values stored in offload arrays
getValuesInOffloadArrays goes through the offload arrays in __tgt_target_data_begin_mapper getting the values stored in them before the call is issued.

call void @__tgt_target_data_begin_mapper(arg0, arg1,
    i8** %offload_baseptrs, i8** %offload_ptrs, i64* %offload_sizes,
...)

Differential Revision: https://reviews.llvm.org/D86300
2020-08-31 15:33:05 -05:00
Sanjay Patel e25449ff57 [IR][GVN] allow intrinsics in Instruction's isCommutative query (2nd try)
The 1st try was reverted because I missed an assert that
needed softening.

As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing asserts in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-31 16:01:19 -04:00
Christopher Tetreault 640f20b0c7 [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Roman Lebedev c23aefd7c3
[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement
As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647
this must be sufficient otherwise `EliminateDuplicatePHINodes()`
would have hit issues with it already.
2020-08-31 22:29:39 +03:00
Fangrui Song f2284e3405 [Sink] Optimize/simplify sink candidate finding with nearest common dominator
For an instruction in the basic block BB, SinkingPass enumerates basic blocks
dominated by BB and BB's successors. For each enumerated basic block,
SinkingPass uses `AllUsesDominatedByBlock` to check whether the basic
block dominates all of the instruction's users. This is inefficient.

Use the nearest common dominator of all users to avoid enumerating the
candidates. The nearest common dominator may be in a parent loop, which is
not beneficial. In that case, find the ancestors in the dominator tree.

In the case that the instruction has no user, with this change we will
not perform an unnecessary move. This causes some amdgpu test changes.

A stage-2 x86-64 clang is byte-identical with this change.
2020-08-30 22:51:00 -07:00
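A hedged C++ sketch of the core idea (an illustrative helper; the real pass additionally treats a PHI use as occurring in the corresponding incoming block and walks out of deeper loops):

```
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Fold the per-candidate dominance checks into one query: the nearest
// common dominator of all blocks containing users of I.
static BasicBlock *commonDominatorOfUsers(Instruction &I, DominatorTree &DT) {
  BasicBlock *NCD = nullptr;
  for (User *U : I.users()) {
    BasicBlock *UseBB = cast<Instruction>(U)->getParent();
    NCD = NCD ? DT.findNearestCommonDominator(NCD, UseBB) : UseBB;
  }
  return NCD; // null when I has no users
}
```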
Sanjay Patel badd7264e1 Revert "[IR][GVN] allow intrinsics in Instruction's isCommutative query"
This reverts commit 25597f7783.
It is causing crashing on bots such as:
http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10523/steps/ninja-build/logs/stdio
2020-08-30 17:02:51 -04:00
Florian Hahn 86d817d7cf [DSE,MemorySSA] Skip defs without analyzable write locations.
Similar to other checks above, if there is no write location for a def,
it cannot be considered for elimination and can be skipped.
2020-08-30 21:56:25 +01:00
Sanjay Patel 25597f7783 [IR][GVN] allow intrinsics in Instruction's isCommutative query
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing an assert in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-30 16:49:22 -04:00
Florian Hahn 42c57c294d [DSE,MemorySSA] Simplify code, EarlierAccess is a MemoryDef (NFC).
After recent changes, we return early if Current is a MemoryPhi, so
EarlierAccess can only be a MemoryDef.
2020-08-30 21:31:57 +01:00
Florian Hahn eb35ebb3a2 [LV] Update CFG before adding runtime checks.
addRuntimeChecks uses SCEVExpander, which relies on the DT/LoopInfo to
be up-to-date. Changing the CFG afterwards may invalidate some inserted
instructions, especially LCSSA phis.

Reorder the code to first update the CFG and then create the runtime
checks. This should not have any impact on the generated code, as we
adjust the CFG and generate runtime checks together.

Fixes PR47343.
2020-08-30 18:21:44 +01:00
Sanjay Patel af4581e8ab [SLP] make commutative check apply only to binops; NFC
As discussed in D86798, it's not clear if the caller code
works with a more liberal definition of "commutative" that
includes intrinsics like min/max. This makes the binop
restriction (current functionality is unchanged) explicit
until the code is audited/tested.
2020-08-30 10:55:44 -04:00
David Green 543c5425f1 [LV] Add some const to RecurrenceDescriptor. NFC 2020-08-30 12:27:51 +01:00
sstefan1 5dfd7cc46c Reland [OpenMPOpt] ICV tracking for calls
The problem with module slice has been addressed in D86319

Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-30 11:27:48 +02:00
sstefan1 8d8ce85b23 [Attributor] Introduce module slice.
Summary:
The module slice describes which functions we can analyze and transform
while working on an SCC as part of the Attributor-CGSCC pass. So far we
simply restricted it to the SCC.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D86319
2020-08-30 10:30:44 +02:00
Shinji Okumura a7ca9e09bd [Attributor] Fix callsite check in AAUndefinedBehavior
This is the next patch of D86842
When we check `noundef` attribute violation at callsites, we do not have to require `nonnull` in the following two cases.
1. An argument is known to be simplified to undef
2. An argument is known to be dead

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86845
2020-08-30 13:17:02 +09:00
Shinji Okumura 7082381735 [Attributor][NFC] Fix dependency type in AAUndefinedBehaviorImpl::updateImpl
This patch fixes wrong dependency type in AAUB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86842
2020-08-30 12:34:50 +09:00
Shinji Okumura 7a15dfd056 [Attributor] Fix AANoUndef identification
Even though the `noundef` IR attribute might be attached to any non-void typed value, AANoUndef was mistakenly identified for pointer type values only.
This patch fixes that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86737
2020-08-30 05:39:25 +09:00
Florian Hahn 5067f4b626 [LV] Check opt-for-size before expanding runtime checks.
Move the bail-out when optimizing for size to before runtime check generation.
In that case, we do not use the result of the expansion; the expanded
instructions will be dead and cleaned up later.

By doing the check before expanding the runtime-checks, we can save a
bit of unnecessary work.
2020-08-29 20:35:14 +01:00
Roman Lebedev 1dcb936cf6
[NFC][Local] EliminateDuplicatePHINodes(): add STATISTIC() 2020-08-29 22:03:18 +03:00
Roman Lebedev 961483a5ea
[NFCI][Local] Rewrite EliminateDuplicatePHINodes to optionally check hashing invariants
EarlyCSE has a mode to verify the invariant that hash equality equals
key equality, but EliminateDuplicatePHINodes() doesn't.

I've verified that this would have caught the stage2-stage3 mismatches
that the 5ec2b757cc revert has fixed,
and that were introduced last time in 3e69871ab5.
2020-08-29 22:03:10 +03:00
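A tiny standalone sketch of the invariant such a verification mode checks (hypothetical key type; EarlyCSE's real machinery is DenseMap-based):

```
#include <cassert>
#include <cstdio>
#include <vector>

struct Key { int A, B; };
static unsigned hashKey(const Key &K) { return K.A * 37u + K.B; }
static bool keysEqual(const Key &L, const Key &R) {
  return L.A == R.A && L.B == R.B;
}

// Hash equality must equal key equality: two equal keys may never
// disagree on their hash value.
static void verifyHashInvariant(const std::vector<Key> &Keys) {
  for (const Key &L : Keys)
    for (const Key &R : Keys)
      if (keysEqual(L, R))
        assert(hashKey(L) == hashKey(R) && "hash/equality mismatch");
}

int main() {
  verifyHashInvariant({{1, 2}, {1, 2}, {3, 4}});
  std::puts("invariant holds");
}
```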
Shinji Okumura 1364d856f4 [Attributor][NFC] Do not manifest noundef for positions to be changed to undef
This patch fixes AANoUndef manifestation.
We should not manifest noundef for positions that will be changed to undef.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86835
2020-08-30 03:23:41 +09:00
Florian Hahn 31cdb29de4 [DSE,MemorySSA] Return early when hitting a MemoryPhi.
A MemoryPhi can never be eliminated. If we hit one, return the Phi, so
the caller can continue traversing the incoming accesses.

This saves some unnecessary read clobber checks and improves
compile-time
http://llvm-compile-time-tracker.com/compare.php?from=1ffc58b6d098ce8fa71f3a80fe75b990f633f921&to=d0fa8d1982380b57d7b6067528104bc373dbe07a&stat=instructions
2020-08-29 18:28:26 +01:00
Roman Lebedev 5ec2b757cc
[Instruction] Speculatively undo isIdenticalToWhenDefined() PHI handling changes
The stage2-stage3 differences persist even without instcombine-based
PHI CSE, so this is the only possible reason.
2020-08-29 19:38:57 +03:00
Sanjay Patel 0965272140 [EarlyCSE] fold commutable intrinsics
Handling the new min/max intrinsics is the motivation, but it
turns out that we have a bunch of other intrinsics with this
missing bit of analysis too.

The FP min/max tests show that we are intersecting FMF,
so that part should be safe too.

As noted in https://llvm.org/PR46897 , there is a commutative
property specifier for intrinsics, but no corresponding function
attribute, and so apparently no uses of that bit. We may want to
remove that next.

Follow-up patches should wire up the Instruction::isCommutative()
to this IntrinsicInst specialization. That requires updating
callers to be aware of the more general commutative property
(not just binops).

Differential Revision: https://reviews.llvm.org/D86798
2020-08-29 12:11:01 -04:00
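An illustration of why commutativity matters for CSE hashing, as a standalone C++ sketch (the hash function is made up; EarlyCSE uses its own hashing machinery):

```
#include <cstdio>
#include <functional>
#include <utility>

// Hash the operands of a commutative call in a canonical (ordered) way, so
// that e.g. umin(a, b) and umin(b, a) land in the same hash bucket and the
// second occurrence can be replaced by the first.
static size_t hashCommutativeCall(const void *Callee, const void *A,
                                  const void *B) {
  if (std::less<const void *>{}(B, A))
    std::swap(A, B); // canonical operand order
  std::hash<const void *> H;
  return H(Callee) ^ (H(A) * 31) ^ (H(B) * 131);
}

int main() {
  int Callee, X, Y;
  std::printf("%d\n", hashCommutativeCall(&Callee, &X, &Y) ==
                          hashCommutativeCall(&Callee, &Y, &X)); // 1
}
```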
Roman Lebedev bf21ce7b90
[InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

Then there was take 2 3e69871ab5,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and reverted in bdaa3f86a0.
This is quite alarming.

Here, let's try to change how we find an existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(it may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |    0.00% |    0.00% |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than the ~24k that take 1 did,
but more than what take 2 did (22228 times).
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches fewer things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 18:21:24 +03:00
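A hedged sketch of the "look at all PHI nodes in the block" strategy in LLVM's C++ API (the helper is illustrative; the actual patch lives in visitPHINode):

```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Scan every PHI in PN's block for one identical to PN; the caller can then
// replace PN with the match regardless of insertion order within the block.
static PHINode *findIdenticalPHI(PHINode &PN) {
  for (PHINode &Other : PN.getParent()->phis())
    if (&Other != &PN && Other.isIdenticalTo(&PN))
      return &Other;
  return nullptr;
}
```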
Roman Lebedev bdaa3f86a0
Revert "[InstCombine] Take 2: Perform trivial PHI CSE"
While the original variant with doing this in InstSimplify (rightfully)
caused questions and ultimately was detected to be a culprit
of stage2-stage3 mismatch, it was expected that
InstCombine-based implementation would be fine.

But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.

This suggests that somewhere in InstCombine there is a loop
over a nondeterministically sorted container, which causes
a different worklist ordering.

This reverts commit 3e69871ab5.
2020-08-29 16:05:02 +03:00
Nikita Popov 6093b14c2c [InstCombine] Return replaceInstUsesWith() result (NFC)
Follow the usual usage pattern for this function and return the
result.
2020-08-29 14:49:57 +02:00
Roman Lebedev 71ac9105cd
[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev e65f213178
[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev bd12113f57
[NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith() 2020-08-29 15:10:14 +03:00
Roman Lebedev 49d223274f
[NFC][InstCombine] Add STATISTIC() for how many iterations we did
As we've established, if it takes more than two iterations
(one to perform folding and one to ensure that no folding opportunities
remain) per function, then there are worklist management issues.
So it may be interesting to keep track of it.
2020-08-29 15:10:13 +03:00
Roman Lebedev 4f4eecf0ec
[InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW
As noted in post-commit review, we really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:00 +03:00
Roman Lebedev 3e69871ab5
[InstCombine] Take 2: Perform trivial PHI CSE
The original take was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

However InstCombine already does many different optimizations,
so it should be a safe place to do it here.

Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |      |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs                             | 0         | 22228     |  22228 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942329   | 7942456   |    127 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254313792 |  18160 |    0.01% |    0.01% |
| early-cse.NumCSE                                   | 2183283   | 2183272   |    -11 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3666911   |  26600 |    0.73% |    0.73% |
| instcombine.NumDeadInst                            | 1778204   | 1783318   |   5114 |    0.29% |    0.29% |
| instcount.NumCallInst                              | 1758395   | 1758804   |    409 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330549    |     -8 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077221   |     83 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832611   |    665 |    0.01% |    0.01% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999740    | -20073 |   -1.97% |    1.97% |
```
So it fires ~22k times, which is less than the ~24k that take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches fewer things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 13:13:06 +03:00
Nikita Popov 57a26bb7b4 [InstCombine] Fix typo in comment (NFC)
As pointed out in post-commit review of D63060.
2020-08-29 10:17:17 +02:00
Akira Hatanaka 0231a4e5bd [ObjC][ARC] In HandlePotentialAlterRefCount, check whether an
instruction can decrement the reference count, not whether it can alter
it

This prevents the state transition from S_Use to S_CanRelease when doing
a bottom-up traversal and the transition from S_Retain to S_CanRelease
when doing a top-down traversal when the visited instruction can
increment the ref count but cannot decrement it. This allows the ARC
optimizer to remove retain/release pairs which were previously not
removed.

rdar://problem/21793154
2020-08-28 17:45:14 -07:00
Fangrui Song b5ef137c11 [gcov] Increment counters with atomicrmw if -fsanitize=thread
Without this patch, `clang --coverage -fsanitize=thread` may fail spuriously
because non-atomic counter increments can be detected as data races.
2020-08-28 16:32:35 -07:00
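A standalone C++ illustration of the difference (using std::atomic as a stand-in for the atomicrmw the instrumentation now emits):

```
#include <atomic>
#include <cstdio>
#include <thread>

// A plain `++Counter` from two threads is a data race that TSan reports;
// an atomic read-modify-write (what atomicrmw lowers to) is race-free.
std::atomic<long> Counter{0};

static void work() {
  for (int I = 0; I < 1000; ++I)
    Counter.fetch_add(1, std::memory_order_relaxed);
}

int main() {
  std::thread T1(work), T2(work);
  T1.join();
  T2.join();
  std::printf("%ld\n", Counter.load()); // always 2000
}
```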
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null, we dispatch to pImpl->hasAttribute, which will always return false for Attribute::None.

So if we just want to check for None, it's sufficient to just check that pImpl is null, which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
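A self-contained sketch of the idea (all names illustrative, not LLVM's): when pImpl is null the attribute is None, so that query can be a trivially inlineable null check:

```
#include <cstdio>

struct AttrImpl { int Kind; };

class Attr {
  const AttrImpl *pImpl = nullptr; // null means "no attribute" (None)

public:
  enum Kind { None, NoUndef };
  explicit Attr(const AttrImpl *Impl = nullptr) : pImpl(Impl) {}

  // General dispatching query: for a non-null pImpl it never reports None.
  bool hasAttribute(Kind K) const {
    return pImpl ? (pImpl->Kind == K && K != None) : K == None;
  }
  // The cheap specialized helper: None is exactly "pImpl is null".
  bool hasKindNone() const { return pImpl == nullptr; }
};

int main() {
  Attr Empty;
  std::printf("%d %d\n", Empty.hasAttribute(Attr::None), Empty.hasKindNone());
}
```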
Arthur Eubanks cfde93e5d6 [ObjCARCOpt] Port objc-arc to NPM
Since doInitialization() in the legacy pass modifies the module, the NPM
pass is a Module pass.

Reviewed By: ahatanak, ychen

Differential Revision: https://reviews.llvm.org/D86178
2020-08-28 12:59:33 -07:00
Tyker 6d3657417e [SROA] Improve handling of assume bundles by SROA
This patch fixes the crash in https://gcc.godbolt.org/z/Ps8d1e
and gives SROA the ability to remove assumes if that allows promoting an alloca
to a register, without removing assumes when the alloca cannot be promoted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86570
2020-08-28 21:55:45 +02:00
Nikita Popov ffe05dd125 [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)
Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding
usub.sat(a, b) + b to umax(a, b). The backend will expand umax
back to usubsat if that is profitable.

We may also want to handle uadd.sat(a, b) - b in the future.

Differential Revision: https://reviews.llvm.org/D63060
2020-08-28 21:52:29 +02:00
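A standalone C++ check of the identity behind the fold (usub_sat written out by hand for illustration):

```
#include <algorithm>
#include <cstdint>
#include <cstdio>

// usub.sat(a, b): unsigned subtraction clamped at zero.
static uint32_t usub_sat(uint32_t A, uint32_t B) { return A > B ? A - B : 0; }

int main() {
  // If A > B: (A - B) + B == A == max(A, B); otherwise 0 + B == B == max(A, B).
  for (uint32_t A : {0u, 3u, 7u, 100u})
    for (uint32_t B : {0u, 5u, 100u})
      std::printf("A=%u B=%u  %u == %u\n", A, B, usub_sat(A, B) + B,
                  std::max(A, B));
}
```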
Benjamin Kramer 8782c72765 Strength-reduce SmallVectors to arrays. NFCI. 2020-08-28 21:14:20 +02:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
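A minimal re-creation of the shape of the class after this change (a sketch, not the real llvm::ElementCount):

```
#include <cstdio>

class ElementCountSketch {
  unsigned Min;  // minimum number of vector elements
  bool Scalable; // true -> the real count is Min * vscale

public:
  ElementCountSketch(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}
  // The members are private; clients must go through the accessors.
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
};

int main() {
  ElementCountSketch EC(4, /*Scalable=*/true); // like <vscale x 4 x i32>
  std::printf("min=%u scalable=%d\n", EC.getKnownMinValue(), EC.isScalable());
}
```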
Benjamin Kramer 3524c23ff2 [SCCP] Use bulk-remove API to bulk-remove attributes. NFCI. 2020-08-28 14:44:14 +02:00
Benjamin Kramer dce72dc870 [FunctionAttrs] Bulk remove attributes. NFC. 2020-08-28 12:56:19 +02:00
Florian Hahn 43aa7227df [DSE,MemorySSA] Check if Current is valid for elimination first.
This changes getDomMemoryDef to check if Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile time checking for read accesses for
a Current that might not even be removable.

This patch flips the logic, so we skip Current if it cannot be
removed before checking all of its uses. This is much more efficient in
practice.

It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.

This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.

http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86487
2020-08-28 11:19:04 +01:00
Florian Hahn 20e989e9de [BuildLibCalls] Add argmemonly to more lib calls.
strspn, strncmp, strcspn, memcmp, memchr,
memrchr, memcpy, memmove, mempcpy, strchr, strrchr, bcmp
should all only access memory through their arguments.

I broke out strcoll, strcasecmp, strncasecmp because the result
depends on the locale, which might get accessed through memory.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86724
2020-08-28 09:50:38 +01:00
Shinji Okumura 50ebd1afa9 [Attributor] Do not manifest noundef for dead positions
Even if noundef is deduced for a position, we should not manifest it when the position is dead.
This is because the associated values with dead positions are replaced with undef values by AAIsDead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86565
2020-08-28 05:58:18 +09:00
Christopher Tetreault 035833ae42 [SVE] Remove bad call to VectorType::getNumElements() from HeapProfiler
Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D86727
2020-08-27 12:16:00 -07:00
Shinji Okumura c5e6872ec6 [Attributor] Guarantee getAAFor not to update AA in the manifestation stage
If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated.
This patch makes use of the phase flag in Attributor, and handle `getAAFor` behavior according to the flag.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86635
2020-08-28 04:07:42 +09:00
Christopher Tetreault 5e63083435 [SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D82056
2020-08-27 12:02:20 -07:00
Teresa Johnson 5b9d462b7d [HeapProf] Fix bot failures from instrumentation pass
Fix bot failure from 7ed8124d46f94601d5f1364becee9cee8538265e:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/8533

Since we are always using dynamic shadow,
insertDynamicShadowAtFunctionEntry should always return true for
modifying the function.
2020-08-27 10:21:19 -07:00
Shinji Okumura 7a68f0f1e0 [Attributor] Add a phase flag to Attributor
Add a new flag that indicates which stage of the process we are in.
This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86678
2020-08-28 01:16:38 +09:00
Teresa Johnson 7ed8124d46 [HeapProf] Clang and LLVM support for heap profiling instrumentation
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).

This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.

The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.

Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.

Differential Revision: https://reviews.llvm.org/D85948
2020-08-27 08:50:35 -07:00
Florian Hahn 419c6948df [SimplifyLibCalls] Remove over-eager early return in strlen optzns.
Currently we bail out early for strlen calls with a GEP operand, if none
of the GEP specific optimizations fire. But there could be later
optimizations that still apply, which we currently miss out on.

An example is that we do not apply the following optimization
   strlen(x) == 0 --> *x == 0

Unless I am missing something, there seems to be no reason for bailing
out early there.

Fixes PR47149.

Reviewed By: lebedev.ri, xbolva00

Differential Revision: https://reviews.llvm.org/D85886
2020-08-27 15:19:45 +01:00
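The equivalence that the missed optimization rests on, as a standalone C++ check:

```
#include <cstdio>
#include <cstring>

// strlen(X) == 0 holds exactly when the first byte of X is the NUL
// terminator, so the call can be replaced by a single load and compare.
static bool emptyViaStrlen(const char *X) { return std::strlen(X) == 0; }
static bool emptyViaLoad(const char *X) { return *X == '\0'; }

int main() {
  for (const char *S : {"", "a", "hello"})
    std::printf("\"%s\": %d %d\n", S, emptyViaStrlen(S), emptyViaLoad(S));
}
```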
serge-sans-paille 4e29d25669 Fix OpenMP deduplicateRuntimeCalls return status
Differential Revision: https://reviews.llvm.org/D86705
2020-08-27 15:01:04 +02:00
serge-sans-paille 5621571fc7 Fix Attributor return status
Differential Revision: https://reviews.llvm.org/D86703
2020-08-27 15:01:04 +02:00
Florian Hahn bb024c3c4e [DSE,MemorySSA] Remove short-cut to check if all paths are covered.
The early continue based on post-order numbers does not work in some
cases, e.g. if a path from EarlierAccess to an exit includes a node that
dominates EarlierAccess in a cycle.

The short-cut only has very minor impact on compile-time, so it seems
straight-forward to remove it for now:

http://llvm-compile-time-tracker.com/compare.php?from=062412e79fcfedf2cf004433e42036b0333e3f83&to=d7386016a77ce1387bdbbf360f1de157faea9d31&stat=instructions

Fixes PR47285.
2020-08-27 12:42:40 +01:00
Florian Hahn e717fdb0f1 [DSE,MemorySSA] Traverse use-def chain without MemSSA Walker.
For DSE with MemorySSA it is beneficial to manually traverse the
defining access, instead of using a MemorySSA walker, so we can
better control the number of steps together with other limits and
also weed out invalid/unprofitable paths early on.

This patch requires a follow-up patch to be most effective, which I will
share soon after putting this patch up.

This temporarily XFAIL's the limit tests, because we now explore more
MemoryDefs that may not alias/clobber the killing def. This will be
improved/fixed by the follow-up patch.

This patch also renames some `Dom*` variables to `Earlier*`, because the
dominance relation is not really used/important here and potentially
confusing.

This patch allows us to aggressively cut down compile time, geomean
-O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores
removed. Subsequent patches will increase the number of removed stores
again, while keeping compile-time in check.

http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86486
2020-08-27 10:02:02 +01:00
Shinji Okumura 6c25eca614 [Attributor] Add flag for undef value to the state of AAPotentialValues
Currently, an undef value is reduced to 0 when it is added to a set of potential values.
This patch introduces a flag for undef values. With this, for example, we can merge the two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85592
2020-08-27 16:30:29 +09:00
Arthur Eubanks 486ed88533 [ConstProp] Remove ConstantPropagation
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.

Currently no users outside of unit tests.

Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp

Reviewed By: lattner, nikic

Differential Revision: https://reviews.llvm.org/D85159
2020-08-26 15:51:30 -07:00
Wei Mi c67ccf5faf [SampleFDO] Enhance profile remapping support for searching inline instance
and indirect call promotion candidate.

Profile remapping is a feature to match a function in the module with its
profile in sample profile if the function name and the name in profile look
different but are equivalent using given remapping rules. This is a useful
feature to keep the performance stable by specifying some remapping rules
when sampleFDO targets are going through some large scale function signature
change.

However, currently profile remapping support is only valid for outline
function profile in SampleFDO. It cannot match a callee with an inline
instance profile if they have different but equivalent names. We found
that without the support for inline instance profile, remapping is less
effective for some large scale change.

To add that support, before any remapping lookup happens, all the names
in the profile will be inserted into remapper and the Key to the name
mapping will be recorded in a map called NameMap in the remapper. During
name lookup, a Key will be returned for the given name and it will be used
to extract an equivalent name in the profile from NameMap. So with the help
of the NameMap, we can translate any given name to an equivalent name in
the profile if it exists. Whenever we try to match a name in the module to
a name in the profile, we will try the match with the original name first,
and if it doesn't match, we will use the equivalent name got from remapper
to try the match for another time. In this way, the patch can enhance the
profile remapping support for searching inline instance and searching
indirect call promotion candidate.

In a planned large scale change of int64 type (long long) to int64_t (long),
we found the performance of a google internal benchmark degraded by 2% if
nothing was done. If existing profile remapping was enabled, the performance
degradation dropped to 1.2%. If the profile remapping with the current patch
was enabled, the performance degradation further dropped to 0.14% (Note the
experiment was done before searching indirect call promotion candidate was
added. We hope with the remapping support of searching indirect call promotion
candidate, the degradation can drop to 0% in the end. It will be evaluated
post commit).

Differential Revision: https://reviews.llvm.org/D86332
2020-08-26 11:07:35 -07:00
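A toy C++ sketch of the NameMap scheme described above (the "remapping rule" here is a made-up suffix equivalence; real rules come from the user's remapping file):

```
#include <cstdio>
#include <map>
#include <string>

struct RemapperSketch {
  // Toy rule: names are equivalent up to the first '_', e.g. "foo_int64"
  // and "foo_long" share the key "foo".
  static std::string key(const std::string &Name) {
    return Name.substr(0, Name.find('_'));
  }

  std::map<std::string, std::string> NameMap; // key -> name as in the profile

  void insertProfileName(const std::string &N) { NameMap[key(N)] = N; }

  // Returns the equivalent profile name for a module name, or "" on a miss;
  // callers first try the original name and only then this translation.
  std::string equivalentProfileName(const std::string &N) const {
    auto It = NameMap.find(key(N));
    return It == NameMap.end() ? std::string() : It->second;
  }
};

int main() {
  RemapperSketch R;
  R.insertProfileName("foo_long"); // name spelled in the profile
  std::printf("%s\n", R.equivalentProfileName("foo_int64").c_str()); // foo_long
}
```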
Roman Lebedev 95848ea101
[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks
As the FIXME said, they really should be checking for a single user,
not a single use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.

There isn't a nice way to do that: Value::users() isn't uniquified,
and Value only tracks its uses, not its Users, so the check is
potentially costly since it may involve
traversing the entire use list of a value.
2020-08-26 20:20:41 +03:00
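A hedged sketch of what a single-user check has to do given those constraints (an illustrative helper against LLVM's C++ API):

```
#include "llvm/IR/Value.h"
using namespace llvm;

// Value::users() is not uniquified, so "exactly one user" means walking the
// use list and comparing the User behind each use.
static bool hasSingleUser(const Value &V) {
  const User *OnlyUser = nullptr;
  for (const Use &U : V.uses()) {
    if (OnlyUser && U.getUser() != OnlyUser)
      return false; // found a second, distinct user
    OnlyUser = U.getUser();
  }
  return OnlyUser != nullptr; // false when V has no uses at all
}
```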
Sjoerd Meijer bda8fbe2d2 [LV] Fallback strategies if tail-folding fails
This implements 2 different vectorisation fallback strategies if tail-folding
fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This
can be controlled with the option -prefer-predicate-over-epilogue, which has been
changed to take a numeric value corresponding to the tail-folding preference
and preferred fallback.

Patch by: Pierre van Houtryve, Sjoerd Meijer.

Differential Revision: https://reviews.llvm.org/D79783
2020-08-26 16:55:25 +01:00
Shinji Okumura 3050713798 [Attributor] Provide an edge-based interface in AAIsDead
This patch provides an edge-based interface in AAIsDead.
By this, we can query a set of basic blocks that are directly reachable from a given basic block.
This is specifically useful for implementation of AAReachability.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85547
2020-08-26 16:57:52 +09:00
Roman Lebedev 1f90d45b9e
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do its sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires a whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform,
the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

This is a reland of the original commit fcb51d8c24,
because originally I forgot to ensure that the base aggregate types match.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:57:50 +03:00
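A source-level C++ analogue of the transform (a sketch only; the real fold operates on IR `phi`/`extractvalue`):

```
#include <cstdio>

struct Pair { int First, Second; };

// Before: each predecessor extracts the field and the merge is over ints
// (phi of extractvalues).
static int beforeFold(bool C, Pair A, Pair B) {
  int X;
  if (C)
    X = A.First; // extractvalue %A, 0
  else
    X = B.First; // extractvalue %B, 0
  return X;      // phi of the extracted values
}

// After: the merge is over the aggregates and one extract follows it
// (extractvalue of phi).
static int afterFold(bool C, Pair A, Pair B) {
  Pair P = C ? A : B; // phi of the aggregates
  return P.First;     // single extractvalue
}

int main() {
  Pair A{1, 2}, B{3, 4};
  std::printf("%d %d\n", beforeFold(false, A, B), afterFold(false, A, B));
}
```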
Roman Lebedev c295c6f2c0
Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad"
This reverts commit fcb51d8c24.

As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of the PHI.
Let's revert for a moment.
2020-08-26 09:23:22 +03:00
Roman Lebedev fcb51d8c24
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do its sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires a whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform,
the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:08:24 +03:00
Juneyoung Lee f753f5b050 [ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477
2020-08-26 04:40:21 +09:00
Sanjay Patel c4f0a0896f [InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try)
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.

Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-25 11:19:36 -04:00
Sjoerd Meijer ae366479e8 [LV] get.active.lane.mask consuming tripcount instead of backedge-taken count
This adapts LV to the new semantics of get.active.lane.mask as discussed in
D86147, which means that the LV now emits intrinsic get.active.lane.mask with
the loop tripcount instead of the backedge-taken count as its second argument.
The motivation for this is described in D86147.

Differential Revision: https://reviews.llvm.org/D86304
2020-08-25 13:49:19 +01:00
Shinji Okumura 05390440a2 [Attributor][NFC] Clang format 2020-08-25 19:32:58 +09:00
Benjamin Kramer c6fb72de4f Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract"
This reverts commit 557b890ff4. Causing
miscompiles, test case is on llvm-commits.
2020-08-25 11:31:31 +02:00
David Sherwood 7b64765cd1 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Florian Hahn e19ef1aab5 [DSE,MemorySSA] Cache accesses with/without reachable read-clobbers.
Currently we repeatedly check the same uses for read clobbers in some
cases. We can avoid unnecessary checks by keeping track of the memory
accesses we already found read clobbers for. To do so, we just add
memory accesses causing read-clobbers to a set. Note that marking all
visited accesses as read-clobbers would be too pessimistic, as that might
include accesses not on any path to the actual read clobber.

If we do not find any read-clobbers, we can add all visited instructions
to another set and use that to skip the same accesses in the next call.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D75025
2020-08-25 08:48:46 +01:00
Roman Lebedev cdd339c568
[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's
As per the statistic, this happens exceedingly rarely,
but I have seen it in exactly the situations the
PHI-aware aggregate reconstruction would have handled,
eventually, and allowed the invoke -> call fold later on.

So while this might be something that other fold
will have to learn about, i believe we should be
doing this transform in general.

Here, we are okay with adding two PHI's to get both the base aggregate
and the inserted value. I'm not sure it makes much sense to restrict
it to a single PHI (to just the inserted value?), because originally
we'd be receiving the final aggregate already.
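
A minimal IR sketch (hypothetical, not from the patch) of the transform, where
a PHI of two `insertvalue`s becomes an `insertvalue` of two PHI's:
```
define { i8*, i32 } @f(i1 %c, { i8*, i32 } %a, { i8*, i32 } %b, i32 %x, i32 %y) {
entry:
  br i1 %c, label %left, label %right
left:
  %ia = insertvalue { i8*, i32 } %a, i32 %x, 1
  br label %merge
right:
  %ib = insertvalue { i8*, i32 } %b, i32 %y, 1
  br label %merge
merge:
  %r = phi { i8*, i32 } [ %ia, %left ], [ %ib, %right ]
  ; may become an insertvalue of two PHI's:
  ;   %base = phi { i8*, i32 } [ %a, %left ], [ %b, %right ]
  ;   %val  = phi i32 [ %x, %left ], [ %y, %right ]
  ;   %r    = insertvalue { i8*, i32 } %base, i32 %val, 1
  ret { i8*, i32 } %r
}
```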

llvm test-suite + RawSpeed:
```
| statistic name                             | baseline  | proposed  |    Δ |      % | \|%\| |
|--------------------------------------------|-----------|-----------|-----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues          | 0         | 12        |  12  |  0.00% | 0.00% |
| asm-printer.EmittedInsts                   | 8926643   | 8926595   | -48  |  0.00% | 0.00% |
| instcombine.NumCombined                    | 3846614   | 3846640   |  26  |  0.00% | 0.00% |
| instcombine.NumConstProp                   | 24302     | 24293     |  -9  | -0.04% | 0.04% |
| instcombine.NumDeadInst                    | 1620140   | 1620112   | -28  |  0.00% | 0.00% |
| instcount.NumBrInst                        | 898466    | 898464    |  -2  |  0.00% | 0.00% |
| instcount.NumCallInst                      | 1760819   | 1760875   |  56  |  0.00% | 0.00% |
| instcount.NumExtractValueInst              | 45659     | 45649     | -10  | -0.02% | 0.02% |
| instcount.NumInsertValueInst               | 4991      | 4981      | -10  | -0.20% | 0.20% |
| instcount.NumIntToPtrInst                  | 27084     | 27087     |   3  |  0.01% | 0.01% |
| instcount.NumPHIInst                       | 371435    | 371429    |  -6  |  0.00% | 0.00% |
| instcount.NumStoreInst                     | 906011    | 906019    |   8  |  0.00% | 0.00% |
| instcount.TotalBlocks                      | 1105520   | 1105518   |  -2  |  0.00% | 0.00% |
| instcount.TotalInsts                       | 9795737   | 9795776   |  39  |  0.00% | 0.00% |
| simplifycfg.NumInvokes                     | 2784      | 2786      |   2  |  0.07% | 0.07% |
| simplifycfg.NumSimpl                       | 1001840   | 1001850   |  10  |  0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs            | 15174     | 15170     |  -4  | -0.03% | 0.03% |
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86306
2020-08-25 10:38:11 +03:00
Sanjay Patel 557b890ff4 [InstCombine] improve demanded element analysis for vector insert-of-extract
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-24 17:00:16 -04:00
Bjorn Pettersson fce44ff5da [Scalarizer] Avoid updating the name of globals
The "takeName" logic at the end of ScalarizerVisitor::finish
could end up renaming global variables when having simplified
and extractelement instruction to simply pick a single vector
element. If the input vector to the extractelement instruction
held pointers to global variables we ended up renaming the global
variable.
The patch make sure we only take the name of the replaced Op when
we have added new instructions that might need a useful name.
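
For illustration, a minimal IR sketch (hypothetical, not from the patch) of
the problematic case:
```
@g = global i32 0

define i32* @f() {
  %e = extractelement <2 x i32*> <i32* @g, i32* @g>, i32 0
  ; scalarizing %e yields @g itself; naively calling takeName here
  ; would rename the global @g
  ret i32* %e
}
```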

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D86472
2020-08-24 21:55:03 +02:00
Roman Lebedev 56c529300e
[NFC][InstCombine] Adjust naming for some methods to match coding standards
Requested as preparatory cleanup in https://reviews.llvm.org/D86306#inline-799065
2020-08-24 22:39:34 +03:00
Fangrui Song 44ee9d070a Revert D85812 "[coroutine] should disable inline before calling coro split"
This reverts commit 2e43acfed8.

LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the
library which contains SampleProfile.cpp). It is inappropriate for
SampleProfile.cpp to depend on Coroutines.h (circular dependency).

The test inverted dependencies as well:
llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.
2020-08-24 11:41:05 -07:00
Florian Hahn d1a1cce5b1 [DSE,MemorySSA] Do not use callCapturesBefore in isReadClobber.
Using callCapturesBefore potentially improves the precision and the
number of stores we can remove. But in practice, it seems to have very
little impact in terms of stores removed. For example, for
SPEC2000/SPEC2006/MultiSource with -O3 -flto, ~50 more stores are
removed (out of ~26900 stores removed). But in terms of compile-time, it
is very expensive and the patch gives substantial compile-time
improvements: Geomean O3 -0.24%, ReleaseThinLTO -0.47%, ReleaseLTO-g
-0.39%.

http://llvm-compile-time-tracker.com/compare.php?from=612a0bff88ed906c83b82f079d4c49e5fecfb9d0&to=e6c86b96d20d97dd88e903a409bd8d39b6114312&stat=instructions
2020-08-24 16:19:42 +01:00
dongAxis 2e43acfed8 [coroutine] should disable inline before calling coro split
Summary:
When a callee coroutine function is inlined into a caller coroutine
function before the coro-split pass, LLVM emits "coroutine should
have exactly one defining @llvm.coro.begin". It seems that the coro-early
pass cannot handle this quite well.
We therefore believe that an unsplit coroutine function should not be inlined.
This patch fixes the issue by not inlining a function if it has the
attribute "coroutine.presplit" (which means the function has not been
split yet).

TestPlan: check-llvm

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D85812
2020-08-24 22:22:08 +08:00
Francesco Petrogalli 5a34b3ab95 [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with
   `llvm::SmallSetVector<ElementCount, ...>`. This avoids the need for an
   ordering function for the `ElementCount` class.
* Bits and pieces around printing the `ElementCount` to string streams.

To guarantee that this change is an NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
   a. When computing unrolling factors.
   b. When shuffle masks are built for fixed width vector types.
In these cases, an
assert(!VF.Scalable && "<msg>") has been added to make sure we don't
enter codepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operations _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` would require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
6. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that might require changes at the command line.

Note that in some places the idiom

```
unsigned VF = ...
auto VTy = FixedVectorType::get(ScalarTy, VF)
```

has been replaced with

```
ElementCount VF = ...
assert(!VF.Scalable && ...);
auto VTy = VectorType::get(ScalarTy, VF)
```

The assertion guarantees that the new code is (at least in debug mode)
functionally equivalent to the old version. Notice that this change had been
possible because none of the methods that are specific to `FixedVectorType`
were used after the instantiation of `VTy`.

Reviewed By: rengolin, ctetreau

Differential Revision: https://reviews.llvm.org/D85794
2020-08-24 13:54:03 +00:00
Francesco Petrogalli bad7d6b373 Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]"
Reverting because the commit message doesn't reflect the one agreed on
phabricator at https://reviews.llvm.org/D85794.

This reverts commit c8d2b065b9.
2020-08-24 13:50:55 +00:00
Francesco Petrogalli c8d2b065b9 [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.

To guarantee that this change is an NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
   a. When computing unrolling factors.
   b. When shuffle masks are built for fixed width vector types.
In these cases, an
assert(!VF.Scalable && "<msg>") has been added to make sure we don't
enter codepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operations _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` would require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
6. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that might require changes at the command line.

Differential Revision: https://reviews.llvm.org/D85794
2020-08-24 13:39:42 +00:00
Florian Hahn b99a5eb659 [DSE,MemorySSA] Delay PointerMayBeCaptured calls until actually needed.
Avoid computing InvisibleToCallerBefore/AfterRet up front. In most
cases, this information is not really needed. Instead, introduce helper
functions to compute and cache the result on demand.

Notably, this also does not use PointerMayBeCapturedBefore for
isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as
starting instruction, making the caching ineffective. But it appears the
use of PointerMayBeCapturedBefore has very limited benefits in practice
(e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with
-O3 -flto). Refrain from using it for now, to limit compile-time.

This gives some nice compile-time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions
2020-08-24 14:05:44 +01:00
Florian Hahn 2431b143ae [DSE,MemorySSA] Limit elimination at end of function to single UO.
Limit elimination of stores at the end of a function to MemoryDefs with
a single underlying object, to save compile time.

In practice, the case with multiple underlying objects does not seem
very important. For -O3 -flto on MultiSource/SPEC2000/SPEC2006
this results in a total of 2 more stores being eliminated.

We can always re-visit that in the future.
2020-08-24 13:00:17 +01:00
Sanjay Patel 6a44edb8da [InstCombine] fold abs of select with negated op (PR39474)
Similar to the existing transform - peek through a select
to match a value and its negation.

https://alive2.llvm.org/ce/z/MXi5KG

  define i8 @src(i1 %b, i8 %x) {
  %0:
    %neg = sub i8 0, %x
    %sel = select i1 %b, i8 %x, i8 %neg
    %abs = abs i8 %sel, 1
    ret i8 %abs
  }
  =>
  define i8 @tgt(i1 %b, i8 %x) {
  %0:
    %abs = abs i8 %x, 1
    ret i8 %abs
  }
  Transformation seems to be correct!
2020-08-24 07:37:55 -04:00
Sam Parker 8ce450da32 [NFCI][SimplifyCFG] Combine select costs and checks
Combine the cost modelling and validity checks for the phi to select
conversion in SpeculativelyExecuteBB, extracting the logic out into
a function.
2020-08-24 09:16:11 +01:00
Roman Lebedev f6decfa36d
[InstCombine] Negator: freeze is freely negatible if its operand is negatible 2020-08-23 23:28:19 +03:00
Florian Hahn 2843c9fe0a [DSE,MemorySSA] Keep single DL instance in DSEState (NFC).
Small cleanup, also removes one instance of getting DataLayout without
using it later.
2020-08-23 15:56:38 +01:00
Sanjay Patel ec06b38130 [InstCombine] canonicalize 'not' ops before logical shifts
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.

The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136

There's an analysis motivating case at:
http://bugs.llvm.org/PR38781

I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.

Alive proofs:
https://rise4fun.com/Alive/BBV

  Name: shift right of 'not'
  Pre: C2 == (-1 u>> C1)
  %a = lshr i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = lshr i8 %n, C1

  Name: shift left of 'not'
  Pre: C2 == (-1 << C1)
  %a = shl i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = shl i8 %n, C1

  Name: ashr of 'not'
  %a = ashr i8 %x, C1
  %r = xor i8 %a, -1
  =>
  %n = xor i8 %x, -1
  %r = ashr i8 %n, C1

Differential Revision: https://reviews.llvm.org/D86243
2020-08-22 09:38:13 -04:00
Florian Hahn 5e7e2162d4 [DSE,MemorySSA] Use BatchAA for AA queries.
We can use BatchAA to avoid some repeated AA queries. We only remove
stores, so I think we will get away with using a single BatchAA instance
for the complete run.

The changes in AliasAnalysis.h mirror the changes in D85583.

The change improves compile-time by roughly 1%.
http://llvm-compile-time-tracker.com/compare.php?from=67ad786353dfcc7633c65de11601d7823746378e&to=10529e5b43809808e8c198f88fffd8f756554e45&stat=instructions

This is part of the patches to bring down compile-time to the level
referenced in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86275
2020-08-22 08:36:35 +01:00
Roman Lebedev 503deec218
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As discussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code patterns in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPUs (I personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d8.
2020-08-22 00:33:22 +03:00
kuterd 65fcc0ee31 [Attributor] Function seed allow list
- Adds a command line option to seed only selected functions.
- Makes seed allow-listing exclusive to assertions-enabled builds.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D86129
2020-08-21 23:55:26 +03:00
Shinji Okumura e21a22a7a8 [Attributor] fix AANoUndef initialization
Currently, `AANoUndefImpl::initialize` mistakenly always indicates an optimistic fixpoint for the function returned position.
This is because the associated value is the `Function` in that case, and `isGuaranteedNotToBeUndefOrPoison` returns true for a Function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86361
2020-08-22 05:06:14 +09:00
Amy Huang 5e3fd471ac [Cloning] Fix to cloning DISubprograms.
When trying to enable -debug-info-kind=constructor there was an assert
that occurs during debug info cloning ("mismatched subprogram between
llvm.dbg.value variable and !dbg attachment").
It appears that during llvm::CloneFunctionInto, a DISubprogram could be
duplicated when MapMetadata is called, and then added to the MD map again
when DIFinder gets a list of subprograms. This results in two different
versions of the DISubprogram.

This patch switches the order so that the DIFinder subprograms are
added before MapMetadata is called.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46784

Differential Revision: https://reviews.llvm.org/D86185
2020-08-21 11:54:56 -07:00
Serguei Katkov 9e362bb0eb [InstCombine] Remove unused entries in gc-live bundle of statepoint
If some gc live values are not used in gc.relocate, we can remove them
from the gc-live bundle of the statepoint instruction.

Also the CL removes duplicated Values in the gc-live bundle.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
2020-08-22 01:36:22 +07:00
Serguei Katkov 63d9d56a55 [InstCombine] Move handling of gc.relocate in a gc.statepoint
The only def for gc.relocate is a gc.statepoint. But the real dependency is not
described by the def-use chain. Instead this dependency is encoded by indices
of operands in the gc-live bundle of the statepoint as integer constants in gc.relocate.

InstCombine operates by def-use chain. As a result, when a value in the gc-live bundle
is simplified, the gc.statepoint itself is not simplified, but that might simplify dependent
gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger
a check of all dependent gc.relocates by adding them to the worklist.

This CL handles gc.relocates as part of gc.statepoint optimization, considering the
gc.statepoint and related gc.relocates as a whole entity.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85954
2020-08-21 23:44:23 +07:00
Florian Hahn 8eded24bf4 Recommit "[SCEVExpander] Add helper to clean up instrs inserted while expanding."
Recommit the patch after fixing an issue reported caused by the fact
that re-used values are also added to InsertedValues.

Additional tests have been added in 88818491b9

This reverts the revert commit 38884641f2.
2020-08-21 15:04:17 +01:00
Sam Parker bfc6d8b59b [NFC][SimplifyCFG] Formatting and variable rename 2020-08-21 13:11:17 +01:00
Florian Hahn 9f7350672e [DSE,MemorySSA] Handle atomicrmw/cmpxchg conservatively.
This adds conservative handling of AtomicRMW/AtomicCmpXChg to
isDSEBarrier, similar to atomic loads and stores.
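
For illustration, a minimal IR sketch (hypothetical, not from the patch) of a
case where the atomic operation must be treated as a barrier:
```
define void @f(i32* %p, i32* %q) {
  store i32 0, i32* %p
  ; treated conservatively as a barrier: %q may alias %p, so the
  ; store above is not considered dead
  %old = atomicrmw add i32* %q, i32 1 seq_cst
  store i32 1, i32* %p
  ret void
}
```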
2020-08-21 10:42:42 +01:00
Yevgeny Rouban 18bc400f97 [NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.

Reviewers: fedor.sergeev

Differential Revision: https://reviews.llvm.org/D81555
2020-08-21 16:10:42 +07:00
Sam Parker 47251582f5 [SimplifyCFG] Cost required selects
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.

Differential Revision: https://reviews.llvm.org/D82438
2020-08-21 09:52:52 +01:00
Florian Hahn a0e92ffd0d [DSE,MemorySSA] Split off partial tracking from isOverwite.
When traversing memory uses to look for aliasing reads/writes, we only
care about complete overwrites. This patch splits off the partial
overwrite tracking from isOverwrite into a separate helper, leaving a
simplified isOverwrite that skips the partial overwrite tracking. This
avoids some unnecessary work when checking for read/write clobbers with
MemorySSA-DSE.

This gives a relatively small improvement
http://llvm-compile-time-tracker.com/compare.php?from=ef2a2f77f87553a0a4a39f518eb9ac86b756bda6&to=658f3905dd96d3415f3782adc712c79fa59a4665&stat=instructions

This is part of the patches to bring down compile-time to the level
referenced in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86280
2020-08-21 09:13:59 +01:00
David Green 2b69efded0 [ARM][LV] Add a preferPredicatedReductionSelect target hook
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.

Differential Revision: https://reviews.llvm.org/D85980
2020-08-21 08:48:12 +01:00
David Green 816097e4e5 [LV] Allow tail folded reduction selects to remain in the loop
The normal scheme for tail folding reductions is to use:

loop:
  p = phi(0, a)
  mask = ...
  x = masked_load(..., mask)
  a = add(x, p)
  s = select(mask, a, p)

This means we need to keep the registers p and a alive out of the loop, plus
the mask. On a target with predicated operations we can instead generate
the phi as p = phi(0, s). This keeps the select in the loop, and we can
fold select(m, add(a, b), c) to something like a vaddt c, a, b using the
m predicate. This in turn allows us to tail predicate the entire loop.

Differential Revision: https://reviews.llvm.org/D84741
2020-08-20 14:31:14 +01:00
Shinji Okumura 835cfa5def [Attributor] Handle CallBase case in AAValueConstantRange::initialize
Currently, although we handle the `CallBase` case in updateImpl, we give up in initialize in that case.
That is problematic when we propagate a range from a call site returned position to a floating position.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86196
2020-08-20 20:15:19 +09:00
David Stenberg 8206257cb8 [GlobalOpt] Fix an incorrect Modified status
When removing a non-constant store to a global in
CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return
false.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86149
2020-08-20 11:52:09 +02:00
David Stenberg 7a1029fd1e Reland "[LoopUnswitch] Fix incorrect Modified status"
Relanded since the buildbot issue was unrelated to this commit.

When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.

This was caught using the check introduced by D80916.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86085
2020-08-20 11:52:09 +02:00
David Stenberg ca688ae497 Revert "[LoopUnswitch] Fix incorrect Modified status"
This reverts commit dfd447c220.

After I pushed this commit, llvm-sphinx-docs started failing, due to:

  Warning, treated as error:
  extension 'recommonmark' has no setup() function;
  is it really a Sphinx extension module?

I don't see how this commit may have caused that, but I'm still
reverting it since I don't know how to proceed with that
troubleshooting.
2020-08-20 11:14:23 +02:00
Evgeny Leviant d5b701b972 [ThinLTO] Import globals recursively
Differential revision: https://reviews.llvm.org/D73698
2020-08-20 12:13:43 +03:00
David Stenberg dfd447c220 [LoopUnswitch] Fix incorrect Modified status
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.

This was caught using the check introduced by D80916.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86085
2020-08-20 09:04:16 +02:00
Johannes Doerfert 012819f301 [Attributor][FIX] Update the call graph properly when internalizing functions
The internal version is now part of the SCC, so make sure to perform
this update.
2020-08-20 01:44:58 -05:00
Johannes Doerfert 3edea15f9a [Attributor] Simplify comparison against constant null pointer
Comparison against null is a common pattern that usually is followed by
error handling code and the like. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.
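
For illustration, a minimal IR sketch (hypothetical, not from the patch) of
the kind of comparison that becomes foldable once `nonnull` is known:
```
define i1 @is_null(i8* nonnull %p) {
  %c = icmp eq i8* %p, null
  ; with nonnull deduced for %p, %c may simplify to false, making
  ; error-handling code guarded by it dead
  ret i1 %c
}
```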

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D86145
2020-08-20 01:44:58 -05:00
Johannes Doerfert d01ad217ba [Attributor][FIX] Do not use cyclic arguments for `nonnull`
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
2020-08-20 01:44:58 -05:00
Johannes Doerfert a49dae0e38 [Attributor][AAIsDead][NFC] Skip uninteresting instructions early 2020-08-20 01:44:58 -05:00
Johannes Doerfert 08f33756e6 [Attributor][NFC] Extract functionality into own member 2020-08-20 01:44:58 -05:00
Johannes Doerfert 1de70a724e Revert "[OpenMPOpt] ICV tracking for calls"
This commit breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extension. See: https://reviews.llvm.org/D85544#2227611

This reverts commit b0b32e6490.
2020-08-20 00:00:35 -05:00
Kyungwoo Lee 7a028fe702 Force Remove Attribute
-force-attribute adds an attribute to a function via the command line.
However, there was no counterpart to remove an attribute.  This patch
adds -force-remove-attribute that removes an attribute from a function.

Differential Revision: https://reviews.llvm.org/D85586
2020-08-19 17:30:13 -04:00
Sanjay Patel 6f3511a01a [ValueTracking] define/use max recursion depth in header
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191

But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.

Differential Revision: https://reviews.llvm.org/D86113
2020-08-19 16:56:59 -04:00
Hiroshi Yamauchi ab401a8c8a [PGO][PGSO][LV] Fix loop not vectorized issue under profile guided size opts.
D81345 appears to accidentally disable vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.

Differential Revision: https://reviews.llvm.org/D85784
2020-08-19 12:13:34 -07:00
Florian Hahn c0cbe6453a [DSE] Remove dead argument from removePartiallyOverlappedStores (NFC).
The argument is unused and can be removed.
2020-08-19 19:33:52 +01:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the contructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Hamilton Tobon Mosquera bd2fa1819b [OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of __tgt_target_data_begin_mapper
canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null if the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction, and instead of moving the "wait" it creates it at that point.

Differential Revision: https://reviews.llvm.org/D86155
2020-08-19 11:42:22 -05:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the contructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Sanjay Patel c8d711adae [InstCombine] reduce code duplication; NFC 2020-08-19 12:05:12 -04:00
Benjamin Kramer b98e25b6d7 Make helpers static. NFC. 2020-08-19 16:00:03 +02:00
Roman Lebedev 3d76a133c7
Revert "[InstCombine] Lower infinite combine loop detection thresholds"
And as being reported by Florian Hahn, there's a hit
in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto,
so reverting until addressed.

This reverts commit 71e0b82c9f.
2020-08-19 16:53:30 +03:00
Roman Lebedev 71e0b82c9f
[InstCombine] Lower infinite combine loop detection thresholds
It's been a month since 2f3862eb9f,
and no new bug reports about the threshold were filed,
so let's bump it again and wait again.
2020-08-19 14:37:57 +03:00
sstefan1 b0b32e6490 [OpenMPOpt] ICV tracking for calls
Introduce two new AAs: AAICVTrackerFunctionReturned, which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned, which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and whether it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-19 11:43:12 +02:00
Florian Hahn 1a55fbceaa [DSE,MemorySSA] Use NumRedundantStores instead of NumNoopStores.
Legacy DSE uses NumRedundantStores, while MemorySSA DSE uses
NumNoopStores. We should just use the same counter.
2020-08-19 08:50:33 +01:00
Roman Lebedev 2f01785857
[NFC][InstCombine] Aggregate reconstruction: use plain map
Now that we no longer require this map to have a stable iteration order,
we no longer need to pay for keeping the iteration order stable,
so switch from `SmallMapVector` to `SmallDenseMap`.
2020-08-19 01:09:25 +03:00
Roman Lebedev 78bd4231bf
[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors
While it may seem like we can just "deduplicate" the case where
some basic block happens to be a predecessor more than once,
which happens for e.g. switches, that is not correct thing to do.
We must actually add a PHI operand for each predecessor.

This was initially reported to me by David Major
as a clang crash during a gecko build for Android.
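
For illustration, a minimal IR sketch (hypothetical, not from the reported
crash) of a duplicate predecessor via a switch:
```
define i32 @dup_pred(i32 %x) {
entry:
  switch i32 %x, label %other [
    i32 0, label %merge
    i32 1, label %merge
  ]
other:
  br label %merge
merge:
  ; %entry is a predecessor twice (once per switch case), so the PHI
  ; needs one operand per incoming edge, not per unique block
  %p = phi i32 [ 0, %entry ], [ 0, %entry ], [ 1, %other ]
  ret i32 %p
}
```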
2020-08-19 01:00:42 +03:00
Sanjay Patel 139da9c4d7 [InstCombine] fold fabs of select with negated operand
This is the FP example shown in:
https://bugs.llvm.org/PR39474
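
For illustration, a minimal IR sketch (hypothetical, not from the patch) of
the fold:
```
declare float @llvm.fabs.f32(float)

define float @fabs_of_select(i1 %b, float %x) {
  %neg = fneg float %x
  %sel = select i1 %b, float %x, float %neg
  %fabs = call float @llvm.fabs.f32(float %sel)
  ; may fold to:
  ;   %fabs = call float @llvm.fabs.f32(float %x)
  ret float %fabs
}
```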
2020-08-18 09:23:07 -04:00
Shinji Okumura 5e361e2aa4 [Attributor] Deduce noundef attribute
This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85184
2020-08-18 18:05:54 +09:00
Johannes Doerfert 8abd69aa9e [Attributor] Bail early if AAMemoryLocation cannot derive anything
Before this change we looked through all memory operations in a function
even if the first was an unknown call that could do anything. This did
cost a lot of time but there is little use to do so. We also avoid
creating AAs for things that we would have looked at in case no other AA
will; that is the reason for the test changes.

Running only the attributor-cgscc pass on an IR version of
`llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the
time we spend in `AAMemoryLocation::update` from 4% total to
0.9% (disclaimer: no accurate measurements).
2020-08-17 23:36:36 -05:00
Johannes Doerfert 1d99c3d707 [Attributor] We (should) keep the CG updated so we can mark it as preserved 2020-08-17 23:36:36 -05:00
Johannes Doerfert 858c75f7d1 [Attributor][NFC] Directly return proper type to avoid casts 2020-08-17 23:36:36 -05:00
Johannes Doerfert b27bdf955a [Attributor][FIX] Handle function pointers properly in AANonNull
Before, we tried to create a dominator tree for a declaration when we
wanted to determine if the function pointer is `nonnull`. We now avoid
looking at global values if `Value::getPointerDereferenceableBytes` has
not already determined `nonnull`.
2020-08-17 23:36:35 -05:00
Hamilton Tobon Mosquera 496f8e5b36 [OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts.
WIP that tries to hide the latency of runtime calls that involve host to
device memory transfers by splitting them into their "issue" and "wait"
versions. The "issue" is moved upwards as much as possible. The "wait" is
moved downwards as much as possible. The "issue" issues the memory transfer
asynchronously, returning a handle. The "wait" waits on the returned
handle for the memory transfer to finish. We still lack the movement.
2020-08-17 20:56:10 -05:00
Aditya Kumar 370330f084 NFC: [GVNHoist] Outline functions from the class
Reviewers: sebpop
Reviewed By: hiraditya

Differential Revision: https://reviews.llvm.org/D86032
2020-08-17 17:40:04 -07:00
Johannes Doerfert 19bd4ef157 [Attributor] Properly use the call site argument position 2020-08-17 18:21:09 -05:00
Johannes Doerfert 5dfc207c53 [Attributor][FIX] Do not request an AANonNull for non-pointer types 2020-08-17 18:21:08 -05:00
Roman Lebedev 03127f795b
[InstCombine] PHI-aware aggregate reconstruction: correctly detect "use" basic block
While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.

In particular, while it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, it is not necessarily optimal,
as `@test6` shows.

We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.

On the RawSpeed library, this catches +4% (+50) more cases.
On the vanilla LLVM test-suite, this catches +12% (+92) more cases.
2020-08-18 00:45:18 +03:00
Roman Lebedev f4f673e0e3
[NFC][InstCombine] PHI-aware aggregate reconstruction: don't capture UseBB in lambdas, take it as argument
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).

Also, parametrized UseBB will likely be needed
for multiple levels of PHI indirections later on anyways.
2020-08-18 00:45:18 +03:00
Roman Lebedev 4973ca3eac
[NFC][InstCombine] PHI-aware aggregate reconstruction: insert PHI node manually
This is NFC at the moment, because right now we always insert the PHI
into the same basic block in which the original `insertvalue` instruction
is, but that will change.

Also, this fixes the addition of the suffix to the value names.
2020-08-18 00:45:17 +03:00
Florian Hahn 4cc20aa743 [DSE,MemorySSA] Skip access already dominated by a killing def.
If we already found a killing def (= a def that completely overwrites
the location) that dominates an access, we can skip processing it
further.

This does not help with compile-time, but increases the number of memory
accesses we can process with the same scan budget, leading to more
stores being eliminated.

Improvements with this change

Same hash: 203 (filtered out)
Remaining: 34
Metric: dse.NumFastStores

Program                                        base    dom     diff
 test-suite...rolangs-C++/family/family.test     2.00    4.00  100.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   172.00  229.00  33.1%
 test-suite...ks/Prolangs-C/agrep/agrep.test    10.00   12.00  20.0%
 test-suite...oxyApps-C++/miniFE/miniFE.test    44.00   51.00  15.9%
 test-suite...marks/7zip/7zip-benchmark.test   1285.00 1474.00 14.7%
 test-suite...006/450.soplex/450.soplex.test   254.00  289.00  13.8%
 test-suite...006/447.dealII/447.dealII.test   2466.00 2798.00 13.5%
 test-suite...000/197.parser/197.parser.test     9.00   10.00  11.1%
 test-suite.../Benchmarks/nbench/nbench.test    85.00   91.00   7.1%
 test-suite...ce/Applications/siod/siod.test    68.00   72.00   5.9%
 test-suite...ications/JM/lencod/lencod.test   786.00  824.00   4.8%
 test-suite...6/464.h264ref/464.h264ref.test   765.00  798.00   4.3%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test   105.00  109.00   3.8%
 test-suite...lications/obsequi/Obsequi.test    29.00   28.00  -3.4%
 test-suite...3.xalancbmk/483.xalancbmk.test   1322.00 1367.00  3.4%
 test-suite...chmarks/MallocBench/gs/gs.test   118.00  122.00   3.4%
 test-suite...T2006/401.bzip2/401.bzip2.test    60.00   62.00   3.3%
 test-suite...6/482.sphinx3/482.sphinx3.test    30.00   31.00   3.3%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test   862.00  887.00   2.9%
 test-suite...telecomm-gsm/telecomm-gsm.test    78.00   80.00   2.6%
 test-suite...ediabench/gsm/toast/toast.test    78.00   80.00   2.6%
 test-suite.../Applications/SPASS/SPASS.test   163.00  167.00   2.5%
 test-suite...lications/ClamAV/clamscan.test   240.00  245.00   2.1%
 test-suite...006/453.povray/453.povray.test   1392.00 1419.00  1.9%
 test-suite...000/255.vortex/255.vortex.test   211.00  215.00   1.9%
 test-suite...:: External/Povray/povray.test   1295.00 1317.00  1.7%
 test-suite...lications/sqlite3/sqlite3.test   175.00  177.00   1.1%
 test-suite...T2000/256.bzip2/256.bzip2.test    99.00  100.00   1.0%
 test-suite...0/253.perlbmk/253.perlbmk.test   629.00  635.00   1.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test   1183.00 1194.00  0.9%
 test-suite.../CINT2000/176.gcc/176.gcc.test   647.00  653.00   0.9%
 test-suite...ications/JM/ldecod/ldecod.test   512.00  516.00   0.8%
 test-suite...0.perlbench/400.perlbench.test   1026.00 1034.00  0.8%
 test-suite...-typeset/consumer-typeset.test   1876.00 1877.00  0.1%
 Geomean difference                                             7.3%
2020-08-17 20:54:48 +01:00
Florian Hahn df4756ec6c [DSE,MemorySSA] Check for underlying objects first.
isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.

This shaves off a bit of compile-time.
2020-08-17 18:52:18 +01:00
Florian Hahn 139810449b [DSE,MemorySSA] Account for ScanLimit == 0 on entry.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.

In some cases, this brings a 2x speedup in terms of compile-time.
2020-08-17 17:55:14 +01:00
Aditya Kumar cb6e6936db NFC: [GVNHoist] Hoist loop invariant code and rename variables for readability
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86031
2020-08-17 09:43:34 -07:00
Sanjay Patel e6b6787d01 [InstCombine] fold abs(X)/X to cmp+select
The backend can convert the select-of-constants to
bit-hack shift+logic if desirable.

https://alive2.llvm.org/ce/z/pgJT6E

  define i8 @src(i8 %x) {
  %0:
    %a = abs i8 %x, 1
    %d = sdiv i8 %x, %a
    ret i8 %d
  }
  =>
  define i8 @tgt(i8 %x) {
  %0:
    %cond = icmp sgt i8 %x, 255
    %r = select i1 %cond, i8 1, i8 255
    ret i8 %r
  }
  Transformation seems to be correct!
2020-08-17 08:01:28 -04:00
Sanjay Patel 6cd4a6f6b2 [InstCombine] reduce code duplication; NFC 2020-08-17 08:01:27 -04:00
Yonghong Song aa61e43040 [InstCombine] Fix a compilation bug
With gcc 6.3.0, I hit the following compilation bug.
  ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
   };
    ^
  cc1plus: all warnings being treated as errors

The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
2020-08-16 21:56:42 -07:00
Roman Lebedev 0ec1f0f332
[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically 2020-08-16 23:37:36 +03:00
Roman Lebedev ae7f08812e
[InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on the unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to a common block
that takes the extracted `i8*` and `i32` elements (via `phi` nodes),
forms a new aggregate, and finally `resume`s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
and said `invoke` should be a `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception has
higher cost than it could have on the unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so I've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need to be PHI-aware - we might have a different source aggregate when
   coming from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's;
I'll deal with all that later in follow-ups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, I'm posting this.
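
For illustration, a minimal single-block IR sketch (hypothetical, simpler than
the PHI-aware case) of the reconstruction being simplified:
```
define { i8*, i32 } @reconstruct({ i8*, i32 } %agg) {
  %e0 = extractvalue { i8*, i32 } %agg, 0
  %e1 = extractvalue { i8*, i32 } %agg, 1
  %i0 = insertvalue { i8*, i32 } undef, i8* %e0, 0
  %i1 = insertvalue { i8*, i32 } %i0, i32 %e1, 1
  ; %i1 recreates %agg element-by-element, so it may be replaced by %agg
  ret { i8*, i32 } %i1
}
```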

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787
2020-08-16 23:27:56 +03:00
Sanjay Patel 3ffb751f3d [InstCombine] fold copysign with fabs/fneg operand
We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
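
For illustration, a minimal IR sketch (hypothetical, not from the patch) of
one such fold:
```
declare float @llvm.fabs.f32(float)
declare float @llvm.copysign.f32(float, float)

define float @copysign_of_fabs(float %x, float %y) {
  %a = call float @llvm.fabs.f32(float %y)
  %r = call float @llvm.copysign.f32(float %x, float %a)
  ; the sign of the second operand is known positive, so this may fold to:
  ;   %r = call float @llvm.fabs.f32(float %x)
  ret float %r
}
```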
2020-08-16 08:53:47 -04:00
Sanjay Patel 3fed67b7e6 [InstCombine] reduce code duplication; NFC 2020-08-16 08:53:47 -04:00
Wenlei He 577e58bcc7 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay the inline decisions of a different compilation. A DWARF inline stack with line and discriminator is used as the anchor for call sites, including call context. The change can be useful for inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like the alwaysinline attribute are per-function, not per-callsite. A per-callsite inline intrinsic could be another solution (not yet existing), but it is intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decisions can be specialized for each call context, hence we should be able to replay inlining accurately. However, with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call contexts. Apart from that limitation, the new inline advisor can still be used by the regular CGSCC inliner later if needed for tuning purposes.

This is a resubmit of https://reviews.llvm.org/D83743
2020-08-15 20:17:21 -07:00
Luofan Chen 266949b2bc [Attributor][NFC] Format code 2020-08-16 00:00:45 +08:00
Luofan Chen b7448a348b [Attributor][NFC] Use indexes instead of iterator
When adding elements while iterating, the iterator may become
invalid, which could cause errors. This fixes the issue by using
indexes instead of iterators.
2020-08-15 23:09:46 +08:00
Luofan Chen 87a85f3d57 [Attributor] Use internalized version of non-exact functions
This patch internalizes non-exact functions and replaces their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.

We can do this because some non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize and replace uses of them (the
result of the other version of the function should be the same).
Note that not all functions can be internalized, e.g., function with
`linkonce` or `weak` linkage.

For now, when specified on the command line, we internalize all functions
that meet the requirements without calculating the cost of such
internalization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84167
2020-08-15 20:23:38 +08:00
Dávid Bolvanský f134fc4f1b Reland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)" 2020-08-15 12:14:57 +02:00
Martin Storsjö 3e7403a134 Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
This reverts commit 6dbf0cfcf7.

That commit caused failed assertions, e.g. like this:

$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }

$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
2020-08-15 09:35:11 +03:00
Dávid Bolvanský f62de7c9c7 [SLC] Transform strncpy(dst, "text", C) to memcpy(dst, "text\0\0\0", C) for C <= 128 only
Transformation creates big strings for big C values, so bail out for C > 128.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86004
2020-08-15 01:53:32 +02:00
Gui Andrade 05e3ab41e4 [MSAN] Avoid dangling ActualFnStart when replacing instruction
This would be a problem if the entire instrumented function was a call
to e.g. memcpy.

Use FnPrologueEnd Instruction* instead of ActualFnStart BB*

Differential Revision: https://reviews.llvm.org/D86001
2020-08-14 23:50:38 +00:00
Christopher Tetreault 416a6a85b1 [SVE] Remove calls to VectorType::getNumElements from AggressiveInstCombine
Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82218
2020-08-14 16:40:34 -07:00
Jordan Rupprecht 38884641f2 Temporarily revert "[SCEVExpander] Add helper to clean up instrs inserted while expanding."
This reverts commit 7829c33084. The assertion is triggering on some internal code. A reduced test case is in progress.
2020-08-14 14:52:37 -07:00
Dávid Bolvanský 6dbf0cfcf7 [SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)
Transform sprintf(dst, "%s", str) -> strcpy(dst, str) if result is unused
Avoid sprintf(dest, "%s", str) -> llvm.memcpy(align 1 dest, align 1 str, strlen(str)+1) if optimizing for size.
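
For illustration, a minimal IR sketch (hypothetical, not from the patch) of
the first transform:
```
@fmt = private constant [3 x i8] c"%s\00"

declare i32 @sprintf(i8*, i8*, ...)

define void @f(i8* %dst, i8* %str) {
  %fmt = getelementptr [3 x i8], [3 x i8]* @fmt, i64 0, i64 0
  call i32 (i8*, i8*, ...) @sprintf(i8* %dst, i8* %fmt, i8* %str)
  ; result unused, so this may become:
  ;   call i8* @strcpy(i8* %dst, i8* %str)
  ret void
}
```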

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85963
2020-08-14 23:48:53 +02:00
Gui Andrade 36ebabc153 [MSAN] Convert ActualFnStart to be a particular Instruction *, not BB
This allows us to add additional instrumentation before the function start,
without splitting the first BB.

Differential Revision: https://reviews.llvm.org/D85985
2020-08-14 21:43:56 +00:00
Gui Andrade 97de0188dd [MSAN] Reintroduce libatomic load/store instrumentation
Have the front-end use the `nounwind` attribute on atomic libcalls.
This prevents us from seeing `invoke __atomic_load` in MSAN, which
is problematic as it has no successor for instrumentation to be added.
2020-08-14 20:31:10 +00:00
Shinji Okumura 5f55a8193c [Attributor] Implement AAPotentialValues
This patch provides an implementation of `AAPotentialValues`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85632
2020-08-14 20:51:14 +09:00
Sam Parker 725400f993 [NFCI][SimpleLoopUnswitch] Adjust CostKind query
When getUserCost was transitioned to use an explicit CostKind,
TCK_CodeSize was used even though the original kind was implicitly
SizeAndLatency, so restore this behaviour. We now only query for
CodeSize when optimising for minsize.

I expect this not to change anything as, I think, all targets will
currently return the same value for CodeSize and SizeAndLatency. Indeed
I see no changes in the test suite for Arm, AArch64 and X86.

Differential Revision: https://reviews.llvm.org/D85829
2020-08-14 07:54:20 +01:00
Arthur Eubanks 48cd5b72b1 Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
This reverts commit ab9fc8bae8.

Incorrect transformation if the result is used.
Causes breakages, e.g.
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/8193/
2020-08-13 21:05:03 -07:00
Peter Collingbourne c201f27225 hwasan: Emit the globals note even when globals are uninstrumented.
This lets us support the scenario where a binary is linked from a mix
of object files with both instrumented and non-instrumented globals.
This is likely to occur on Android where the decision of whether to use
instrumented globals is based on the API level, which is user-facing.

Previously, in this scenario, it was possible for the comdat from
one of the object files with non-instrumented globals to be selected,
and since this comdat did not contain the note it would mean that the
note would be missing in the linked binary and the globals' shadow
memory would be left uninitialized, leading to a tag mismatch failure
at runtime when accessing one of the instrumented globals.

It is harmless to include the note when targeting a runtime that does
not support instrumenting globals because it will just be ignored.

Differential Revision: https://reviews.llvm.org/D85871
2020-08-13 16:33:22 -07:00
Dávid Bolvanský ab9fc8bae8 [SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)
Solves PR46489
2020-08-14 00:05:55 +02:00
Dávid Bolvanský 5ef2287d36 [SLC] Optimize strncpy(a, "a", C) to memcpy(a, "a\0\0\0", C)
Solves PR47154
2020-08-13 22:22:51 +02:00
Aditya Kumar 1a8c9cd1d9 Fix PR45442: Bail out when MemorySSA information is not available
Reviewers: sebpop, uabelho, fhahn
Reviewed by: fhahn

Differential Revision: https://reviews.llvm.org/D85881
2020-08-13 11:25:58 -07:00
Aditya Kumar 44716856db Fix PR45442: Bail out when MemorySSA information is not available 2020-08-13 09:31:18 -07:00
Bjorn Pettersson 11446b02c7 [VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load
This is a fixup to commit 43bdac2906, to make sure the
address space from the original load pointer is retained in the
vector pointer.

Resolves problem with
  Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
due to address space mismatch.
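
A hedged sketch of the fix (Load, Builder and VecTy are assumed names; the
API calls are real):

  // keep the load's original address space when forming the vector pointer
  unsigned AS = Load->getPointerAddressSpace();
  llvm::Value *VecPtr = Builder.CreateBitCast(Load->getPointerOperand(),
                                              VecTy->getPointerTo(AS));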

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85912
2020-08-13 18:25:32 +02:00
Serguei Katkov 98ba0a5ffe [InstCombine] Handle gc.relocate(null) in one iteration
InstCombine adds the users of a transformed instruction to the worklist so
they can be processed in the same iteration. However, gc.relocate may have a
hidden user (the next gc.relocate) which is connected through the
gc.statepoint intrinsic, so there is no direct def-use chain between them.

In this case, if the next gc.relocate has already been processed, it will not
be added to the worklist and cannot be processed in the same iteration.
Suppose we have the following case:
A = gc.relocate(null)
B = statepoint(A)
C = gc.relocate(B, hidden(A))
If C has already been considered, then after A is replaced with null, the
statepoint instruction B will be added to the queue, but not C.
C can only be processed on the next iteration.

If the chain of relocations is long, many iterations may be required.
This change reduces the number of iterations needed, in line with the latest
changes reducing the infinite loop threshold.

This is a quick fix rather than the best one. In follow-up patches I plan to
move gc.relocate handling into the statepoint handler. This should also help
to remove unused gc live entries in the statepoint bundle.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D75598
2020-08-13 23:16:27 +07:00
David Stenberg e8ebebb0bd [InstCombine] Fix incorrect Modified status
When removing instructions from unreachable blocks, and only debug info
intrinsics were removed, InstCombine could incorrectly return a false
Modified status.

This is fixed by making removeAllNonTerminatorAndEHPadInstructions()
also return how many debug info intrinsics were removed, and taking
that into account.
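
A hedged sketch of the adjusted accounting, assuming the function now returns
both counts as a pair:

  // (non-debug, debug-intrinsic) instructions removed from the block
  auto [NumDeadInst, NumDeadDbgInst] =
      removeAllNonTerminatorAndEHPadInstructions(&BB);
  MadeIRChange |= NumDeadInst != 0 || NumDeadDbgInst != 0;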

This was caught using the check introduced by D80916.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D85839
2020-08-13 15:10:41 +02:00
Florian Hahn 3b0878a370 [DSE,MSSA] Fix crash when using tryToMergePartialOverlappingStores.
We are re-using tryToMergePartialOverlappingStores, which requires
Earlier to dominate Later. In the long run,
tryToMergePartialOverlappingStores should be re-written using MemorySSA.

Fixes PR46513.
2020-08-13 12:07:56 +01:00
Aditya Kumar f902a7eccf [HotColdSplit] Fix variable name spelling 2020-08-12 22:50:08 -07:00
Sanjay Patel 23bd33c6ac [InstCombine] prefer xor with -1 because 'not' is easier to understand (PR32706)
This is a retry of rL300977 which was reverted because of infinite loops.
We have fixed all of the known places where that would happen, but there's
still a chance that this patch will cause infinite loops.

This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706
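
An illustration in source terms (not from the patch): when only the low bits
are demanded, the xor constant can be widened to all-ones, producing a
canonical 'not':

  #include <cstdint>

  uint8_t before(uint32_t x) { return uint8_t(x ^ 0xFFu); }
  uint8_t after(uint32_t x)  { return uint8_t(~x); } // same low 8 bits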

Differential Revision: https://reviews.llvm.org/D32255
2020-08-12 15:50:33 -04:00
Roman Lebedev d6f0600c96 [NFC][InstCombine] Add FIXME's for getLogBase2() / visitUDivOperand()
These are not correctness issues.

In visitUDivOperand(), if the (potential) divisor is undef, then udiv is
already UB, so it is not incorrect to keep undef as shift amount.

But, that is suboptimal.
We could instead simply drop that select, picking the other operand.

Afterwards, getLogBase2() could assert that there is no undef in divisor.
2020-08-12 22:06:54 +03:00
Roman Lebedev 12d93a27e7 [InstCombine] Sanitize undef vector constant to 1 in X*(2^C) with X << C (PR47133)
While x*undef is undef, shift-by-undef is poison,
which we must avoid introducing.

Also log2(iN undef) is *NOT* iN undef, because log2(iN undef) u< N.

See https://bugs.llvm.org/show_bug.cgi?id=47133
2020-08-12 22:06:53 +03:00
Ilya Leoshkevich f5a252ed68 [SanitizerCoverage] Use zeroext for cmp parameters on all targets
Commit 9385aaa848 ("[sancov] Fix PR33732") added zeroext to
__sanitizer_cov_trace(_const)?_cmp[1248] parameters for x86_64 only,
however, it is useful on other targets, in particular, on SystemZ: it
fixes swap-cmp.test.

Therefore, use it on all targets. This is safe: if target ABI does not
require zero extension for a particular parameter, zeroext is simply
ignored. A similar change has been implemented as part of commit
3bc439bdff ("[MSan] Add instrumentation for SystemZ"), and there were
no problems with it.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D85689
2020-08-12 18:38:12 +02:00
Sanjay Patel cc892fd9f4 [VectorCombine] early exit if target has no vector registers
Based on post-commit discussion in:
D81766

Other vectorization passes (SLP and Loop) use this TTI API similarly.
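
A sketch of the early exit, using real TargetTransformInfo calls (close to
what the commit describes, though not guaranteed verbatim):

  // no vector registers -> nothing for VectorCombine to do
  if (!TTI.getNumberOfRegisters(TTI.getRegisterClassForType(/*Vector=*/true)))
    return false;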
2020-08-12 09:22:31 -04:00
Sanjay Patel 912c09e845 [InstCombine] eliminate a pointer cast around insertelement
I'm not sure if this solves PR46839 completely, but reducing the casting should help:
https://bugs.llvm.org/show_bug.cgi?id=46839

Differential Revision: https://reviews.llvm.org/D85647
2020-08-12 09:08:17 -04:00
Sam Parker ea8448e361 [LoopUnroll] Adjust CostKind query
When TTI was updated to use an explicit cost, TCK_CodeSize was used
although the default implicit cost would have been the hand-wavey
cost of size and latency. So, revert back to this behaviour. This is
not expected to have (much) impact on targets since most (all?) of
them return the same value for SizeAndLatency and CodeSize.

When optimising for size, the logic has been changed to query
CodeSize costs instead of SizeAndLatency.

This patch also adds a testing option in the unroller so that
OptSize thresholds can be specified.

Differential Revision: https://reviews.llvm.org/D85723
2020-08-12 12:56:09 +01:00
Cullen Rhodes 511d5aaca3 [Transforms][SROA] Skip uses of allocas where the type is scalable
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, where an alloca of a
fixed-length vector could be bitcasted to scalable. See D85128 for more
information.
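
A hedged sketch of the kind of guard this adds (PI.setAborted is SROA's
existing bail-out mechanism; the exact placement is assumed):

  // the size of a scalable vector is unknown at compile time, so SROA
  // cannot slice it
  if (llvm::isa<llvm::ScalableVectorType>(LI.getType()))
    return PI.setAborted(&LI);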

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85725
2020-08-12 09:35:48 +00:00
Kyungwoo Lee d73be5af0a [NFC] Factor out hasForceAttributes
This is a preparation for https://reviews.llvm.org/D85586.

Differential Revision: https://reviews.llvm.org/D85793
2020-08-12 02:16:57 -04:00
Sanjay Patel b0b95dab1c [VectorCombine] add safety check for 0-width register
Based on post-commit discussion in D81766, Hexagon sets this to "0".
I'll see if I can come up with a test, but making the obvious
code fix first to unblock that target.
2020-08-11 20:30:02 -04:00
Vedant Kumar 30c1633386 Revert "[Instruction] Add updateLocationAfterHoist helper"
This reverts commit 4a646ca9e2.

This is causing some bots to fail with "!dbg attachment points at wrong
subprogram for function", like:

http://lab.llvm.org:8011/builders/sanitizer-windows/builds/67958/steps/stage%201%20check/logs/stdio
2020-08-11 14:54:09 -07:00
Amy Huang 54b6cca0f2 [globalopt] Change so that emitting fragments doesn't use the type size of DIVariables
When turning on -debug-info-kind=constructor we ran into a "fragment covers
entire variable" error during thinlto. The fragment is currently always
emitted if there is no type size, but sometimes the variable has a
forward declared struct type which doesn't have a size.

This changes the code to get the type size from the GlobalVariable instead.

Differential Revision: https://reviews.llvm.org/D85572
2020-08-11 14:50:56 -07:00
Kazu Hirata cfdc96714b [Instcombine] Fix uses of undef (PR46940)
Without this patch, we attempt to distribute And over Xor even in
unsafe circumstances like so:

  undef & (true ^ true)  ==>  (undef & true) ^ (undef & true)

and evaluate it to undef instead of false.  Note that "true ^ true"
may show up implicitly with one true being part of a PHI node.

This patch fixes the problem by teaching SimplifyUsingDistributiveLaws
to not use undef as part of simplifications.

Reviewers: spatel, aqjune, nikic, lebedev.ri, fhahn, jdoerfert

Differential Revision: https://reviews.llvm.org/D85687
2020-08-11 14:13:32 -07:00
Vedant Kumar 4a646ca9e2 [Instruction] Add updateLocationAfterHoist helper
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.

Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).
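
A minimal sketch of the intended call pattern, assuming I is the hoisted
instruction and Dest the destination block (note the revert further up the
log):

  I->moveBefore(Dest->getTerminator());
  I->updateLocationAfterHoist(); // drop the location rather than keep a
                                 // stale or line-0 one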

For more context, see the discussion in https://reviews.llvm.org/D60913.

Differential Revision: https://reviews.llvm.org/D85670
2020-08-11 14:05:20 -07:00
Whitney Tsang aa994d9867 [NFC][LoopUnrollAndJam] Use BasicBlock::replacePhiUsesWith instead of
static function updatePHIBlocks.

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D85673
2020-08-11 15:35:14 +00:00
Dinar Temirbulatov b1600d8b89 [NFC] Guard the cost report block of debug outputs with NDEBUG and
switch to SmallString; this is part of D57779.
2020-08-11 16:34:47 +02:00
Kai Nacke b3aece0531 [SystemZ/ZOS] Add binary format goff and operating system zos to the triple
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as the default binary format if zos is chosen as the
operating system. No further functionality is added.

Reviewers: efriedma, tahonermann, hubert.reinterpertcast, MaskRay

Reviewed By: efriedma, tahonermann, hubert.reinterpertcast

Differential Revision: https://reviews.llvm.org/D82081
2020-08-11 05:26:26 -04:00
Florian Hahn 0b774acf11 [SLP] Make sure instructions are ordered when computing spill cost.
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.

The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.

This seems to have limited practical impact, e.g. on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82444
2020-08-11 11:18:12 +02:00
Dávid Bolvanský c2f0101310 [InstCombine] ~(~X + Y) -> X - Y
Proof:
https://alive2.llvm.org/ce/z/4xharr

Solves PR47051
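
A worked illustration (not from the patch) of why the fold holds in two's
complement, where ~a == -a - 1:

  #include <cstdint>

  // ~(~x + y) == -((-x - 1) + y) - 1 == x - y
  uint32_t before(uint32_t x, uint32_t y) { return ~(~x + y); }
  uint32_t after(uint32_t x, uint32_t y)  { return x - y; }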

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85593
2020-08-11 11:05:42 +02:00
Florian Hahn 7829c33084 [SCEVExpander] Add helper to clean up instrs inserted while expanding.
SCEVExpander already tracks which instructions have been inserted in
InsertedValues/InsertedPostIncValues. This patch adds an additional
vector to collect the instructions in insertion order. This can then be
used to remove exactly the instructions inserted by the expander.

This replaces ExpandedValuesCleaner, which in some cases might remove
values not inserted by the expander (e.g. if a value was dead before
insertion and is then used during expansion).

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D84327
2020-08-11 09:30:31 +01:00
Shinji Okumura 06eee8748f [Attributor][NFC] Connect AAPotentialValues with AAValueSimplify
This patch enables `AAValueSimplify` to use information from `AAPotentialValues`

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85668
2020-08-11 15:52:02 +09:00
Wei Mi 4cd8e9b169 [SampleFDO] Stop letting findCalleeFunctionSamples return unrelated profiles
for invoke instructions.

We see a warning of "No debug information found in function foo: Function
profile not used" in one case. The function foo is called by an invoke
instruction. It has no debug information because it has __attribute__((nodebug))
in its definition. It shouldn't have a profile instance in the sample profile,
but the compiler thinks it does; that turns out to be a compiler bug in
findCalleeFunctionSamples. The bug was exposed when sample-profile-merge-inlinee
was enabled recently.

Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for
invoke instructions. For an empty CalleeName, findFunctionSamplesAt will treat
the call as an indirect call and will return any inline instance profile at
the same location as the instruction. That leads to a wrong profile being
returned to function foo.

The patch sets CalleeName when the instruction is an invoke.

Differential Revision: https://reviews.llvm.org/D85664
2020-08-10 12:41:09 -07:00
Fangrui Song 3b21a07fd7 [PGO] Delete dead comdat renaming code related to GlobalAlias. NFC
A GlobalAlias is an address-taken user of its aliased function.
canRenameComdatFunc has excluded such cases.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D85597
2020-08-10 09:02:04 -07:00
Sanjay Patel bebca662d4 [InstCombine] rearrange code for readability; NFC
The code comment refers to the path where we change the
size of the integer type, so handle that first, otherwise
deal with the general case.
2020-08-10 08:07:29 -04:00
Florian Hahn 8393b9fd1f [LoopInterchange] Move instructions from preheader to outer loop header.
Instructions defined in the original inner loop preheader may depend on
values defined in the outer loop header, but the inner loop header will
become the entry block in the loop nest. Move the instructions from the
preheader to the outer loop header, so we do not break dominance. We
also have to check for unsafe instructions in the preheader. If there
are no unsafe instructions, all instructions should be movable.

Currently we move all instructions except the terminator and rely on
LICM to hoist out invariant instructions later.

Fixes PR45743
2020-08-10 12:41:33 +01:00
Florian Hahn 54cb552b96 [LoopInterchange] Form LCSSA phis for values in orig outer loop header.
Values defined in the outer loop header could be used in the inner loop
latch. In that case, we need to create LCSSA phis for them, because after
interchanging they will be defined in the new inner loop and used in the
new outer loop.
2020-08-10 11:33:19 +01:00
Juneyoung Lee ef018cb65c [BuildLibCalls] Add noundef to standard I/O functions
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.

With this patch, the functions cannot return undef or poison anymore as well.
According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using an unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when, e.g., it is used as a branch condition).

— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).

In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.
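
A hedged sketch of the effect on a libcall prototype (the real changes live in
BuildLibCalls' attribute inference; this is not the patch's code):

  #include "llvm/IR/Attributes.h"
  #include "llvm/IR/Function.h"

  // mark the return value and every parameter of F as noundef
  static void markNoUndef(llvm::Function &F) {
    F.addAttribute(llvm::AttributeList::ReturnIndex,
                   llvm::Attribute::NoUndef);
    for (unsigned I = 0, E = F.arg_size(); I != E; ++I)
      F.addParamAttr(I, llvm::Attribute::NoUndef);
  }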

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85345
2020-08-10 10:58:25 +09:00
Florian Hahn d236e1c7b6 [InstSimplify/NewGVN] Add option to control the use of undef.
Making use of undef is not safe if the simplification result is not used
to replace all uses of the original value. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses of isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84792
2020-08-09 19:16:56 +01:00
Florian Hahn 23817cbd0b [SCEVExpander] Make sure cast properly dominates Builder's IP.
The selected cast must properly dominate the Builder's IP, so we cannot
re-use the cast if it matches the Builder's IP.
2020-08-09 16:51:19 +01:00
Aditya Kumar 53ac144848 [HotColdSplit] Add options for splitting cold functions in separate section
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk

Differential Revision: https://reviews.llvm.org/D85331
2020-08-09 08:48:12 -07:00
Sanjay Patel 43bdac2906 [VectorCombine] try to create vector loads from scalar loads
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. Ie, the transform should almost always be profitable.

We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.

If the transform is not profitable, the backend should be able to re-scalarize the load.
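
For illustration only (clang/GCC vector-extension syntax; not from the patch),
the kind of source pattern this targets:

  typedef float v4f __attribute__((vector_size(16)));

  v4f splat_first(const float *p) {
    float x = *p;              // scalar load feeding an insertelement
    v4f v = {x, x, x, x};      // splat
    return v;                  // the pass may widen the load itself
  }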

Differential Revision: https://reviews.llvm.org/D81766
2020-08-09 09:05:06 -04:00
Florian Hahn c70f0b9d4a [SCEVExpander] Avoid re-using existing casts if it means updating users.
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so, in some cases it creates a new cast at the insertion point and
updates all users to use the new cast.

This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.

This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases where equivalent instructions are created during
expansion.

This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
renaming any uses, by picking a better insertion point.

Reviewed By: efriedma, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84399
2020-08-09 13:25:17 +01:00
Simon Pilgrim f13e92d4b2 [InstCombine] Use CreateVectorSplat(ElementCount) variant directly
This was introduced at rGe20223672100, and the CreateVectorSplat(unsigned NumElements) variant calls it internally
2020-08-08 19:26:02 +01:00
Roman Lebedev e492f0e03b [SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev 1f452ac1d7 [NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based 2020-08-08 20:00:28 +03:00
Roman Lebedev a587bf3eb0 [NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks 2020-08-08 20:00:27 +03:00
Juneyoung Lee b6d9add71b [InstCombine] Optimize select(freeze(icmp eq/ne x, y), x, y)
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This was needed to resolve slowdown after D84940 is applied.

I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.

The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)
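
A worked illustration (not from the patch) of why the fold is sound, ignoring
freeze, which only stabilizes a possibly-poison compare:

  // if x == y, either arm yields y; if x != y, the false arm yields y
  unsigned before(unsigned x, unsigned y) { return x == y ? x : y; }
  unsigned after(unsigned x, unsigned y)  { return y; }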

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85533
2020-08-08 15:22:29 +09:00
Gui Andrade 17ff170e3a Revert "[MSAN] Instrument libatomic load/store calls"
Problems with instrumenting atomic_load when the call has no successor,
blocking compiler roll

This reverts commit 33d239513c.
2020-08-07 19:45:51 +00:00