llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	ca4bf58e4e	[AMDGPU] Support unaligned flat scratch in TLI Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for unaligned flat scratch support. Mostly needed for global isel. Differential Revision: https://reviews.llvm.org/D93669	2020-12-22 16:12:31 -08:00
Sanjay Patel	f6929c0195	[SLP] add reduction tests for maxnum/minnum intrinsics; NFC	2020-12-22 16:05:39 -05:00
Arnold Schwaighofer	333108e8be	Add a llvm.coro.end.async intrinsic The llvm.coro.end.async intrinsic allows to specify a function that is to be called as the last action before returning. This function will be inlined after coroutine splitting. This function can contain a 'musttail' call to allow for guaranteed tail calling as the last action. Differential Revision: https://reviews.llvm.org/D93568	2020-12-22 10:52:28 -08:00
Florian Hahn	ac90bbc9cb	[LoopDeletion] Add test case where outer loop needs to be deleted. In the test case @test1, the inner loop cannot be removed, because it has a live-out value. But the outer loop is a no-op and can be removed.	2020-12-22 17:49:20 +00:00
Philip Reames	f106b281be	[tests] precommit a test mentioned in review for D93317	2020-12-22 09:47:19 -08:00
Siddhesh Poyarekar	6fcb039956	Fold comparison of __builtin_object_size expression with -1 for non-const size When __builtin_dynamic_object_size returns a non-constant expression, it cannot be -1 since that is an invalid return value for object size. However since passes running after the substitution don't know this, they are unable to optimize away the comparison and hence the comparison and branch stays in there. This change generates an appropriate call to llvm.assume to help the optimizer folding the test. glibc is considering adopting __builtin_dynamic_object_size for additional protection[1] and this change will help reduce branching overhead in fortified implementations of all of the functions that don't have the __builtin___*_chk type builtins, e.g. __ppoll_chk. Also remove the test limit-max-iterations.ll because it was deemed unnecessary during review. [1] https://sourceware.org/pipermail/libc-alpha/2020-November/120191.html Differential Revision: https://reviews.llvm.org/D93015	2020-12-22 10:56:31 +01:00
Gil Rapaport	a56280094e	[LV] Avoid needless fold tail When the trip-count is provably divisible by the maximal/chosen VF, folding the loop's tail during vectorization is redundant. This commit extends the existing test for constant trip-counts to any trip-count known to be divisible by maximal/selected VF by SCEV. Differential Revision: https://reviews.llvm.org/D93615	2020-12-22 10:25:20 +02:00
Simon Pilgrim	88c5b50060	[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts (REAPPLIED) The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns. This should allow us to begin resolving PR46896 et al. Ensure we block poison in a funnel shift value - similar to rG0fe91ad463fea9d08cbcd640a62aa9ca2d8d05e0 Reapplied with fix for PR48068 - we weren't checking that the shift values could be hoisted from their basicblocks. Differential Revision: https://reviews.llvm.org/D90625	2020-12-21 15:22:27 +00:00
Sanjay Patel	38ca7face6	[InstSimplify] reduce logic with inverted add/sub ops https://llvm.org/PR48559 This could be part of a larger ValueTracking API, but I don't see that currently. https://rise4fun.com/Alive/gR0 Name: and Pre: C1 == ~C2 %sub = add i8 %x, C1 %sub1 = sub i8 C2, %x %r = and i8 %sub, %sub1 => %r = 0 Name: or Pre: C1 == ~C2 %sub = add i8 %x, C1 %sub1 = sub i8 C2, %x %r = or i8 %sub, %sub1 => %r = -1 Name: xor Pre: C1 == ~C2 %sub = add i8 %x, C1 %sub1 = sub i8 C2, %x %r = xor i8 %sub, %sub1 => %r = -1	2020-12-21 08:51:43 -05:00
Sanjay Patel	d6118759f3	[InstSimplify] add tests for inverted logic operands; NFC	2020-12-21 08:51:42 -05:00
Arthur Eubanks	1a883484af	[test] Fix reg-usage.ll under NPM The -O2 isn't used in the test.	2020-12-20 15:41:29 -08:00
Andrew Litteken	7c6f28a438	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to elevated to argument. This patch deduplicates extracted functions that only have inputs and non of the special cases. We test that correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D86978	2020-12-19 17:34:34 -06:00
Andrew Litteken	b8a2b6af37	Revert "[IROutliner] Deduplicating functions that only require inputs." Missing reviewers and differential revision in commit message. This reverts commit `5cdc4f57e5`.	2020-12-19 17:33:49 -06:00
Andrew Litteken	5cdc4f57e5	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to elevated to argument. This patch deduplicates extracted functions that only have inputs and non of the special cases. We test that correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll	2020-12-19 17:26:29 -06:00
Roman Lebedev	c043f5055e	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1 ... for conditional branch case	2020-12-20 00:18:36 +03:00
Roman Lebedev	262ff9c23e	[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree	2020-12-20 00:18:36 +03:00
Roman Lebedev	6a1617d67c	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2 ... for the custom case returning void.	2020-12-20 00:18:36 +03:00
Roman Lebedev	b94520c9ee	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1 ... for the general case of returning a value.	2020-12-20 00:18:35 +03:00
Roman Lebedev	83659c7076	[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-20 00:18:34 +03:00
Roman Lebedev	b7d00e29b7	[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	c209b88dd4	[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	76e74d9395	[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree	2020-12-20 00:18:33 +03:00
Roman Lebedev	4be8707e64	[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree Still boring, simply drop all edges to successors of DomBlock, and add an edge to to BB instead.	2020-12-20 00:18:33 +03:00
Roman Lebedev	b43b77ff9b	[NFCI][SimlifyCFG] simplifyOnce(): also perform DomTree validation And that exposes that a number of tests don't actually manage to maintain DomTree validity, which is inline with my observations. Once again, SimlifyCFG pass currently does not require/preserve DomTree by default, so this is effectively NFC.	2020-12-20 00:18:32 +03:00
Andrew Litteken	c52bcf3a9b	[IRSim][IROutliner] Limit to extracting regions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to elevated to argument. This patch limits the extracted regions to those that only require inputs, and do not have any other special cases. We test that we do not outline the wrong constants in: test/Transforms/IROutliner/outliner-different-constants.ll test/Transforms/IROutliner/outliner-different-globals.ll test/Transforms/IROutliner/outliner-constant-vs-registers.ll We test that correctly outline in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: paquette, plofti Differential Revision: https://reviews.llvm.org/D86977	2020-12-19 13:33:54 -06:00
Juneyoung Lee	1e785e9262	apply update_test_checks.py to a few files in llvm/test/Transforms/InstCombine	2020-12-20 01:03:19 +09:00
Aditya Kumar	1ab4db0f84	[HotColdSplit] Reflect full cost of parameters in split penalty Make the penalty for splitting a region more accurately reflect the cost of materializing all of the inputs/outputs to/from the region. This almost entirely eliminates code growth within functions which undergo splitting in key internal frameworks, and reduces the size of those frameworks between 2.6% to 3%. rdar://49167240 Patch by: Vedant Kumar(@vsk) Reviewers: hiraditya,rjf,t.p.northover Reviewed By: hiraditya,rjf Differential Revision: https://reviews.llvm.org/D59715	2020-12-18 17:06:17 -08:00
Akira Hatanaka	ffd982f7db	[ObjC][ARC] Fix a bug where the inline-asm retain/claim RV marker wasn't inserted when the original call had a 'returned' argument The code is testing whether the instruction BBI points to is the call that is paired up with the retainRV/claimRV call, but it doesn't work when the call has a 'returned' argument since GetArgRCIdentityRoot looks through 'returned' arguments. rdar://72485383	2020-12-18 16:59:06 -08:00
Nikita Popov	2af2f58ec0	[InstCombine] Regenerate test checks (NFC)	2020-12-18 20:55:26 +01:00
Nikita Popov	1f1145006b	[DSE] Use correct memory location for read clobber check MSSA DSE starts at a killing store, finds an earlier store and then checks that the earlier store is not read along any paths (without being killed first). However, it uses the memory location of the killing store for that, not the earlier store that we're attempting to eliminate. This has a number of problems: * Mismatches between what BasicAA considers aliasing and what DSE considers an overwrite (even though both are correct in isolation) can result in miscompiles. This is PR48279, which D92045 tries to fix in a different way. The problem is that we're using a location from a store that is potentially not executed and thus may be UB, in which case analysis results can be arbitrary. * Metadata on the killing store may be used to determine aliasing, but there is no guarantee that the metadata is valid, as the specific killing store may not be executed. Using the metadata on the earlier store is valid (it is the store we're removing, so on any execution where its removal may be observed, it must be executed). * The location is imprecise. For full overwrites the killing store will always have a location that is larger or equal than the earlier access location, so it's beneficial to use the earlier access location. This is not the case for partial overwrites, in which case either location might be smaller. There is some room for improvement here. Using the earlier access location means that we can no longer cache which accesses are read for a given killing store, as we may be querying different locations. However, it turns out that simply dropping the cache has no notable impact on compile-time. Differential Revision: https://reviews.llvm.org/D93523	2020-12-18 20:26:53 +01:00
Roman Lebedev	0e94ba9d40	[NFC][InstCombine] Fixup check lines for prof md in select_meta.ll test	2020-12-18 21:33:30 +03:00
Roman Lebedev	897c985e1e	[InstCombine] Canonicalize SPF to abs intrinsic This patch enables canonicalization of SPF_ABS and SPF_ABS to the abs intrinsic. This is a recommit, the original try was `05d4c4ebc2`, but it was reverted due to an apparent miscompile, which since then has just been fixed by the previous commit. Differential Revision: https://reviews.llvm.org/D87188	2020-12-18 21:18:14 +03:00
Roman Lebedev	e9289dc25f	[InstSimplify] Don't miscompile `X == 0 ? abs(X) : -abs(X) --> -abs(X)` xform The transform wasn't checking that the LHS of the comparison is the `X` in question... This is the miscompile that was holding up D87188. Thanks to Dave Green for producing an actionable reproducer!	2020-12-18 21:18:13 +03:00
Roman Lebedev	9b183a1452	[NFC][InstSimplify] Add miscompiled testcase from D87188/D87197 Thanks to Dave Green for producing an actionable reproducer! It is (obviously) a miscompile: ``` ---------------------------------------- define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) { %0: %abs = abs i32 %x, 0 %neg = sub i32 0, %abs %cmp = icmp eq i32 %y, 0 %sel = select i1 %cmp, i32 %neg, i32 %abs ret i32 %sel } => define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) { %0: %abs = abs i32 %x, 0 ret i32 %abs } Transformation doesn't verify! ERROR: Value mismatch Example: i32 %x = #xe0000000 (3758096384, -536870912) i32 %y = #x00000000 (0) Source: i32 %abs = #x20000000 (536870912) i32 %neg = #xe0000000 (3758096384, -536870912) i1 %cmp = #x1 (1) i32 %sel = #xe0000000 (3758096384, -536870912) Target: i32 %abs = #x20000000 (536870912) Source value: #xe0000000 (3758096384, -536870912) Target value: #x20000000 (536870912) Alive2: Transform doesn't verify! ```	2020-12-18 21:18:13 +03:00
Florian Hahn	a74941da71	Revert "[BasicAA] Handle two unknown sizes for GEPs" Temporarily revert commit `8b1c4e310c`. After `8b1c4e310c` the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame` dramatically increases with -O3 & LTO, causing issues for builders with that configuration. I filed PR48553 with a smallish reproducer that shows a 10-100x compile time increase.	2020-12-18 17:59:12 +00:00
Whitney Tsang	2a814cd9e1	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-18 17:37:17 +00:00
Xun Li	4652718ee3	Cleanup coro-inline.ll Following up with the comments in D92706. - Use -passes instead of -enable-new-pm - CoroEarly should happen before AlwaysInliner, adjust it. - Remove some unnecessary barriers (still kept one) - Cleanup unnecessary debug info Differential Revision: https://reviews.llvm.org/D93342	2020-12-18 08:05:04 -08:00
Sanjay Patel	47aaa99c0e	[VectorCombine] allow peeking through GEPs when creating a vector load This is an enhancement motivated by https://llvm.org/PR16739 (see D92858 for another). We can look through a GEP to find a base pointer that may be safe to use for a vector load. If so, then we shuffle (shift) the necessary vector element over to index 0. Alive2 proof based on 1 of the regression tests: https://alive2.llvm.org/ce/z/yPJLkh The vector translation is independent of endian (verify by changing to leading 'E' in the datalayout string). Differential Revision: https://reviews.llvm.org/D93229	2020-12-18 09:25:03 -05:00
Yevgeny Rouban	324d96b637	[IndVars] A test for adding trunc instructions to unwind blocks Differential Revision: https://reviews.llvm.org/D93521 Reviewed By: skatkov	2020-12-18 17:08:26 +07:00
Andrew Litteken	cea807602a	[IRSim][IROutliner] Adding InstVisitor to disallow certain operations. This adds a custom InstVisitor to return false on instructions that should not be allowed to be outlined. These match the illegal instructions in the IRInstructionMapper with exception of the addition of the llvm.assume intrinsic. Tests all the tests marked: illegal-*-.ll with a test for each kind of instruction that has been marked as illegal. Reviewers: jroelofs, paquette Differential Revisions: https://reviews.llvm.org/D86976	2020-12-17 19:33:57 -06:00
Nikita Popov	3d56644f18	[DSE] Add test for potential caching bug (NFC) This one would miscompile if read-clobber checks switched to using the EarlierAccess location, but the read cache was retained.	2020-12-17 23:35:01 +01:00
Sanjay Patel	71a1b9fe76	[VectorCombine] add tests for gep load with cast; NFC	2020-12-17 16:40:55 -05:00
Roman Lebedev	2d07414ee5	[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree Pretty boring, removeUnwindEdge() already known how to update DomTree, so if we are to call it, we must first flush our own pending updates; otherwise, we just stop predecessors from branching to us, and for certain predecessors, stop their predecessors from branching to them also.	2020-12-18 00:37:22 +03:00
Roman Lebedev	2ee724863e	[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:22 +03:00
Roman Lebedev	164e0847a5	[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:21 +03:00
Bangtian Liu	511cfe9441	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `d20e0c3444`.	2020-12-17 21:00:37 +00:00
Nikita Popov	1b84934f90	[DSE] Add more tests for read clobber location (NFC)	2020-12-17 21:03:00 +01:00
Andrew Litteken	dae34463e3	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Recommit of `bf899e8913` fixing memory leaks. Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-12-17 11:27:26 -06:00
Nabeel Omer	df2b9a3e02	[DebugInfo] Avoid re-ordering assignments in LCSSA The LCSSA pass makes use of a function insertDebugValuesForPHIs() to propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty behaviour occurs when the parent PHI of a newly inserted PHI is not the most recent assignment to a source variable. insertDebugValuesForPHIs ends up propagating a value that isn't the most recent assignemnt. This change removes the call to insertDebugValuesForPHIs() from LCSSA, preventing incorrect dbg.value intrinsics from being propagated. Propagating variable locations between blocks will occur later, during LiveDebugValues. Differential Revision: https://reviews.llvm.org/D92576	2020-12-17 16:17:32 +00:00
Bangtian Liu	d20e0c3444	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
Florian Hahn	01089c876b	[InstCombine] Preserve !annotation on newly created instructions. If the source instruction has !annotation metadata, all instructions created during combining should also have it. Tell the builder to add it. The !annotation system was discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) This patch is based on an earlier patch by Francis Visoiu Mistrih. Reviewed By: thegameg, lebedev.ri Differential Revision: https://reviews.llvm.org/D91444	2020-12-17 15:20:23 +00:00
Florian Hahn	75c04bfc61	[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest. When folding a branch to a common destination, preserve !annotation on the created instruction, if the terminator of the BB that is going to be removed has !annotation. This should ensure that !annotation is attached to the instructions that 'replace' the original terminator. Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D93410	2020-12-17 14:06:58 +00:00
Jun Ma	0138399903	[InstCombine] Remove scalable vector restriction in InstCombineCasts Differential Revision: https://reviews.llvm.org/D93389	2020-12-17 22:02:33 +08:00
Cullen Rhodes	1fd3a04775	[LV] Disable epilogue vectorization for scalable VFs Epilogue vectorization doesn't support scalable vectorization factors yet, disable it for now. Reviewed By: sdesmalen, bmahjour Differential Revision: https://reviews.llvm.org/D93063	2020-12-17 12:14:03 +00:00
Florian Hahn	eba09a2db9	[InstCombine] Preserve !annotation for newly created instructions. When replacing an instruction with !annotation with a newly created replacement, add the !annotation metadata to the replacement. This mostly covers cases where the new instructions are created using the ::Create helpers. Instructions created by IRBuilder will be handled by D91444. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D93399	2020-12-17 09:06:51 +00:00
Hongtao Yu	ac068e014b	[CSSPGO] Consume pseudo-probe-based AutoFDO profile This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`. Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites). Adjustment to profile format A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like ``` !CFGChecksum: 12345 ``` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums. Differential Revision: https://reviews.llvm.org/D92347	2020-12-16 15:57:18 -08:00
alex-t	35ec3ff76d	Disable Jump Threading for the targets with divergent control flow Details: Jump Threading does not make sense for the targets with divergent CF since they do not use branch prediction for speculative execution. Also in the high level IR there is no enough information to conclude that the branch is divergent or uniform. This may cause errors in further CF lowering. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93302	2020-12-17 02:40:54 +03:00
Roman Lebedev	d22a47e9ff	[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree A first real transformation that didn't already knew how to do that, but it's pretty tame - either change successor of all the predecessors of a block and carefully delay deletion of the block until afterwards the DomTree updates are appled, or add a successor to the block. There wasn't a great test coverage for this, so i added extra, to be sure.	2020-12-17 01:03:50 +03:00
Roman Lebedev	5cce4aff18	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	49dac4aca0	[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	4fc169f664	[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-17 01:03:49 +03:00
Roman Lebedev	aa2009fe78	[NFCI][SimplifyCFG] Mark all the SimplifyCFG tests that already don't invalidate DomTree as such First step after `e113317958`, in these tests, DomTree is valid afterwards, so mark them as such, so that they don't regress. In further steps, SimplifyCFG transforms shall taught to preserve DomTree, in as small steps as possible.	2020-12-17 01:03:49 +03:00
Rong Xu	0abd744597	[PGO] Use the sum of profile counts to fix the function entry count Raw profile count values for each BB are not kept after profile annotation. We record function entry count and branch weights and use them to compute the count when needed. This mechanism works well in a perfect world, but often breaks in real programs, because of number prevision, inconsistent profile, or bugs in BFI). This patch uses sum of profile count values to fix function entry count to make the BFI count close to real profile counts. Differential Revision: https://reviews.llvm.org/D61540	2020-12-16 13:37:43 -08:00
Sanjay Patel	46c331bf26	[VectorCombine] adjust test alignments for better coverage; NFC	2020-12-16 16:30:45 -05:00
Sanjay Patel	38ebc1a13d	[VectorCombine] optimize alignment for load transform Here's another minimal step suggested by D93229 / D93397 . (I'm trying to be extra careful in these changes because load transforms are easy to get wrong.) We can optimistically choose the greater alignment of a load and its pointer operand. As the test diffs show, this can improve what would have been unaligned vector loads into aligned loads. When we enhance with gep offsets, we will need to adjust the alignment calculation to include that offset. Differential Revision: https://reviews.llvm.org/D93406	2020-12-16 15:25:45 -05:00
Florian Hahn	70bd75426e	[SimplifyCFG] Precommit test for preserving !annotation.	2020-12-16 18:47:45 +00:00
Sanjay Patel	aaaf0ec72b	[VectorCombine] loosen alignment constraint for load transform As discussed in D93229, we only need a minimal alignment constraint when querying whether a hypothetical vector load is safe. We still pass/use the potentially stronger alignment attribute when checking costs and creating the new load. There's already a test that changes with the minimum code change, so splitting this off as a preliminary commit independent of any gep/offset enhancements. Differential Revision: https://reviews.llvm.org/D93397	2020-12-16 12:25:18 -05:00
Florian Hahn	4a6a4e573f	[InstCombine] Precommit tests for !annotation metadata handling.	2020-12-16 15:43:30 +00:00
Bangtian Liu	c10757200d	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `cf638d793c`.	2020-12-16 11:52:30 +00:00
Bangtian Liu	cf638d793c	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-15 23:32:29 +00:00
Johannes Doerfert	dcaec81211	[OpenMP] Use assumptions during ICV tracking The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us to ignore calls that would otherwise prevent ICV tracking. Once we track more ICVs we might need to distinguish the ones that could be impacted even with `no_openmp_routines`. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D92050	2020-12-15 16:51:34 -06:00
Roman Lebedev	a7deedc414	[NFC][Tests][SimplifyCFG] Trim whitespaces at the end of lines	2020-12-16 00:38:00 +03:00
Philip Reames	a048e2fa1d	[tests] fix an accidental target dependence added in `99ac8868`	2020-12-15 11:07:30 -08:00
Philip Reames	99ac8868cf	[tests][LV] precommit tests for D93317	2020-12-15 10:53:34 -08:00
Jun Ma	52a3267ffa	[InstCombine] Remove scalable vector restriction in foldVectorBinop Differential Revision: https://reviews.llvm.org/D93289	2020-12-15 21:14:59 +08:00
Florian Hahn	0e0295fd61	[LV] Pass explicit vector width to not require a X86 target.	2020-12-15 12:52:22 +00:00
Jun Ma	e12f584578	[InstCombine] Remove scalable vector restriction in InstCombineCompares Differential Revision: https://reviews.llvm.org/D93269	2020-12-15 20:36:57 +08:00
Jun Ma	2ac58e21a1	[InstCombine] Remove scalable vector restriction when fold SelectInst Differential Revision: https://reviews.llvm.org/D93083	2020-12-15 20:36:57 +08:00
Paul Walker	6d35bd1d48	[CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors. optimizeGatherScatterInst does nothing specific to fixed length vectors but uses FixedVectorType to extract the number of elements. This patch simply updates the code to use VectorType and getElementCount instead. For testing I just copied Transforms/CodeGenPrepare/X86/gather-scatter-opt.ll replacing `<4 x ` with `<vscale x 4`. Differential Revision: https://reviews.llvm.org/D92572	2020-12-15 10:57:51 +00:00
Florian Hahn	8a7e770638	[LV] Add reduction test, which exposed a crash in a pending patch.	2020-12-15 09:42:00 +00:00
Max Kazantsev	8b330f1f69	[SCEV] Add missing type check into getRangeForAffineNoSelfWrappingAR We make type widening without checking if it's needed. Bail if the max iteration count is wider than AR's type.	2020-12-15 14:50:32 +07:00
Max Kazantsev	2fc2e6de82	[Test] Test on assertion failure with expensive SCEV range inference	2020-12-15 13:47:19 +07:00
Rong Xu	54e03d03a7	[PGO] Verify BFI counts after loading profile data This patch adds the functionality to compare BFI counts with real profile counts right after reading the profile. It will print remarks under -Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo. Differential Revision: https://reviews.llvm.org/D91813	2020-12-14 15:56:10 -08:00
Sanjay Patel	8593e197bc	[VectorCombine] add alignment test for gep load; NFC	2020-12-14 18:31:19 -05:00
Sanjay Patel	d399f870b5	[VectorCombine] make load transform poison-safe As noted in D93229, the transform from scalar load to vector load potentially leaks poison from the extra vector elements that are being loaded. We could use freeze here (and x86 codegen at least appears to be the same either way), but we already have a shuffle in this logic to optionally change the vector size, so let's allow that instruction to serve both purposes. Differential Revision: https://reviews.llvm.org/D93238	2020-12-14 17:42:01 -05:00
Craig Topper	25067f179f	[LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing. This adds support for loops like unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT; while (x) { w--; x >>= 1; } return w; } and unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT - 1; while (x >>= 1) { w--; } return w; } To support these we look for add x, -1 as well as add x, 1 that we already matched. If the value was -1 we need to subtract from the initial counter value instead of adding to it. Fixes PR48404. Differential Revision: https://reviews.llvm.org/D92745	2020-12-14 14:25:05 -08:00
Sanjay Patel	9c1765acab	[VectorCombine] add test for load with offset; NFC	2020-12-14 14:40:06 -05:00
Roman Lebedev	59560e8589	[SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450) Even though `d38205144f` was mostly a correct fix for the external non-PHI users, it's not a generally correct fix, because the 'placeholder' values in those trivial PHI's we create shouldn't be always 'undef', but the PHI itself for the backedges, else we end up with wrong value, as the `@pr48450_2` test shows. But we can't just do that, because we can't check that the PHI can be it's own incoming value when coming from certain predecessor, because we don't have a dominator tree. So until we can address this correctness problem properly, ensure that we don't perform the transformation if there are such problematic external uses. Making dominator tree available there is going to be involved, since `-simplifycfg` pass currently does not preserve/update domtree...	2020-12-14 20:14:31 +03:00
Roman Lebedev	effbbdec6e	[NFC][SimplifyCFG] Add another miscompiled test for PR48450	2020-12-14 20:14:31 +03:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Markus Lavin	2a6782bb9f	Reland [DebugInfo] Improve dbg preservation in LSR. Use SCEV to salvage additional @llvm.dbg.value that have turned into referencing undef after transformation (and traditional salvageDebugInfo). Before rewrite (but after introduction of new induction variables) use SCEV to compute an equivalent set of values for each @llvm.dbg.value in the loop body (among the loop header PHI-nodes). After rewrite (and dead PHI elimination) update those @llvm.dbg.value now referencing undef by picking a remaining value from its equivalence set. Allow match with offset by inserting compensation code in the DIExpression. Fixes : PR38815 Differential Revision: https://reviews.llvm.org/D87494	2020-12-14 16:15:18 +01:00
Anton Afanasyev	fac7c7ec3c	[SLP] Fix vector element size for the store chains Vector element size could be different for different store chains. This patch prevents wrong computation of maximum number of elements for that case. Differential Revision: https://reviews.llvm.org/D93192	2020-12-14 15:51:43 +03:00
Simon Pilgrim	5a02bf4f95	[IRCE] Add test case for PR48051	2020-12-14 12:01:19 +00:00
Craig Topper	2acd5a4738	[LoopIdiom] Pre-commit tests for D92745. NFC	2020-12-13 23:25:00 -08:00
Anton Afanasyev	b8c847ee73	[SLP][Test] Precommit test for D93192 This test shows failure of combined stores chains vectorization	2020-12-14 09:23:47 +03:00
Nikita Popov	22dba707b0	[AC] Handle (X+C1)<C2 assumes (PR48408) InstCombine canonicalizes X>C && X<C' style comparisons into (X+C1)<C2. This type of expression is recognized by some analyses like LVI, but currently not when used inside assumptions, because AssumptionCache does not track affected values for it.	2020-12-13 21:00:32 +01:00
Roman Lebedev	d38205144f	[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450) In particular, if the successor block, which is about to get a new predecessor block, currently only has a single predecessor, then the bonus instructions will be directly used within said successor, which is fine, since the block with bonus instructions dominates that successor. But once there's a new predecessor, the IR is no longer valid, and we don't fix it, because we only update PHI nodes. Which means, the live-out bonus instructions must be exclusively used by the PHI nodes in successor blocks. So we have to form trivial PHI nodes. which will then be successfully updated to recieve cloned bonus instns. This all works fine, except for the fact that we don't have access to the dominator tree, and we don't ignore unreachable code, so we sometimes do end up having to deal with some weird IR. Fixes https://bugs.llvm.org/show_bug.cgi?id=48450	2020-12-13 00:06:57 +03:00
Nikita Popov	afbb6d97b5	[CVP] Simplify and generalize switch handling CVP currently handles switches by checking an equality predicate on all edges from predecessor blocks. Of course, this can only work if the value being switched over is defined in a different block. Replace this implementation with a call to getPredicateAt(), which also does the predecessor edge predicate check (if not defined in the same block), but can also do quite a bit more: It can reason about phi-nodes by checking edge predicates for incoming values, it can reason about assumes, and it can reason about block values. As such, this makes the implementation both simpler and more powerful. The compile-time impact on CTMark is in the noise.	2020-12-12 21:12:27 +01:00
Nikita Popov	ff523aa441	[CVP] Add additional switch tests (NFC) These cover cases handled by getPredicateAt(), but not by the current implementation: * Assumes based on context instruction. * Value from phi node in same block (using per-pred reasoning). * Value from non-phi node in same block (using block-val reasoning).	2020-12-12 20:58:00 +01:00
David Green	ab97c9bdb7	[LV] Fix scalar cost for tail predicated loops When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop. Original patch by Anna Welker Differential Revision: https://reviews.llvm.org/D86452	2020-12-12 14:21:40 +00:00

1 2 3 4 5 ...

17117 Commits