llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	3e5ce49e53	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way. Differential Revision: https://reviews.llvm.org/D94892	2021-02-04 17:28:30 -08:00
Richard Smith	ab243efb26	Don't infer attributes on '::operator new'. These attributes were all incorrect or inappropriate for LLVM to infer: - inaccessiblememonly is generally wrong; user replacement operator new can access memory that's visible to the caller, as can a new_handler function. - willreturn is generally wrong; a custom new_handler is not guaranteed to terminate. - noalias is inappropriate: Clang has a flag to determine whether this attribute should be present and adds it itself when appropriate. - noundef and nonnull on the return value should be specified by the frontend on all 'operator new' functions if we want them, not here. In any case, inferring attributes on functions declared 'nobuiltin' (as these are when Clang emits them) seems questionable.	2021-02-04 13:59:49 -08:00
Richard Smith	1484ad4137	Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete" Several of the new attributes here were incorrect, and even the ones that are generally correct were being added even to nobuiltin calls. This reverts commit `bb3f169b59`.	2021-02-04 13:59:49 -08:00
Sander de Smalen	75b2555d6e	NFC: Migrate LoopUnrollPass to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D95817	2021-02-04 14:05:40 +00:00
Florian Hahn	703f6a6828	[ConstraintElimination] Support conditions from loop preheaders This patch extends the condition collection logic to allow adding conditions from pre-headers to loop headers, by allowing cases where the target block dominates some of its predecessors.	2021-02-04 13:58:32 +00:00
Chuanqi Xu	9511fa2dda	[NFC][Coroutine] Remove redundant comment The functionality in the TODO was added before: https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b	2021-02-04 12:54:30 +08:00
Kazu Hirata	be37475897	[Transforms/IPO] Use range-based for loops (NFC)	2021-02-03 20:41:20 -08:00
Nico Weber	b995314143	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `97ba5cde52`. Still breaks tests: https://reviews.llvm.org/D76802#2540647	2021-02-03 19:14:34 -05:00
Arthur Eubanks	f020544601	[NewPM][HelloWorld] Move HelloWorld to Utils To prevent creating a new component, which creates a new library. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D95907	2021-02-03 12:59:40 -08:00
Rong Xu	b8f13db5b7	[SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker This patch detaches SampleProfileLoader from class SampleCoverageTracker. We plan to move SampleProfileLoader to a template class, while SampleCoverageTracker would remain a regular class. Also make callsiteIsHot() a file-static function. Differential Revision: https://reviews.llvm.org/D95823	2021-02-03 11:38:04 -08:00
Florian Hahn	daaa0e3501	[VPlan] Manage induction value creation using VPValues. This patch updates the induction value creation to use VPValues of recipes to map the created values. This should bring us one step closer to being able to optimize induction recipes directly in VPlan. Currently widenIntOrFpInduction also generates vector values for a cast of the induction, if it exists. Make this explicit by adding the cast instruction to the values defined by the recipe. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92284	2021-02-03 17:45:03 +00:00
David Sherwood	d4626eb0bd	[VPlan][NFC] Introduce constructors for VPIteration This patch adds constructors to VPIteration as a cleaner way of initialising the struct and replaces existing constructions of the form: {Part, Lane} with VPIteration(Part, Lane) I have also added a default constructor, which is used by VPlan.cpp when deciding whether to replicate a block or not. This refactoring will be required in a later patch that adds more members and functions to VPIteration. Differential Revision: https://reviews.llvm.org/D95676	2021-02-03 08:52:27 +00:00
Petr Hosek	97ba5cde52	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-02 23:19:51 -08:00
Kazu Hirata	dc3d5453bc	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-02 22:52:47 -08:00
Florian Hahn	d8e90716df	[ConstraintElimination] Skip pointer casts. We should be able to look through pointer casts that do not impact the value.	2021-02-02 21:25:29 +00:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code properly maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pseudo probe will carry a factor with the value ranging from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callee's probes prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, indirect call promotion results in multiple call sites. The original samples should be distributed across them. This is fixed by adjusting the call sites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Fangrui Song	51da12680f	[ConstraintElimination] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build	2021-02-02 10:23:14 -08:00
Jeroen Dobbelaere	50c523a9d4	[InlineFunction] Only update noalias scopes once for an instruction. Inlining sometimes maps different instructions to be inlined onto the same instruction. We must ensure to only remap the noalias scopes once. Otherwise the scope might disappear (at best). This patch ensures that we only replace scopes for which the mapping is known. This approach is preferred over tracking which instructions we already handled in a SmallPtrSet, as that one will need more memory. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95862	2021-02-02 17:57:10 +01:00
Florian Hahn	3e09bc2500	[ConstraintElimination] Add nicer way to dump constraints (NFC). Use ConstraintSystem::dump(Names) to display the result of decomposing a condition.	2021-02-02 16:36:45 +00:00
Wenlei He	1645f465be	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. This is a resubmit of D95024, with the build break and an overly tight assertion fixed. Test Plan:	2021-02-02 07:55:08 -08:00
Roman Lebedev	485c4b552b	[InstCombine] Hoist inversion out of ashr's value operand (PR48995) This is yet another hint that we will eventually need InstCombineInverter, which would consistently sink inversions, but for that we'll need to consistently hoist inversions where possible, so let's do that here. Example of a proof: https://alive2.llvm.org/ce/z/78SbDq See https://bugs.llvm.org/show_bug.cgi?id=48995	2021-02-02 17:56:43 +03:00
Tom Weaver	4f1320b77d	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `df3e39f60b`. It introduced a failing test, instrprof-gc-sections.c, causing the build bot to fail: http://lab.llvm.org:8011/#/builders/53/builds/1184	2021-02-02 14:19:31 +00:00
Sander de Smalen	3d3ca8f8eb	NFC: Migrate SpeculateAroundPHIs to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: ctetreau Differential Revision: https://reviews.llvm.org/D95353	2021-02-02 13:32:45 +00:00
Sander de Smalen	00da322788	NFC: Migrate SimpleLoopUnswitch to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95352	2021-02-02 13:32:44 +00:00
Adrian Kuegel	48ca6da9d2	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit `9a03058d63`.	2021-02-02 11:51:04 +01:00
Adrian Kuegel	3a65ec4bf9	Revert "Fix build break from D95024" This reverts commit `09cd849fde`.	2021-02-02 11:51:04 +01:00
David Sherwood	d4d4ceeb8f	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
Wenlei He	09cd849fde	Fix build break from D95024	2021-02-02 01:01:12 -08:00
Wenlei He	9a03058d63	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024	2021-02-02 00:34:06 -08:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO. New FDO Inliner Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner shows ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Petr Hosek	df3e39f60b	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Hongtao Yu	224fee8219	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791	2021-02-01 13:56:40 -08:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Florian Hahn	0b28d756af	[ConstraintElimination] Add support for EQ predicates. A == B map to A >= B && A <= B (https://alive2.llvm.org/ce/z/_dwxKn). This extends the constraint construction to return a list of constraints, which can be used to properly de-compose nested AND & OR.	2021-02-01 20:48:31 +00:00
Michael Holman	8bfef78722	[ConstantHoisting] Fix bug where constant materialization could insert into EH pad If the incoming block to a phi node is an EH pad, then we will materialize into an EH pad, which is not supposed to happen. To fix this, I added a check to see if incoming block of a phi node is an EH pad before using it as the insertion point. Differential Revision: https://reviews.llvm.org/D95019	2021-02-01 11:23:56 -08:00
Sanjay Patel	0ce2920f17	[InstCombine] try to narrow min/max intrinsics with constant operand The constant trunc/ext may not be the optimal pre-condition, but I think that handles the common cases. Example of Alive2 proof: https://alive2.llvm.org/ce/z/sREeLC This is another step towards canonicalizing to the intrinsics. Narrowing was identified as source of potential regression for abs(), so we need to handle this for min/max - see: https://llvm.org/PR48816 If this is not enough, we could process intrinsics in the trunc-driven matching in canEvaluateTruncated().	2021-02-01 13:44:13 -05:00
Florian Hahn	ce190e4144	[ConstraintElimination] Negate IR condition directly. Instead of using ConstraintSystem::negate when adding new constraints, flip the condition in IR. The main advantage is that EQ predicates can be represented by 2 constraints, which makes negating based on the constraint tricky. The IR condition can easily be negated.	2021-02-01 17:21:40 +00:00
Sander de Smalen	bf294953e7	NFC: Migrate SimplifyCFG to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D95351	2021-02-01 16:14:05 +00:00
Sander de Smalen	880b64aa22	[SimplifyCFG] NFC: Rename static methods to clang-tidy standards. This patch is a precursor to D95351, which changes the signature of these methods.	2021-02-01 16:14:05 +00:00
Cullen Rhodes	8cda227432	[LV] Fix crash when computing max VF too early D90687 introduced a crash: llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed. when compiling the following C code: typedef struct { char a; } b; b *c; int d, e; int f() { int g = 0; for (; d; d++) { e = 0; for (; e < c[d].a; e++) g++; } return g; } with: clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o - This occurred since prior to D90687 computeFeasibleMaxVF would only be called in computeMaxVF when a scalar epilogue was allowed, but now it's always called. This causes the assert above since computeFeasibleMaxVF collects all viable VFs larger than the default MaxVF, and for each VF calculates the register usage which results in analysis being done the assert above guards against. This can occur in computeFeasibleMaxVF if TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in the hexagon backend to always return true. Reported by @iajbar. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94869	2021-02-01 12:14:59 +00:00
Sander de Smalen	3b8a1d581e	NFC: Migrate SpeculativeExecution to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95356	2021-02-01 12:13:23 +00:00
Florian Hahn	a9583a1923	[LoopUnswitch] Pacify compiler warnings. Attempt to fix some compiler warnings on some bots after `b8c81fa5c7`.	2021-02-01 09:13:06 +00:00
Florian Hahn	b8c81fa5c7	[LoopUnswitch] Add shortcut if unswitched path is a no-op. If we determine that the invariant path through the loop has no effects, we can directly branch to the exit block, instead of unswitching first. Besides avoiding some extra work (unswitching first, then deleting the loop again) this allows us to be more aggressive than regular unswitching with respect to cost-modeling. This approach should always be desirable. This is similar in spirit to D93734, just that it uses the previously added checks for loop-unswitching. I tried to add the required no-op checks from scratch, as we only check a subset of the loop. There is potential to unify the checks with LoopDeletion, at the cost of adding a predicate whether a block should be considered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95468	2021-02-01 09:03:30 +00:00
Jeroen Dobbelaere	80cdd30eb9	[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed. The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95544	2021-02-01 10:01:17 +01:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Florian Hahn	39486753d5	[ConstraintElimination] Verify CS and DFSInStack are in sync.(NFC) After the main loop is done, we should have one constraint per item in DFSInStack. Otherwise we added a constraint without a proper DFSInStack item.	2021-01-30 18:27:04 +00:00
Florian Hahn	10c57268c0	[LoopUnswitch] Properly update MSSA if header has non-clobbering stores. This patch fixes updating MemorySSA if the header contains memory defs that do not clobber a duplicated instruction. We need to find the first defining access outside the loop body and use that as defining access of the duplicated instruction. This fixes a crash caused by `bee486851c`.	2021-01-30 13:51:05 +00:00
Kazu Hirata	8ed1636184	[llvm] Use isa instead of dyn_cast (NFC)	2021-01-29 23:23:37 -08:00
Roman Lebedev	c2534a7097	[ShadowStackGCLowering] Preserve Dominator Tree, if available This doesn't help avoid any Dominator Tree recalculations just yet, there's one more pass to go..	2021-01-30 01:14:51 +03:00
Roman Lebedev	a78d8feb48	[LowerConstantIntrinsics] Preserve Dominator Tree, if available	2021-01-30 01:14:50 +03:00
Sriraman Tallam	9a81a4ef79	Emit metadata when instr. profiles hash mismatch occurs. This patch emits "instr_prof_hash_mismatch" function annotation metadata if there is a hash mismatch while applying instrumented profiles. During the PGO optimized build using instrumented profiles, if the CFG of the function has changed since generating the profile, a hash mismatch is encountered. This patch emits this information as annotation metadata. We plan to use this with Propeller which is done at the machine IR level. Propeller is usually applied on top of PGO and a hash mismatch during PGO could be used to detect source drift. Differential Revision: https://reviews.llvm.org/D95495	2021-01-29 12:56:01 -08:00
Florian Hahn	f3a710cade	[LTO] Update splitCodeGen to take a reference to the module. (NFC) splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is an ~4 year old fixme to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222	2021-01-29 11:53:11 +00:00
Yang Fan	59bd2068e9	[NFC][ScalarizeMaskedMemIntrin] Fix unused variable warning GCC warning: ``` /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedStore(llvm::CallInst, llvm::DomTreeUpdater, bool&)’: /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:295:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable] 295 \| BasicBlock IfBlock = CI->getParent(); \| ^~~~~~~ /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedScatter(llvm::CallInst, llvm::DomTreeUpdater, bool&)’: /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:555:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable] 555 \| BasicBlock IfBlock = CI->getParent(); \| ^~~~~~~ ```	2021-01-29 15:15:58 +08:00
Roman Lebedev	056385921d	[ScalarizeMaskedMemIntrin] Preserve Dominator Tree, if available This de-pessimizes the arguably more usual case of no masked mem intrinsics, and gets rid of one more Dominator Tree recalculation. As per llvm/test/CodeGen/X86/opt-pipeline.ll, there's one more Dominator Tree recalculation left that we could get rid of.	2021-01-29 01:11:36 +03:00
Roman Lebedev	577fdcaa93	[PartiallyInlineLibCalls] Preserve Dominator Tree, if available This doesn't get rid of any Dominator Tree recalculations just yet, there is one more pass to update..	2021-01-29 01:11:36 +03:00
Roman Lebedev	573f74117b	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedCompressStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	2e4bb3f119	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedExpandLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	e8efc03a1e	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedScatter(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	1356399a11	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedGather(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	22b8421156	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	0ea45a412a	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	394685481c	[NFC][PartiallyInlineLibCalls] Port to SplitBlockAndInsertIfThen() This makes follow-up patch for Dominator Tree preservation somewhat more straight-forward.	2021-01-29 01:11:33 +03:00
Roman Lebedev	2de2d84ed0	[NFC][EntryExitInstrumenter] Mark Dominator Tree as preserved in legacy-PM too This is correctly handled in new-PM wrappers, but not in old-PM.	2021-01-29 01:11:33 +03:00
Adrian Prantl	62140d943c	Better document the limitations of coro::salvageDebugInfo() and fix a few edge cases that show up in the Swift compiler but weren't caught by the existing tests. Most notably the old code wasn't salvaging load operations correctly. The patch also gets rid of the LoadFromFramePtr argument and replaces it with a more generalized mechanism.	2021-01-28 09:53:19 -08:00
Roman Lebedev	8cfa963463	[SimplifyCFG] If provided, preserve Dominator Tree SimplifyCFG is a utility pass, and the fact that it does not preserve DomTrees forces its users to somehow work around that, likely by not preserving DomTrees themselves. Indeed, the simplifycfg pass didn't know how to preserve the dominator tree, and it took me just under a month (starting with `e113317958`) to rectify that. Now it fully knows how to; there are likely some problems with that still, but I've dealt with everything I can spot so far. I think we now can flip the switch. Note that this is functionally an NFC change, since this doesn't change the users to pass in the DomTree, that is a separate question. Reviewed By: kuhar, nikic Differential Revision: https://reviews.llvm.org/D94827	2021-01-28 14:11:34 +03:00
Yang Fan	8644eb024b	[NFC][Transforms][Coroutines] Remove unused variable	2021-01-28 16:42:30 +08:00
Kazu Hirata	0da15ea581	[llvm] Use append_range (NFC)	2021-01-27 23:25:41 -08:00
Hongtao Yu	7e99bddfea	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
Teresa Johnson	1487747e99	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Fangrui Song	54fb3ca96e	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variables get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules or (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
Florian Hahn	28410d17f5	[LoopUtils] Pass SCEVExpander instead of SE to addRuntimeChecks. This gives the user control over which expander to use, which in turn allows the user to decide what to do with the expanded instructions. Used in D75980. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94295	2021-01-27 17:36:19 +00:00
Petr Hosek	bb9eb19829	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 17:13:34 -08:00
Adrian Prantl	0554541b44	Salvage debug info for function arguments in coro-split funclets. This patch improves the availability for variables stored in the coroutine frame by emitting an alloca to hold the pointer to the frame object and rewriting dbg.declare intrinsics to point inside the frame object using salvaged DIExpressions. Finally, a new alloca is created in the funclet to hold the FramePtr pointer to ensure that it is available throughout the entire function at -O0. This patch also effectively reverts D90772. The testcase updates highlight nicely how every removed CHECK for a dbg.value is preceded by a new CHECK for a dbg.declare. Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews! Differential Revision: https://reviews.llvm.org/D93497 rdar://71866936	2021-01-26 15:01:26 -08:00
Bjorn Pettersson	a9bd3d37bd	[NewPM] Add ExtraVectorizerPasses support As it looks like NewPM generally is using SimpleLoopUnswitch instead of LoopUnswitch, this patch also uses SimpleLoopUnswitch in the ExtraVectorizerPasses sequence (compared with LegacyPM, which uses the LoopUnswitch pass). Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D95457	2021-01-26 22:59:10 +01:00
Valery N Dmitriev	716b9dd0d8	[InstCombine] Preserve FMF for powi simplifications. Differential Revision: https://reviews.llvm.org/D95455	2021-01-26 13:26:06 -08:00
Petr Hosek	1e634f3952	Revert "Support for instrumenting only selected files or functions" This reverts commit `4edf35f11a` because the test fails on Windows bots.	2021-01-26 12:25:28 -08:00
Petr Hosek	4edf35f11a	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 11:11:39 -08:00
Sanjay Patel	09b1c56366	[LoopUtils] do not initialize Cmp predicate unnecessarily; NFC The switch must set the predicate correctly; anything else should lead to unreachable/assert. I'm trying to fix FMF propagation here and the callers, so this is a preliminary cleanup.	2021-01-26 11:22:51 -05:00
Florian Hahn	1272f16d14	[LoopUnswitch] Avoid partially unswitching too aggressively. This patch adds additional checks to avoid partial unswitching in cases where it won't be profitable, e.g. because the path directly exits the loop anyways.	2021-01-26 15:18:41 +00:00
Florian Hahn	35b3989a30	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsVectorized Program base patch diff test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5% test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4% test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9% test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0% test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1% test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8% test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5% test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1% test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0% Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsAnalyzed Program base patch diff test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4% test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3% test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2% test-suite...lications/minisat/minisat.test 134.00 135.00 0.7% test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7% test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5% 
test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3% test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3% test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1% test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1% test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0% test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0% test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0% test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0% Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
Sergey Dmitriev	13cedcaf45	[llvm-link] Fix crash when materializing appending global This patch fixes llvm-link crash when materializing global variable with appending linkage and initializer that depends on another global with appending linkage. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D95329	2021-01-25 18:08:07 -08:00
modimo	ce7f9cdb50	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm newly added test correctly replays CGSCC inlining decisions Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Nikita Popov	835104a114	[LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943) When LSR converts a branch on the pre-inc IV into a branch on the post-inc IV, the nowrap flags on the addition may no longer be valid. Previously, a poison result of the addition might have been ignored, in which case the program was well defined. After branching on the post-inc IV, we might be branching on poison, which is undefined behavior. Fix this by discarding nowrap flags which are not present on the SCEV expression. Nowrap flags on the SCEV expression are proven by SCEV to always hold, independently of how the expression will be used. This is essentially the same fix we applied to IndVars LFTR, which also performs this kind of pre-inc to post-inc conversion. I believe a similar problem can also exist for getelementptr inbounds, but I was not able to come up with a problematic test case. The inbounds case would have to be addressed differently anyway (as SCEV does not track this property). Fixes https://bugs.llvm.org/show_bug.cgi?id=46943. Differential Revision: https://reviews.llvm.org/D95286	2021-01-25 23:13:48 +01:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduced a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Florian Hahn	76afbf60ed	[VPlan] Replace uses with new value in VPInstructionsToVPRecipe (NFC). Now that VPRecipeBase inherits from VPDef, we can always use the new VPValue for replacement, if the recipe defines one. Given the recipes that are supported at the moment, all new recipes must have either 0 or 1 defined values.	2021-01-25 19:38:08 +00:00
Nick Desaulniers	d36812892c	[GVN] do not repeat PRE on failure to split critical edge Fixes an infinite loop encountered in GVN. GVN will delay PRE if it encounters critical edges, attempt to split them later via calls to SplitCriticalEdge(), then restart. The caller of GVN::splitCriticalEdges() assumed a return value of true meant that critical edges were split, that the IR had changed, and that PRE should be re-attempted, upon which we loop infinitely. This was exposed after D88438, by compiling the Linux kernel for s390, but the test case is reproducible on x86. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1261 Reviewed By: void Differential Revision: https://reviews.llvm.org/D94996	2021-01-25 11:23:44 -08:00
Wei Mi	c9cd9a0066	[SampleFDO] Report error when reading a bad/incompatible profile instead of turning off SampleFDO silently. Currently the sample loader pass turns off SampleFDO optimization silently when it sees an error in reading the profile. This behavior will defeat the tests which could have caught those bad/incompatible profile problems. This patch changes the behavior to report an error. Differential Revision: https://reviews.llvm.org/D95269	2021-01-25 10:28:23 -08:00
Xun Li	17c3538aef	Revert "Fix unused variable in CoroFrame.cpp when building Release with GCC 10" This reverts commit `ff5e896425`.	2021-01-25 08:37:45 -08:00
Florian Hahn	3201274dea	[VPlan] Handle scalarized values in VPTransformState. This patch adds plumbing to handle scalarized values directly in VPTransformState. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92282	2021-01-25 14:21:56 +00:00
Sanjay Patel	09a136bcc6	[InstCombine] narrow min/max intrinsics with extended inputs We can sink extends after min/max if they match and would not change the sign-interpreted compare. The only combo that doesn't work is zext+smin/smax because the zexts could change a negative number into positive: https://alive2.llvm.org/ce/z/D6sz6J Sext+umax/umin works: define i32 @src(i8 %x, i8 %y) { %0: %sx = sext i8 %x to i32 %sy = sext i8 %y to i32 %m = umax i32 %sx, %sy ret i32 %m } => define i32 @tgt(i8 %x, i8 %y) { %0: %m = umax i8 %x, %y %r = sext i8 %m to i32 ret i32 %r } Transformation seems to be correct!	2021-01-25 07:52:50 -05:00
Sander de Smalen	171d12489f	[SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost. This change also changes getReductionCost to return InstructionCost, and it simplifies two expressions by removing a redundant 'isValid' check.	2021-01-25 12:27:01 +00:00
Nikita Popov	8b9df70bf7	[Utils] Use NoAliasScopeDeclInst in a few more places (NFC) In the cloning infrastructure, only track an MDNode mapping, without explicitly storing the Metadata mapping, same as is done during inlining. This makes things slightly simpler.	2021-01-24 16:24:11 +01:00
Sanjay Patel	77adbe6a8c	[SLP] fix fast-math requirements for fmin/fmax reductions `a6f0221276` enabled intersection of FMF on reduction instructions, so it is safe to ease the check here. There is still some room to improve here - it looks like we have nearly duplicate flags propagation logic inside of the LoopUtils helper, but it is limited to targets that do not form reduction intrinsics (they form the shuffle expansion).	2021-01-24 08:55:56 -05:00
Jeroen Dobbelaere	dcc7706fcf	[InstCombine] Remove unused llvm.experimental.noalias.scope.decl A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope. When that is not the case for at least one of the two, the intrinsic call can as well be removed. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95141	2021-01-24 13:55:50 +01:00
Jeroen Dobbelaere	659c7bcde6	[LoopRotate] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed Similar to D92887, LoopRotation also needs to duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary. This is based on the version from the Full Restrict patches (D68511). The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94306	2021-01-24 13:53:13 +01:00
Jeroen Dobbelaere	774629641b	[LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patches (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed. Notes: - it also includes changes and tests from D90104 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92887	2021-01-24 13:48:20 +01:00
Roman Lebedev	6f2753273e	[NFC][SimplifyCFG] Extract CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses() out of PerformBranchToCommonDestFolding() To be used in PerformValueComparisonIntoPredecessorFolding()	2021-01-24 00:54:55 +03:00
Roman Lebedev	67f9c87a65	[NFC][SimplifyCFG] Perform early-continue in FoldValueComparisonIntoPredecessors() per-pred loop	2021-01-24 00:54:54 +03:00
Roman Lebedev	a4e6c2e647	[NFC][SimplifyCFG] Extract PerformValueComparisonIntoPredecessorFolding() out of FoldValueComparisonIntoPredecessors() Less nested code is much easier to follow and modify.	2021-01-24 00:54:54 +03:00
Nikita Popov	c83cff45c7	[IR] Add NoAliasScopeDeclInst (NFC) Add an intrinsic type class to represent the llvm.experimental.noalias.scope.decl intrinsic, to make code working with it a bit nicer by hiding the metadata extraction from view.	2021-01-23 22:40:32 +01:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Florian Hahn	d60b74c28a	[InstCombine] Set MadeIRChange in replaceInstUsesWith. Some utilities used by InstCombine, like SimplifyLibCalls, may add new instructions and replace the uses of a call, but return nullptr because the inserted call produces multiple results. Previously, the replaced library calls would get removed by InstCombine's deleter, but after `292077072e` this may not happen, if the willreturn attribute is missing. As a work-around, update replaceInstUsesWith to set MadeIRChange, if it replaces any uses. This catches the cases where it is used as replacer by utilities used by InstCombine and seems useful in general; updating uses will modify the IR. This fixes an expensive-check failure when replacing @__sinpif/@__cospifi with @__sincospif_sret.	2021-01-23 17:52:59 +00:00
Sanjay Patel	a6f0221276	[SLP] fix fast-math-flag propagation on FP reductions As shown in the test diffs, we could miscompile by propagating flags that did not exist in the original code. The flags required for fmin/fmax reductions will be fixed in a follow-up patch.	2021-01-23 11:17:20 -05:00
Florian Hahn	292077072e	[Local] Treat calls that may not return as being alive. With the addition of the `willreturn` attribute, functions that may not return (e.g. due to an infinite loop) are well defined, if they are not marked as `willreturn`. This patch updates `wouldInstructionBeTriviallyDead` to not consider calls that may not return as dead. This patch still provides an escape hatch for intrinsics, which are still assumed as willreturn unconditionally. It will be removed once all intrinsics definitions have been reviewed and updated. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94106	2021-01-23 16:05:14 +00:00
Roman Lebedev	022da61f6b	[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers If i change it to AssertingVH instead, a number of existing tests fail, which means we don't consistently remove from the set when deleting blocks, which means newly-created blocks may happen to appear in that set if they happen to occupy the same memory chunk as did some block that was in the set originally. There are many places where we delete blocks, and while we could probably consistently delete from LoopHeaders when deleting a block in transforms located in SimplifyCFG.cpp itself, transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may delete blocks, and it doesn't seem good to teach them to deal with it. Since we at most only ever delete from LoopHeaders, let's just delegate to WeakVH to do that automatically. But to be honest, personally, i'm not sure that the idea behind LoopHeaders is sound.	2021-01-23 16:48:35 +03:00
Jeroen Dobbelaere	2b9a834c43	[InlineFunction] Use llvm.experimental.noalias.scope.decl for noalias arguments. Insert a llvm.experimental.noalias.scope.decl intrinsic that identifies where a noalias argument was inlined. This patch includes some refactorings from D90104. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93040	2021-01-23 12:10:57 +01:00
Zequan Wu	867bdfeff1	[InstCombine] remove incompatible attribute when simplifying some lib calls Like D95088, remove incompatible attribute in more lib calls. Differential Revision: https://reviews.llvm.org/D95278	2021-01-22 17:27:36 -08:00
Philip Reames	ef51eed37b	[LoopDeletion] Handle inner loops w/untaken backedges This builds on the restricted form of D93906 that landed after the initial revert, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA. When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes. This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop. (I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.) The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue. Differential Revision: https://reviews.llvm.org/D94378	2021-01-22 16:31:29 -08:00
Francis Visoiu Mistrih	0cc38acfc4	[Matrix] Propagate shape information through fneg Similar to binary operators like fadd/fmul/fsub, propagate shape info through unary operators (fneg is the only one?). Differential Revision: https://reviews.llvm.org/D95252	2021-01-22 14:34:28 -08:00
Roman Lebedev	1742203844	[SimplifyCFG] FoldBranchToCommonDest(): re-lift restrictions on liveout uses of bonus instructions I have previously tried doing that in `b33fbbaa34` / `d38205144f`, but eventually it was pointed out that the approach taken there was just broken wrt how the uses of bonus instructions are updated to account for the fact that they should now use either bonus instruction or the cloned bonus instruction. In particular, all that manual handling of PHI nodes in successors was just wrong. But, the fix is actually much much simpler than my initial approach: just tell SSAUpdate about both instances of bonus instruction, and let it deal with all the PHI handling. Alive2 confirms that the reproducers from the original bugs (@pr48450*) are now handled correctly. This effectively reverts commit `59560e8589`, effectively relanding `b33fbbaa34`.	2021-01-23 01:29:05 +03:00
Roman Lebedev	eae1cc0de5	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): move instruction cloning to after CFG update This simplifies follow-up patch, and is NFC otherwise.	2021-01-23 01:29:04 +03:00
Roman Lebedev	9bd8bcf993	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): fix instruction name preservation NewBonusInst just took name from BonusInst, so BonusInst has no name, so BonusInst.getName() makes no sense. So we need to ask NewBonusInst for the name.	2021-01-23 01:29:03 +03:00
Shimin Cui	99a0aa07e9	[Analysis] Support AIX vec_malloc routines This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX. Differential Revision: https://reviews.llvm.org/D94710	2021-01-22 16:03:01 -05:00
Nikita Popov	45b259f995	[SimplifyLibCalls] Skip unused calls in sincos transform If the call result is unused, we should let it get DCEd rather than replacing it. Also, don't try to replace an existing sincos with another one (unless it's as part of combining sin and cos). This avoids an infinite combine loop if the calls are not DCEd as expected, which can happen with D94106 and lack of willreturn annotation in hand-crafted IR.	2021-01-22 20:57:13 +01:00
Sanjay Patel	411c144e4c	[InstCombine] narrow abs with sign-extended input In the motivating cases from https://llvm.org/PR48816 , we have a trailing trunc. But that is not required to reduce the abs width: https://alive2.llvm.org/ce/z/ECaz-p ...as long as we clear the int-min-is-poison bit (nsw). We have some existing tests that are affected, and I'm not sure what the overall implications are, but in general we favor narrowing operations over preserving nsw/nuw. If that causes problems, we could restrict this transform based on type (shouldChangeType() and/or vector vs. scalar). Differential Revision: https://reviews.llvm.org/D95235	2021-01-22 13:36:04 -05:00
Florian Hahn	86991d3231	[LoopUnswitch] Fix logic to avoid unswitching with atomic loads. The existing code did not deal with atomic loads correctly. Such loads are represented as MemoryDefs. Bail out on any MemoryAccess that is not a MemoryUse.	2021-01-22 15:10:12 +00:00
Arnold Schwaighofer	87b628dadd	[coro.async] Make sure we process async coroutines Because we were not looking for the llvm.coro.id.async intrinsic in the early coro pass which triggers follow-up passes we relied on the llvm.coro.end intrinsic being present. This might not be the case in functions that end in unreachable code. Differential Revision: https://reviews.llvm.org/D95144	2021-01-22 07:04:01 -08:00
Roman Lebedev	85e7578c6d	Revert "[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches" Does not build in XCode: http://green.lab.llvm.org/green/job/clang-stage1-RA/17963/consoleFull#-1704658317a1ca8a51-895e-46c6-af87-ce24fa4cd561 This reverts commit `aabed3718a`.	2021-01-22 17:37:11 +03:00
Roman Lebedev	d1a6f92fd5	[InstCombine] Fold `(~x) \| y` --> `~(x & (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thus decreasing instruction count. Note that we could position this transformation as just hoisting of the `not` (still, iff y is freely negatible), but the test changes show a number of regressions, so let's not do that.	2021-01-22 17:23:54 +03:00
Roman Lebedev	79b0d21ce9	[InstCombine] Fold `(~x) & y` --> `~(x \| (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thus decreasing instruction count.	2021-01-22 17:23:54 +03:00
Roman Lebedev	4ed0d8f2f0	[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate() I'd like to use it in an upcoming fold.	2021-01-22 17:23:53 +03:00
Roman Lebedev	efeb8caf8b	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract the actual transform into helper function I'm intentionally structuring it this way, so that the actual fold only does the fold, and no legality/correctness checks, all of which must be done by the caller. This allows for the fold code to be more compact and more easily grokable.	2021-01-22 17:23:53 +03:00
Roman Lebedev	b482560a59	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract check for destination sharing into a helper function As a follow-up, i'll extract the actual transform into a function, and this helper will be called from both places, so this avoids code duplication.	2021-01-22 17:23:53 +03:00
Roman Lebedev	7b89efb55e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): somewhat better structure weight updating code Hoist the successor updating out of the code that deals with branch weight updating, and hoist the 'has weights' check from the latter, making code more consistent and easier to follow.	2021-01-22 17:23:41 +03:00
Roman Lebedev	256a035752	[NFC][SimplifyCFG] FoldBranchToCommonDest(): unclutter Cond/CondInPred handling We don't need those variables, we can just get the final value directly.	2021-01-22 17:23:11 +03:00
Roman Lebedev	aabed3718a	[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches While we already ignore uncond branches, we could still potentially end up with a conditional branch with identical destinations due to the visitation order, or because we were called as a utility. But if we have such a disguised uncond branch, we still probably shouldn't deal with it here.	2021-01-22 17:23:10 +03:00
Roman Lebedev	0895b836d7	[SimplifyCFG] FoldBranchToCommonDest(): don't deal with unconditional branches The case where BB ends with an unconditional branch, and has a single predecessor w/ conditional branch to BB and a single successor of BB is exactly the pattern SpeculativelyExecuteBB() transform deals with. (and in this case they both allow speculating only a single instruction) Well, or FoldTwoEntryPHINode(), if the final block has only those two predecessors. Here, in FoldBranchToCommonDest(), only a weird subset of that transform is supported, and it's glued on the side in a weird way. In particular, it took me a bit to understand that the Cond isn't actually a branch condition in that case, but just the value we allow to speculate (otherwise it reads as a miscompile to me). Additionally, this only supports for the speculated instruction to be an ICmp. So let's just unclutter FoldBranchToCommonDest(), and leave this transform up to SpeculativelyExecuteBB(). As far as i can tell, this shouldn't really impact optimization potential, but if it does, improving SpeculativelyExecuteBB() will be more beneficial anyways. Notably, this only affects a single test, but EarlyCSE should have run beforehand in the pipeline, and then FoldTwoEntryPHINode() would have caught it. This reverts commit rL158392 / commit `d33f4efbfd`.	2021-01-22 17:22:49 +03:00
Anton Rapetov	a4914dc1f2	[SLP] do not traverse constant uses Walking the use list of a Constant (particularly, ConstantData) is not scalable, since a given constant may be used by many instructions in many functions in many modules. Differential Revision: https://reviews.llvm.org/D94713	2021-01-22 08:14:09 -05:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Xun Li	bd3ca6666d	[Inlining] Delete redundant optnone/alwaysinline check The same check is done in InlineCost: `8b0bd54d0e/llvm/lib/Analysis/InlineCost.cpp (L2537-L2552)` Also, doing a check on the callee here is confusing, because anything that deals with callee should be done in the inner loop where we process all calls from the same caller. Differential Revision: https://reviews.llvm.org/D95186	2021-01-21 18:38:10 -08:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to work OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempts to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. 
This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
Sanjay Patel	2f03528f5e	[SLP] rename reduction variable to avoid shadowing; NFC The code structure can likely be improved now that 'OperationData' is gone.	2021-01-21 16:02:38 -05:00
Anton Rapetov	bfec9148a0	Scalar: Don't visit constants in findInnerReductionPhi in LoopInterchange In LoopInterchange, `findInnerReductionPhi()` looks for reduction variables, which cannot be constants. Update it to return early in that case. This also addresses a blocker for removing use-lists from ConstantData, whose users could be spread across arbitrary modules in the same LLVMContext. Differential Revision: https://reviews.llvm.org/D94712	2021-01-21 12:33:06 -08:00
Sanjay Patel	d77753381f	[SLP] simplify reduction matching This is NFC-intended and removes the "OperationData" class which had become nothing more than a recurrence (reduction) type. I adjusted the matching logic to distinguish instructions from non-instructions - that's all that the "IsLeafValue" member was keeping track of.	2021-01-21 14:58:57 -05:00
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Sanjay Patel	070af1b788	[InstCombine] avoid crashing on attribute propagation In https://llvm.org/PR48810 , we are crashing while trying to propagate attributes from mempcpy (returns void*) to memcpy (returns nothing - void). We can avoid the crash by removing known incompatible attributes for the void return type. I'm not sure if this goes far enough (should we just drop all attributes since this isn't the same function?). We also need to audit other transforms in LibCallSimplifier to make sure there are no other cases that have the same problem. Differential Revision: https://reviews.llvm.org/D95088	2021-01-21 08:13:26 -05:00
Florian Hahn	bee486851c	[LoopUnswitch] Implement first version of partial unswitching. This patch applies the idea from D93734 to LoopUnswitch. It adds support for unswitching on conditions that are only invariant along certain paths through a loop. In particular, it targets conditions in the loop header that depend on values loaded from memory. If either path from the true or false successor through the loop does not modify memory, perform partial loop unswitching. That is, duplicate the instructions feeding the condition in the pre-header. Then unswitch on the duplicated condition. The condition is now known in the unswitched version for the 'invariant' path through the original loop. One caveat of this approach is that one of the loops created can be partially unswitched again. To avoid this behavior, `llvm.loop.unswitch.partial.disable` metadata is added to the unswitched loops, to avoid subsequent partial unswitching. If that's the approach to go, I can move the code handling the metadata kind into separate functions. This increases the cases we unswitch quite a bit in SPEC2006/SPEC2000 & MultiSource. 
It also allows us to eliminate a dead loop in SPEC2017's omnetpp ``` Tests: 236 Same hash: 170 (filtered out) Remaining: 66 Metric: loop-unswitch.NumBranches Program base patch diff test-suite...000/255.vortex/255.vortex.test 2.00 23.00 1050.0% test-suite...T2006/401.bzip2/401.bzip2.test 7.00 55.00 685.7% test-suite :: External/Nurbs/nurbs.test 5.00 26.00 420.0% test-suite...s-C/unix-smail/unix-smail.test 1.00 3.00 200.0% test-suite.../Prolangs-C++/ocean/ocean.test 1.00 3.00 200.0% test-suite...tions/lambda-0.1.3/lambda.test 1.00 3.00 200.0% test-suite...yApps-C++/PENNANT/PENNANT.test 2.00 5.00 150.0% test-suite...marks/Ptrdist/yacr2/yacr2.test 1.00 2.00 100.0% test-suite...lications/viterbi/viterbi.test 1.00 2.00 100.0% test-suite...plications/d/make_dparser.test 12.00 24.00 100.0% test-suite...CFP2006/433.milc/433.milc.test 14.00 27.00 92.9% test-suite.../Applications/lemon/lemon.test 7.00 12.00 71.4% test-suite...ce/Applications/Burg/burg.test 6.00 10.00 66.7% test-suite...T2006/473.astar/473.astar.test 16.00 26.00 62.5% test-suite...marks/7zip/7zip-benchmark.test 78.00 121.00 55.1% ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93764	2021-01-21 09:46:41 +00:00
Kazu Hirata	e53472de68	[Transforms] Use llvm::append_range (NFC)	2021-01-20 21:35:54 -08:00
Kazu Hirata	8f5da41c4d	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-20 21:35:52 -08:00
Dávid Bolvanský	bb3f169b59	[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95095	2021-01-21 00:12:37 +01:00
Mircea Trofin	ccec2cf1d9	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `d97f776be5`. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	95ce32c787	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Nikita Popov	1c6d1e57c1	[PredicateInfo] Handle logical and/or Teach PredicateInfo to handle logical and/or the same way as bitwise and/or. This allows handling logical and/or inside IPSCCP and NewGVN.	2021-01-20 21:03:07 +01:00
Nikita Popov	ca4ed1e7ae	[PredicateInfo] Generalize processing of conditions Branch/assume conditions in PredicateInfo are currently handled in a rather ad-hoc manner, with some arbitrary limitations. For example, an `and` of two `icmp`s will be handled, but an `and` of an `icmp` and some other condition will not. That also includes the case where more than two conditions are and'ed together. This patch makes the handling more general by looking through and/ors up to a limit and considering all kinds of conditions (though operands will only be taken for cmps of course). Differential Revision: https://reviews.llvm.org/D94447	2021-01-20 20:40:41 +01:00
Mircea Trofin	d97f776be5	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `e8aec763a5`.	2021-01-20 11:19:34 -08:00
Mircea Trofin	e8aec763a5	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats would be output-ed would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
Dávid Bolvanský	16d6e85271	[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94850	2021-01-20 19:45:23 +01:00
Sanjay Patel	c09be0d2a0	[SLP] reduce reduction code for checking vectorizable ops; NFC This is another step towards removing `OperationData` and fixing FMF matching/propagation bugs when forming reductions.	2021-01-20 11:14:48 -05:00
Sanjay Patel	1c54112a57	[SLP] refactor more reduction functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param. More streamlining is possible, but I'm trying to avoid logic/typo bugs while fixing this. Eventually, we should not need the `OperationData` class.	2021-01-20 11:14:48 -05:00
Sanjay Patel	8590d24543	[SLP] move reduction createOp functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param.	2021-01-20 11:14:48 -05:00
Joseph Tremoulet	40cd262c43	Loop peeling: check that latch is conditional branch Loop peeling assumes that the loop's latch is a conditional branch. Add a check to canPeel that explicitly checks for this, and testcases that otherwise fail an assertion when trying to peel a loop whose back-edge is a switch case or the non-unwind edge of an invoke. Reviewed By: skatkov, fhahn Differential Revision: https://reviews.llvm.org/D94995	2021-01-20 11:01:16 -05:00
Chuanqi Xu	c1bc7981ba	[Coroutine] Remain alignment information when merging frame variables Summary: This is to address bug48712. The solution in this patch is to merge a variable a into the storage frame of variable b only if the alignment of a is a multiple of that of b. There may be other strategies, but for now I think they are hard to handle and offer little benefit; we can implement them in the future. Test-plan: check-llvm Reviewers: jmorse, lxfind, junparser Differential Revision: https://reviews.llvm.org/D94891	2021-01-20 18:59:00 +08:00
David Sherwood	255a507716	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes `nonnull` accept poison, and return poison if the input pointer is null. This makes many transformations like the one below legal: ``` %p = gep inbounds %x, 1 ; %p is a non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` Instead, these semantics make propagation of `nonnull` to the caller illegal. The reason is that passing poison to `nonnull` no longer immediately raises UB, so such a program is still well defined if the callee does not use the argument. Having the `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	21b1ad0340	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context have been annotated to outline functions if possible in the prelink phase. In the postlink phase, profile annotation is only meaningful for function profiles with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in the postlink phase only has to read the part with context. To support the profile splitting, we extend the ExtBinary format to support different section arrangements. It will be flexible to add other section layouts in the future without the need to create a new class inheriting from the ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Alexey Bataev	e463bd53c0	Revert "[SLP]Merge reorder and reuse shuffles." This reverts commit `438682de6a` to fix the bug with the reducing size of the resulting vector for the entry node with multiple users.	2021-01-19 11:48:04 -08:00
Mariya Podchishchaeva	7113de301a	[ScalarizeMaskedMemIntrin] Add missing dependency The pass has dependency on 'TargetTransformInfoWrapperPass', but the corresponding call to INITIALIZE_PASS_DEPENDENCY was missing. Differential Revision: https://reviews.llvm.org/D94916	2021-01-19 22:33:47 +03:00
Nikita Popov	21443381c0	Reapply [InstCombine] Replace one-use select operand based on condition Relative to the original change, this adds a check that the instruction on which we're replacing operands is safe to speculatively execute, because that's what we're effectively doing. We're executing the instruction with the replaced operand, which is fine if it's pure, but not fine if it can cause side-effects or UB (aka is not speculatable). Additionally, we cannot (generally) replace operands in phi nodes, as these may refer to a different loop iteration. This is also covered by the speculation check. ----- InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-19 20:26:38 +01:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Hans Wennborg	58bdfcfac0	Revert `5238e7b302` "[InstCombine] Replace one-use select operand based on condition" This caused a miscompile in Chromium, see comments on the codereview for discussion and pointer to a reproducer. > InstCombine already performs a fold where X == Y ? f(X) : Z is > transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, > if f(X) only has one use, then we can always directly replace the > use inside the instruction. To actually be profitable, limit it to > the case where Y is a non-expr constant. > > This could be further extended to replace uses further up a one-use > instruction chain, but for now this only looks one level up. > > Among other things, this also subsumes D94860. > > Differential Revision: https://reviews.llvm.org/D94862 This also reverts the follow-up a003f26539cf4db744655e76c41f4c4a8913f116: > [llvm] Prevent infinite loop in InstCombine of select statements > > This fixes an issue where the RHS and LHS the comparison operation > creating the predicate were swapped back and forth forever. > > Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 11:50:56 +01:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have been vectorized earlier (by scalarizing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. 
And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Tres Popp	a003f26539	[llvm] Prevent infinite loop in InstCombine of select statements This fixes an issue where the RHS and LHS of the comparison operation creating the predicate were swapped back and forth forever. Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 10:31:48 +01:00
David Sherwood	c3ce262794	[NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost A previous patch has already changed getInstructionCost to return an InstructionCost type. This patch changes the other various getXXXCost functions to return an InstructionCost too. This is a non-functional change - I've added a few asserts that the costs are valid in places where we're selecting between vector call and intrinsic costs. However, since we don't yet return invalid costs from any of the TTI implementations these asserts should not fire. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94065	2021-01-19 09:08:40 +00:00
Juneyoung Lee	2d89ebd5d1	Address unused variable warning	2021-01-19 09:30:16 +09:00
Juneyoung Lee	0441df94ad	[InstCombine,InstSimplify] Optimize select followed by and/or/xor This patch adds `A & (A && B)` -> `A && B` (similarly for or + logical or) Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`. Alive2 proof: merge_and: https://alive2.llvm.org/ce/z/teMR97 merge_or: https://alive2.llvm.org/ce/z/b4yZUp xor_and: https://alive2.llvm.org/ce/z/_-TXHi xor_or: https://alive2.llvm.org/ce/z/2uYx_a Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94861	2021-01-19 09:14:17 +09:00
Juneyoung Lee	395c737d9f	[SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of (x == C1 \|\| x == C2 \|\| ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible. D93065 has more context about the transition, including links to the list of optimizations being updated. Differential Revision: https://reviews.llvm.org/D93943	2021-01-19 08:53:40 +09:00
Sanjay Patel	5b77ac32b1	[SLP] match maxnum/minnum intrinsics as FP reduction ops After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready. We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this should restore that functionality for SLP. There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). Fixing FMF propagation is a planned follow-up. Differential Revision: https://reviews.llvm.org/D94913	2021-01-18 17:37:16 -05:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	dc300beba7	[STLExtras] Add a default value to drop_begin This patch adds the default value of 1 to drop_begin. In the llvm codebase, 70% of calls to drop_begin have 1 as the second argument. The interface, similar to that of std::next, should improve readability. This patch converts a couple of calls to drop_begin as examples. Differential Revision: https://reviews.llvm.org/D94858	2021-01-18 10:16:34 -08:00
Xun Li	1d04dc52dd	[Coroutine] Do not CoroElide if there are musttail calls This is to address https://bugs.llvm.org/show_bug.cgi?id=48626. When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation will hit a fatal error. We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call actually alias the frame, but it's very hard to do it before the transformation happens. Also, in most cases the musttail call is generated due to symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway. Differential Revision: https://reviews.llvm.org/D94834	2021-01-18 09:06:21 -08:00
Sanjay Patel	3dbbadb8ef	[SLP] rename reduction query for min/max ops; NFC This will avoid confusion once we start matching min/max intrinsics. All of these hacks to accommodate cmp+sel idioms should disappear once we canonicalize to min/max intrinsics.	2021-01-18 09:32:57 -05:00
Sanjay Patel	d1c4e859ce	[SLP] reduce opcode API dependency in reduction cost calc; NFC The icmp opcode is now hard-coded in the cost model call. This will make it easier to eventually remove all opcode queries for min/max patterns as we transition to intrinsics.	2021-01-18 09:32:57 -05:00
Florian Hahn	e6d758de82	[InferAttrs] Mark some library functions as willreturn. This patch marks some library functions as willreturn. On the first pass, I excluded most functions that interact with streams/the filesystem. Along with willreturn, it also adds nounwind to a set of math functions. There probably are a few additional attributes we can add for those, but that should be done separately. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94684	2021-01-18 13:40:21 +00:00
Caroline Concatto	36710c38c1	[NFC]Migrate VectorCombine.cpp to use InstructionCost This patch changes these functions: vectorizeLoadInsert isExtractExtractCheap foldExtractedCmps scalarizeBinopOrCmp getShuffleExtract foldBitcastShuf to use the class InstructionCost when calling TTI.get<something>Cost(). This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 ps.:This patch adds the test \|\| !NewCost.isValid(), because we want to return false when: !NewCost.isValid() && !OldCost.isValid()->the cost to transform it is expensive and !NewCost.isValid() && OldCost.isValid() Therefore, for simplification, we only add a test for !NewCost.isValid() Differential Revision: https://reviews.llvm.org/D94069	2021-01-18 13:37:21 +00:00
Dávid Bolvanský	ed396212da	[InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691) ``` unsigned r(int v) { return (1 \| -(v < 0)) * v; } `r` is equivalent to `abs(v)`. ``` ``` define <4 x i8> @src(<4 x i8> %0) { %1: %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 } %3 = or <4 x i8> %2, { 1, 1, 1, undef } %4 = mul nsw <4 x i8> %3, %0 ret <4 x i8> %4 } => define <4 x i8> @tgt(<4 x i8> %0) { %1: %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 } %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0 %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0 ret <4 x i8> %4 } Transformation seems to be correct! ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94874	2021-01-17 17:06:14 +01:00
Nikita Popov	5238e7b302	[InstCombine] Replace one-use select operand based on condition InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-16 23:25:02 +01:00
Roman Lebedev	32fc32317a	[SimplifyCFG] markAliveBlocks(): catchswitch: preserve PostDomTree When removing catchpads from a catchswitch, if that removes a successor, we need to record that in DomTreeUpdater. This fixes a PostDomTree preservation failure in an existing test. This appears to be the single issue that I see in my current test coverage.	2021-01-17 01:21:05 +03:00
Sanjay Patel	49b96cd9ef	[SLP] remove opcode field from reduction data class This is NFC-intended and another step towards supporting intrinsics as reduction candidates. The remaining bits of the OperationData class do not make much sense as-is, so I will try to improve that, but I'm trying to take minimal steps because it's still not clear how this was intended to work.	2021-01-16 13:55:52 -05:00
Sanjay Patel	fcfcc3cc6b	[SLP] fix typos; NFC	2021-01-16 13:55:52 -05:00
Sanjay Patel	48dbac5b6b	[SLP] remove unnecessary use of 'OperationData' This is another NFC-intended patch to allow matching intrinsics (example: maxnum) as candidates for reductions. It's possible that the loop/if logic can be reduced now, but it's still difficult to understand how this all works.	2021-01-16 13:55:52 -05:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Dávid Bolvanský	a1500105ee	[SimplifyCFG] Optimize CFG when null is passed to a function with nonnull argument Example: ``` __attribute__((nonnull,noinline)) char *pinc(char *p) { return ++p; } char *foo(bool b, char *a) { return pinc(b ? 0 : a); } ``` optimize to ``` char *foo(bool b, char *a) { return pinc(a); } ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94180	2021-01-15 23:53:43 +01:00
Sanjay Patel	ceb3cdccd0	[SLP] remove dead code in reduction matching; NFC To get into this block we had: !A || B || C and we checked C in the first 'if' clause leaving !A || B. But the 2nd 'if' is checking: A && !B --> !(!A || B)	2021-01-15 17:03:26 -05:00
Nick Desaulniers	ed0fd567eb	BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor Otherwise we'll fail the assertion in SplitBlockPredecessors() related to splitting the edges from CallBr's. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1161 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1252 Reviewed By: void, MaskRay, jyknight Differential Revision: https://reviews.llvm.org/D88438	2021-01-15 13:51:47 -08:00
Roman Lebedev	a14c36fe27	[SimplifyCFG] switchToSelect(): don't forget to insert DomTree edge iff needed DestBB might or might not already be a successor of SelectBB, and if it wasn't, we need to ensure that we record the fact in DomTree. The testcase used to crash in lazy domtree updater mode + non-per-function domtree validity checks disabled.	2021-01-15 23:35:57 +03:00
Roman Lebedev	c6654a4cda	[SimplifyCFG][BasicBlockUtils] Port SplitBlockPredecessors()/SplitLandingPadPredecessors() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow I don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	286cf6cb02	[SimplifyCFG] Port SplitBlockAndInsertIfThen() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow I don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	c845c724c2	[Utils][SimplifyCFG] Port SplitBlock() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow I don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	b81f75fa79	[Utils] splitBlockBefore() always operates on DomTreeUpdater, so take it, not DomTree Even though not all its users operate on DomTreeUpdater, it itself internally operates on DomTreeUpdater, so it must mean everything is fine with that, so just do that globally.	2021-01-15 23:35:56 +03:00
Sanjay Patel	1f21de535d	[SLP] remove unused reduction functions; NFC These were made obsolete by simplifying the code in recent patches.	2021-01-15 14:59:33 -05:00
Jamie Schmeiser	17d0fb7f57	Set option default for enabling memory ssa for new pass manager loop sink pass to true. Summary: Set the default for the option enabling memory ssa use in the loop sink pass to true for the new pass manager. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D92486	2021-01-15 09:56:44 -05:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	9bcc0d1040	[CodeGen, Transforms] Use llvm::sort (NFC)	2021-01-14 20:30:31 -08:00
Sanjay Patel	b21905dfe3	[SLP] remove unnecessary state in matching reductions This is NFC-intended. I'm still trying to figure out how the loop where this is used works. It does not seem like we require this data at all, but it's hard to confirm given the complicated predicates.	2021-01-14 18:32:37 -05:00
Bjorn Pettersson	d58512b2e3	[SLP] Don't vectorize stores of non-packed types (like i1, i2) In the spirit of commit `fc783e91e0` (llvm-svn: 248943) we shouldn't vectorize stores of non-packed types (i.e. types that have padding between consecutive variables in a scalar layout, but are packed in a vector layout). The problem was detected as a miscompile in a downstream test case. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D94446	2021-01-14 11:30:33 +01:00
Daniel Paoliello	ff5e896425	Fix unused variable in CoroFrame.cpp when building Release with GCC 10 When building with GCC 10, the following warning is reported: ``` /llvm-project/llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1527:28: warning: unused variable ‘CS’ [-Wunused-variable] 1527 | if (CatchSwitchInst *CS = ``` This change adds a cast to `void` to avoid the warning. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94456	2021-01-13 22:53:25 -08:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Wei Mi	86341247c4	[NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h to Pass.h. In some compiler passes like SampleProfileLoaderPass, we want to know which LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and it also cannot represent a phase in full LTO. The patch extends it to include full LTO phases and moves it from PassBuilder.h to Pass.h, so that it is much easier for PassBuilder to communicate with each pass about the current LTO phase. Differential Revision: https://reviews.llvm.org/D94613	2021-01-13 15:55:40 -08:00
Arthur Eubanks	39e6d24237	[NewPM] Only non-trivially loop unswitch at -O3 and for non-optsize functions This matches the legacy pipeline/pass. Reviewed By: asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D94559	2021-01-13 14:54:49 -08:00
Kazu Hirata	fb98a1be43	Fix the warnings on unused variables (NFC)	2021-01-13 13:32:40 -08:00
Sanjay Patel	123674a816	[SLP] simplify type check for reductions This is NFC-intended. The 'valid' call allows int/FP/pointers for other parts of SLP. The difference here is that we can't reduce pointers.	2021-01-13 13:30:46 -05:00
Andrew Litteken	05b1a15f70	[IROutliner] Adapting to hoisted bitcasts in CodeExtractor In commit `700d2417d8` the CodeExtractor was updated so that bitcasts that have lifetime markers that begin outside of the region are deduplicated outside the region and are not used as an output. This caused a discrepancy in the IROutliner, where in these cases there were arguments added to the aggregate function that were not needed, causing assertion errors. The IROutliner queries the CodeExtractor twice to determine the inputs and outputs, before and after `findAllocas` is called with the same ValueSet for the outputs, causing the duplication. This has been fixed with a dummy ValueSet for the first call. However, the additional bitcasts prevent us from using the same similarity relationships that were previously defined by the IR Similarity Analysis Pass. In these cases, we check whether the initial version of the region being analyzed for outlining is still the same as it was previously. If it is not, i.e. because of the additional bitcast instructions from the CodeExtractor, we discard the region. Reviewers: yroux Differential Revision: https://reviews.llvm.org/D94303	2021-01-13 11:10:37 -06:00
Nikita Popov	17863614da	[InstCombine] Fold select -> and/or using impliesPoison We can fold a ? b : false to a & b if is_poison(b) implies is_poison(a), at which point we're able to reuse all the usual folds on ands. In particular, this covers the very common case of icmp X, C && icmp X, C'. The same applies to ors. This currently only has an effect if the -instcombine-unsafe-select-transform=0 option is set. Differential Revision: https://reviews.llvm.org/D94550	2021-01-13 17:45:40 +01:00
David Sherwood	4cd48535ec	[NFC][InstructionCost] Use InstructionCost in Transforms/Scalar/RewriteStatepointsForGC.cpp In places where we calculate costs using TTI.getXXXCost() interfaces I have changed the code to use InstructionCost instead of unsigned. The change is non functional since InstructionCost behaves in the same way as an integer for valid costs. Currently the getXXXCost() functions used in this file do not return invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential revision: https://reviews.llvm.org/D94484	2021-01-13 09:42:58 +00:00
Kazu Hirata	8a20e2b3d3	[llvm] Use Optional::getValueOr (NFC)	2021-01-12 21:43:50 -08:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
Yuanfang Chen	5c7dcd7aea	[Coroutine] Update promise object's final layout index promise is a header field but it is not guaranteed that it would be the third field of the frame due to `performOptimizedStructLayout`. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94137	2021-01-12 17:44:02 -08:00
Luo, Yuanke	055644cc45	[X86][AMX] Prohibit pointer cast on load. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast makes that pass happy. Differential Revision: https://reviews.llvm.org/D94372	2021-01-13 09:39:19 +08:00
Hongtao Yu	175288a1af	Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names. Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquified so that their unique name suffix won't be trimmed when applying AutoFDO profiles. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94455	2021-01-12 15:15:53 -08:00
modimo	2a49b7c64a	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Nikita Popov	23390e7a13	[InstCombine] Handle logical and/or in assume optimization assume(a && b) can be converted to assume(a); assume(b) even if the condition is logical. Same for assume(!(a || b)).	2021-01-12 22:36:40 +01:00
Sanjay Patel	9e7895a868	[SLP] reduce code duplication while processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	92fb5c49e8	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	554be30a42	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	46507a96fc	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
Philip Reames	caafdf07bb	[LV] Weaken spuriously strong assert in LoopVersioning LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here. Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.	2021-01-12 12:57:13 -08:00
Philip Reames	9f61fbd75a	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Florian Hahn	6cd44b204c	[FunctionAttrs] Derive willreturn for fns with `readonly` & `mustprogress`. Similar to D94125, derive `willreturn` for functions that are `readonly` and `mustprogress` in FunctionAttrs. To quote the reasoning from D94125: Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: jdoerfert, nikic Differential Revision: https://reviews.llvm.org/D94502	2021-01-12 20:02:34 +00:00
Dávid Bolvanský	0529946b5b	[instCombine] Add (A ^ B) | ~(A | B) -> ~(A & B) define i32 @src(i32 %x, i32 %y) { %0: %xor = xor i32 %y, %x %or = or i32 %y, %x %neg = xor i32 %or, 4294967295 %or1 = or i32 %xor, %neg ret i32 %or1 } => define i32 @tgt(i32 %x, i32 %y) { %0: %and = and i32 %x, %y %neg = xor i32 %and, 4294967295 ret i32 %neg } Transformation seems to be correct! https://alive2.llvm.org/ce/z/Cvca4a	2021-01-12 19:29:17 +01:00
Quentin Colombet	905623b64d	[NFC][LICM] Minor improvements to debug output Added a utility function in Value class to print block name and use block labels for unnamed blocks. Changed LICM to call this function in its debug output. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D93577	2021-01-11 18:02:49 -08:00
Roman Lebedev	ec8a6c11db	[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater This boils down to how we deal with early-increment iterator over function's basic blocks: not only we need to early-increment, after that we also need to skip all the blocks that are scheduled for removal, as per DomTreeUpdater.	2021-01-12 02:09:47 +03:00
Roman Lebedev	81afeacd37	[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater Thus supporting lazy DomTreeUpdater mode, where the domtree updates (and thus block removals) aren't applied immediately, but are delayed until the last possible moment.	2021-01-12 02:09:47 +03:00
Roman Lebedev	90a92f8b4d	[NFCI][Utils/Local] removeUnreachableBlocks(): cleanup support for lazy DomTreeUpdater When DomTreeUpdater is in lazy update mode, the blocks that were scheduled to be removed won't be removed until the updates are flushed, e.g. by asking DomTreeUpdater for an up-to-date DomTree. From the function's current code, it is pretty evident that the support for the lazy mode is an afterthought, see e.g. how we roll back the NumRemoved statistic. So instead of considering all the unreachable blocks as the blocks-to-be-removed, simply additionally skip all the blocks that are already scheduled to be removed.	2021-01-12 02:09:47 +03:00
Roman Lebedev	f9ba347706	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): don't insert a DomTree edge if it already exists When we are adding edges to the terminator and potentially turning it into a switch (if it wasn't already), it is possible that the case we're adding will share its destination with one of the preexisting cases, in which case there is no domtree edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:47 +03:00
Roman Lebedev	c0de0a1b72	[SimplifyCFG] SimplifyBranchOnICmpChain(): don't insert a DomTree edge that already exists BB was already always branching to EdgeBB, there is no edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Roman Lebedev	c22bc5f1f8	[SimplifyCFG] SwitchToLookupTable(): don't insert a DomTree edge that already exists SI is the terminator of BB, so the edge we are adding obviously already existed. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Hongtao Yu	32bcfcda4e	Rename debug linkage name with -funique-internal-linkage-names Functions that are renamed under -funique-internal-linkage-names have their debug linkage name updated as well. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93747	2021-01-11 13:56:07 -08:00
Sanjay Patel	288f3fc5df	[InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test This is a more basic pattern that we should handle before trying to solve: https://llvm.org/PR48640 There might be a better way to think about this because the pre-condition that I came up with (number of sign bits in the compare constant) misses a potential transform for each of ugt and ult as commented on in the test file. Tried to model this in Alive: https://rise4fun.com/Alive/juX1 ...but I couldn't get the ComputeNumSignBits() pre-condition to work as expected, so replaced with leading 0/1 preconditions instead. Name: ugt Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ugt i8 %a, C2 => %r = icmp slt i8 %x, 0 Name: ult Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ult i4 %a, C2 => %r = icmp sgt i4 %x, -1 Also approximated in Alive2: https://alive2.llvm.org/ce/z/u5hCcz https://alive2.llvm.org/ce/z/__szVL Differential Revision: https://reviews.llvm.org/D94014	2021-01-11 15:53:39 -05:00
Sriraman Tallam	d8c6d24359	-funique-internal-linkage-names appends a hex md5hash suffix to the symbol name which is not demangler friendly, convert it to decimal. Please see D93747 for more context, which tries to make the linkage names of internal linkage functions the uniquified names. This causes a problem with gdb because breaking using the demangled function name will not work if the new uniquified name cannot be demangled. The problem is the generated suffix, which is a mix of integers and letters that does not demangle. The demangler accepts either all numbers or all letters. This patch simply converts the hash to decimal. There is no loss of uniqueness by doing this as the precision is maintained. The symbol names get longer by a few characters though. Differential Revision: https://reviews.llvm.org/D94154	2021-01-11 11:10:29 -08:00
Giorgis Georgakoudis	9751705512	[OpenMPOpt][WIP] Expand parallel region merging The existing implementation of parallel region merging applies only to consecutive parallel regions that have speculatable sequential instructions in-between. This patch lifts this limitation to expand merging with any sequential instructions in-between, except calls to unmergeable OpenMP runtime functions. In-between sequential instructions in the merged region are sequentialized in a "master" region and any output values are broadcast to the following parallel regions and the sequential region continuation of the merged region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90909	2021-01-11 08:06:23 -08:00
Florian Hahn	eb0371e403	[VPlan] Unify value/recipe printing after VPDef transition. This patch unifies the way recipes and VPValues are printed after the transition to VPDef. VPSlotTracker has been updated to iterate over all recipes and all their defined values to number those. There is no need to number values in Value2VPValue. It also updates a few places that only used slot numbers for VPInstruction. All recipes now can produce numbered VPValues.	2021-01-11 14:42:46 +00:00
Florian Hahn	a94497a342	[VPlan] Move initial quote emission from ::print to ::dumpBasicBlock. This means there will be no stray " when printing individual recipes using print()/dump() in a debugger, for example.	2021-01-11 12:22:15 +00:00
Bjorn Pettersson	675be65106	Require chained analyses in BasicAA and AAResults to be transitive This patch fixes a bug that could result in miscompiles (at least in an OOT target). The problem could be seen by adding checks that the DominatorTree used in BasicAliasAnalysis and ValueTracking was valid (e.g. by adding DT->verify() call before every DT dereference and then running all tests in test/CodeGen). Problem was that the LegacyPassManager calculated "last user" incorrectly for passes such as the DominatorTree when not telling the pass manager that there was a transitive dependency between the different analyses. And then it could happen that an incorrect dominator tree was used when doing alias analysis (which was a pretty serious bug as the alias analysis result could be invalid). Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94138	2021-01-11 11:50:07 +01:00
David Sherwood	40abeb11f4	[NFC][InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost This patch is part of a series of patches that migrate integer instruction costs to use InstructionCost. In the function selectVectorizationFactor I have simply asserted that the cost is valid and extracted the value as is. In future we expect to encounter invalid costs, but we should filter out those vectorization factors that lead to such invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D92178	2021-01-11 09:22:37 +00:00
David Sherwood	b7ccaca537	[NFC] Remove min/max functions from InstructionCost Removed the InstructionCost::min/max functions because it's fine to use std::min/max instead. Differential Revision: https://reviews.llvm.org/D94301	2021-01-11 09:00:12 +00:00
Serguei Katkov	7f69860243	[LoopUnroll] Fix a crash Loop peeling as a last step triggers loop simplification and this can change the loop structure. As a result, all cached values like the latch branch become invalid. The patch restructures the code to take into account the possible changes caused by peeling. Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour Reviewed By: Meinersbur, fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D93686	2021-01-11 10:19:26 +07:00
Philip Reames	4739dd67e7	[LoopDeletion] Break backedge of outermost loops when known not taken This is a resubmit of `dd6bb367` (which was reverted due to stage2 build failures in `7c63aac`), with the additional restriction added to the transform to only consider outermost loops. As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better. Original commit message follows... The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-10 16:02:33 -08:00
Roman Lebedev	8e8d214c4a	[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable This has a measurable impact on the number of DomTree recalculations. While this doesn't handle all the cases, it deals with the most obvious ones.	2021-01-11 00:30:44 +03:00
Sanjay Patel	3f09c77d33	[SLP] fix typo in assert This snuck into `0aa75fb12f` , but I didn't catch it locally.	2021-01-10 13:15:04 -05:00
Sanjay Patel	0aa75fb12f	[SLP] put verifyFunction call behind EXPENSIVE_CHECKS A severe compile-time slowdown from this call is noted in: https://llvm.org/PR48689 My naive fix was to put it under LLVM_DEBUG ( `267ff79` ), but that's not limiting in the way we want. This is a quick fix (or we could just remove the call completely and rely on some later pass to discover potentially wrong IR?). A bigger/better fix would be to improve/limit verifyFunction() as noted in: https://llvm.org/PR47712 Differential Revision: https://reviews.llvm.org/D94328	2021-01-10 12:32:21 -05:00
Florian Hahn	c701f85c45	[STLExtras] Use return type from operator* of the wrapped iter. Currently make_early_inc_range cannot be used with iterators with operator* implementations that do not return a reference. Most notably in the LLVM codebase, this means the User iterator ranges cannot be used with make_early_inc_range, which slightly simplifies iterating over ranges while elements are removed. Instead of directly using BaseT::reference as the return type of operator*, this patch uses decltype to get the actual return type of the operator* implementation in WrappedIteratorT. This patch also updates a few places to make use of make_early_inc_range. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93992	2021-01-10 14:41:13 +00:00
Florian Hahn	d98fc62ae6	[SimplifyCFG] Keep !dbg metadata of moved instruction, if they match. Currently SimplifyCFG drops the debug locations of 'bonus' instructions. Such instructions are moved before the first branch. The reason for the current behavior is that this could lead to surprising debug stepping, if the block that's folded is dead. In case the first branch and the instructions to be folded have the same debug location, this shouldn't be an issue and we can keep the debug location. Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D93662	2021-01-09 19:15:16 +00:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Kazu Hirata	4d92ab1669	[Transforms] Use llvm::find_if (NFC)	2021-01-09 09:24:58 -08:00
Kazu Hirata	9a7c03b800	[SCEV] Remove unused getOrInsertCanonicalInductionVariable (NFC) The last use was removed on Mar 22, 2012 in commit `f47d0af551`.	2021-01-09 09:24:56 -08:00
Florian Hahn	65f578fc0e	[VPlan] Keep start value of VPWidenPHIRecipe as VPValue. Similar to D92129, update VPWidenPHIRecipe to manage the start value as VPValue. This allows adjusting the start value as a VPlan transform, which will be used in a follow-up patch to support reductions during epilogue vectorization. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93975	2021-01-09 16:34:15 +00:00
Kazu Hirata	f62b93b9a2	[SCEV] Remove unused getExactExistingExpansion (NFC) The last use was removed on Sep 4, 2018 in commit `2cbba56337`.	2021-01-08 18:39:57 -08:00
Kazu Hirata	b7c5e0b02c	[Target, Transforms] Use *Set::contains (NFC)	2021-01-08 18:39:54 -08:00
Arthur Eubanks	756dd70766	[NewPM] Run ObjC ARC passes Match the legacy PM in running various ObjC ARC passes. This requires making some module passes into function passes. These were initially ported as module passes since they add function declarations (e.g. https://reviews.llvm.org/D86178), but that's still up for debate and other passes do so. Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D93743	2021-01-08 15:47:11 -08:00
Florian Hahn	c493e9216b	[VPlan] Move reduction start value creation to widenPHIRecipe. This was suggested to prepare for D93975. By moving the start value creation to widenPHInstruction, we set the stage to manage the start value directly in VPWidenPHIRecipe, which will be used subsequently to set the 'resume' value for reductions during epilogue vectorization. It also moves RdxDesc to the recipe, so we do not have to rely on Legal to look it up later. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D94175	2021-01-08 17:49:43 +00:00
Alexander Belyaev	bcbdeafa9c	Revert "[SLP]Need shrink the load vector after reordering." This reverts commit `4284afdf94`. This changes computed values in fused_batchnorm_test_cpu. Not equal to tolerance rtol=1e-06, atol=0.001 Mismatched value: a is different from b. not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5])) not close lhs = [-0.6636615 -0.9804948 -1.148275 -0.68193716 -0.8572368 -0.65046215 -0.6993756 -1.2244141 -1.0938729 -0.50369143 -0.51830524 -0.738452 -0.7214286 -0.48115745 -0.9380924 -0.9341769 -0.5916775 -1.2896856 -0.7264182 -0.9746917 -0.783249 -0.7659018 -0.86214024 -0.47784212] not close rhs = [ 0.44102234 0.12418899 -0.04359123 0.42274666 0.24744703 0.45422167 0.40530816 -0.11973029 0.01081094 0.6009924 0.5863786 0.3662318 0.38325527 0.62352633 0.1665914 0.1705069 0.5130063 -0.18500176 0.37826565 0.12999213 0.3214348 0.338782 0.24254355 0.62684166] not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046838] not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045 0.00100041 0.00100012 0.00100001 0.0010006 0.00100059 0.00100037 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]	2021-01-08 14:42:26 +01:00
Sanjay Patel	267ff7901c	[SLP] limit verifyFunction to debug build (PR48689) As noted in PR48689, the verifier may have some kind of exponential behavior that should be addressed separately. For now, only run it in debug mode to prevent problems for release+asserts. That limit is what we had before D80401, and I'm not sure if there was a reason to change it in that patch.	2021-01-08 08:10:17 -05:00
Cullen Rhodes	1e7efd397a	[LV] Legalize scalable VF hints In the following loop: void foo(int *a, int *b, int N) { for (int i=0; i<N; ++i) a[i + 4] = a[i] + b[i]; } The loop dependence constrains the VF to a maximum of (4, fixed), which would mean using <4 x i32> as the vector type in vectorization. Extending this to scalable vectorization, a VF of (4, scalable) implies a vector type of <vscale x 4 x i32>. To determine if this is legal vscale must be taken into account. For this example, unless max(vscale)=1, it's unsafe to vectorize. For SVE, the number of bits in an SVE register is architecturally defined to be a multiple of 128 bits with a maximum of 2048 bits, thus the maximum vscale is 16. In the loop above it is therefore unfeasible to vectorize with SVE. However, in this loop: void foo(int *a, int *b, int N) { #pragma clang loop vectorize_width(X, scalable) for (int i=0; i<N; ++i) a[i + 32] = a[i] + b[i]; } As long as max(vscale) multiplied by the number of lanes 'X' doesn't exceed the dependence distance, it is safe to vectorize. For SVE a VF of (2, scalable) is within this constraint, since a vector of <16 x 2 x 32> will have no dependencies between lanes. For any number of lanes larger than this it would be unsafe to vectorize. This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs specified as loop hints, implementing the following behaviour: * If the backend does not support scalable vectors, ignore the hint. * If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF. * Accept scalable VFs if it's safe to do so. * Otherwise, clamp scalable VFs that exceed the maximum safe VF. Reviewed By: sdesmalen, fhahn, david-arm Differential Revision: https://reviews.llvm.org/D91718	2021-01-08 10:49:44 +00:00
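The four legalization rules in the message above can be sketched roughly as follows. This is a toy model, not the code in `computeFeasibleMaxVF`: the `VF` struct, the function name `legalizeScalableHint`, and its parameters are hypothetical, power-of-two rounding is ignored, and the choice of fixed fallback width (the hint clamped to the safe element count) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for a vectorization factor: `lanes` elements,
// multiplied by vscale at run time when `scalable` is true.
struct VF {
  unsigned lanes;
  bool scalable;
};

// maxSafeElements: dependence distance in elements (4 and 32 in the examples).
// maxVScale: largest vscale the target allows (16 for SVE per the message).
// Returns std::nullopt when the hint is ignored entirely.
std::optional<VF> legalizeScalableHint(VF hint, unsigned maxSafeElements,
                                       unsigned maxVScale,
                                       bool backendSupportsScalable) {
  assert(hint.scalable && hint.lanes > 0);
  assert(maxSafeElements > 0 && maxVScale > 0);

  // Rule 1: no scalable-vector support, so ignore the hint.
  if (!backendSupportsScalable)
    return std::nullopt;

  // lanes * maxVScale must not exceed the dependence distance.
  unsigned maxScalableLanes = maxSafeElements / maxVScale;

  // Rule 2: even one scalable lane is unsafe, so fall back to a fixed VF.
  // (Assumption: the fallback is the hint clamped to the safe distance.)
  if (maxScalableLanes == 0)
    return VF{std::min(hint.lanes, maxSafeElements), false};

  // Rule 3: the hint is already safe.
  if (hint.lanes <= maxScalableLanes)
    return hint;

  // Rule 4: clamp to the largest safe scalable VF.
  return VF{maxScalableLanes, true};
}
```

Under these assumptions, the first loop (distance 4, max vscale 16) falls back to a fixed VF, while the second loop (distance 32) accepts a hint of (2, scalable) and clamps anything wider down to it.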
David Green	72fb5ba079	[LV] Don't sink into replication regions The new test case here contains a first-order recurrence and an instruction that is replicated. The first-order recurrence forces an instruction to be sunk _into_, as opposed to after, the replication region. That causes several things to go wrong including registering vector instructions multiple times and failing to create dominance relations correctly. Instead we should be sinking to after the replication region, which is what this patch makes sure happens. Differential Revision: https://reviews.llvm.org/D93629	2021-01-08 09:50:10 +00:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Ruiling Song	8dddcc762d	[Cloning] Copy metadata of global declarations We have modules with metadata on declarations, and out-of-tree passes use that metadata, and we need to clone those modules. We really expect such metadata is kept during the clone operation. Reviewed by: arsenm, aprantl Differential Revision: https://reviews.llvm.org/D93451	2021-01-08 08:21:18 +08:00
Roman Lebedev	f2f81c554b	[SimplifyCFG] markAliveBlocks(): switch to non-permissive DomTree updates No actual changes needed, invoke can't have the same block as an unwind destination and a normal destination.	2021-01-08 02:15:27 +03:00
Roman Lebedev	d59f97bb3a	[SimplifyCFG] removeUnwindEdge(): switch to non-permissive DomTree updates No actual changes needed, Catchswitch cannot unwind to one of its catchpads.	2021-01-08 02:15:27 +03:00
Roman Lebedev	f0eba8ce2d	[SimplifyCFG] changeToCall(): switch to non-permissive DomTree updates No actual changes needed, normal and unwind destinations of an invoke can never be identical.	2021-01-08 02:15:27 +03:00
Roman Lebedev	be0a31d13b	[SimplifyCFG] DeleteDeadBlocks(): switch to non-permissive DomTree updates No actual changes needed, DetatchDeadBlocks() was already doing the right thing.	2021-01-08 02:15:27 +03:00
Roman Lebedev	66189212bb	[SimplifyCFG] MergeBlockIntoPredecessor(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same successor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	05adc73db0	[SimplifyCFG] changeToUnreachable(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	7600d7c7be	[SimplifyCFG] removeUnreachableBlocks(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	1f9b591ee6	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:25 +03:00
Roman Lebedev	b3822728fa	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `indirectbr` handling ... which requires not deleting edges that were just deleted already.	2021-01-08 02:15:25 +03:00
Roman Lebedev	36593a30a4	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `SwitchInst` handling ... which requires not deleting edges that will still be present.	2021-01-08 02:15:24 +03:00
Roman Lebedev	16ab8e5f6d	[SimplifyCFG] ConstantFoldTerminator(): handle matching destinations of condbr earlier We need to handle this case before dealing with the case of constant branch condition, because if the destinations match, the latter fold would try to remove the DomTree edge that would still be present. This allows making that particular DomTree update non-permissive.	2021-01-08 02:15:24 +03:00
Arthur Eubanks	1a2eaebc09	[CoroSplit][NewPM] Don't call LazyCallGraph functions to split when no clones Apparently there can be no clones, as happens in coro-retcon-unreachable.ll. The alternative is to allow no split functions in addSplitRefRecursiveFunctions(), but it seems better to have the caller make sure it's not accidentally splitting no functions out. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D94258	2021-01-07 14:06:35 -08:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sanjay Patel	4c7148d75c	[SLP] remove opcode identifier for reduction; NFC Another step towards allowing intrinsics in reduction matching.	2021-01-07 14:07:27 -05:00
Hiroshi Yamauchi	cf5415c727	[PGO][PGSO] Let unroll hints take precedence over PGSO. Differential Revision: https://reviews.llvm.org/D94199	2021-01-07 10:10:31 -08:00
Roman Lebedev	6be1fd6b20	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): drop reachable erroneous assert I have added it in `d15d81c` because it seemed correct, was holding for all the tests so far, and was validating the fix added in the same commit, but as David Major is pointing out (with a reproducer), the assertion isn't really correct after all. So remove it. Note that `d15d81c` is still fine.	2021-01-07 18:05:04 +03:00
Sidharth Baveja	048f184ee4	[SplitEdge] Add new parameter to SplitEdge to name the newly created basic block Summary: Currently SplitEdge does not support passing in a parameter that allows you to name the newly created BasicBlock. This patch updates the function such that the name of the block can be passed in, if users of this utility decide to do so. Reviewed By: Whitney, bmahjour, asbirlea, jamieschmeiser Differential Revision: https://reviews.llvm.org/D94176	2021-01-07 14:49:23 +00:00
Alexey Bataev	4284afdf94	[SLP]Need shrink the load vector after reordering. After merging the shuffles, we cannot rely on the previous shuffle anymore and need to shrink the final shuffle, if it is required. Reported in D92668 Differential Revision: https://reviews.llvm.org/D93967	2021-01-07 04:50:48 -08:00
Oliver Stannard	76f6b125ce	Revert "[llvm] Use BasicBlock::phis() (NFC)" Reverting because this causes crashes on the 2-stage buildbots, for example http://lab.llvm.org:8011/#/builders/7/builds/1140. This reverts commit `9b228f107d`.	2021-01-07 09:43:33 +00:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Kazu Hirata	9b228f107d	[llvm] Use BasicBlock::phis() (NFC)	2021-01-06 18:27:35 -08:00
Alina Sbirlea	63aeaf754a	[DominatorTree] Add support for mixed pre/post CFG views. Add support for mixed pre/post CFG views. Update usages of the MemorySSAUpdater to use the new DT API by requesting the DT updates to be done by the MSSAUpdater. Differential Revision: https://reviews.llvm.org/D93371	2021-01-06 14:53:09 -08:00
Sanjay Patel	4c022b5a41	[SLP] use reduction kind's opcode to create new instructions; NFC Similar to `5a1d31a28` - This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-06 14:37:44 -05:00
Sanjay Patel	5d24089a70	[SLP] reduce code for propagating flags on reductions; NFC If we add/change to match intrinsics, this might get more wordy, but there's no need to list each kind currently.	2021-01-06 14:37:44 -05:00
Arthur Eubanks	7fea561eb1	[CGSCC][Coroutine][NewPM] Properly support function splitting/outlining Previously when trying to support CoroSplit's function splitting, we added in a hack that simply added the new function's node into the original function's SCC (https://reviews.llvm.org/D87798). This is incorrect since it might be in its own SCC. Now, more similar to the previous design, we have callers explicitly notify the LazyCallGraph that a function has been split out from another one. In order to properly support CoroSplit, there are two ways functions can be split out. One is the normal expected "outlining" of one function into a new one. The new function may only contain references to other functions that the original did. The original function must reference the new function. The new function may reference the original function, which can result in the new function being in the same SCC as the original function. The weird case is when the original function indirectly references the new function, but the new function directly calls the original function, resulting in the new SCC being a parent of the original function's SCC. This form of function splitting works with CoroSplit's Switch ABI. The second way of splitting is more specific to CoroSplit. CoroSplit's Retcon and Async ABIs split the original function into multiple functions that all reference each other and are referenced by the original function. In order to keep the LazyCallGraph in a valid state, all new functions must be processed together, else some nodes won't be populated. To keep things simple, this only supports the case where all new edges are ref edges, and every new function references every other new function. There can be a reference back from any new function to the original function, putting all functions in the same RefSCC. This also adds asserts that all nodes in a (Ref)SCC can reach all other nodes to prevent future incorrect hacks. 
The original hacks in https://reviews.llvm.org/D87798 are no longer necessary since all new functions should have been registered before calling updateCGAndAnalysisManagerForPass. This fixes all coroutine tests when opt's -enable-new-pm is true by default. This also fixes PR48190, which was likely due to the previous hack breaking SCC invariants. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93828	2021-01-06 11:19:15 -08:00
Francesco Petrogalli	dfd3384fee	[InstCombine] Update valueCoversEntireFragment to use TypeSize * Update valueCoversEntireFragment to use TypeSize. * Add a regression test. * Assertions have been added to protect untested codepaths. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91806	2021-01-06 17:14:59 +00:00
Florian Hahn	494db3816b	[LoopDeletion] Also consider loops with subloops for deletion. Currently, LoopDeletion skips loops that have sub-loops, but this means we currently fail to remove some no-op loops. One example is inner loops with live-out values. Those cannot be removed by themselves. But the containing loop may itself be a no-op and the whole loop-nest can be deleted. The legality checks do not seem to rely on analyzing inner-loops only for correctness. With LoopDeletion being a LoopPass, the change means that we now unfortunately need to do some extra work in parent loops, by checking some conditions we already checked. But there appears to be no noticeable compile time impact: http://llvm-compile-time-tracker.com/compare.php?from=02d11f3cda2ab5b8bf4fc02639fd1f4b8c45963e&to=843201e9cf3b6871e18c52aede5897a22994c36c&stat=instructions This patch leads to ~10 more loops being deleted on MultiSource, SPEC2000, SPEC2006 with -O3 & LTO. This patch is also required (together with a few others) to eliminate a no-op loop in omnetpp as discussed on llvm-dev 'LoopDeletion / removal of empty loops.' (http://lists.llvm.org/pipermail/llvm-dev/2020-December/147462.html) This change becomes relevant after removing potentially infinite loops is made possible in 'must-progress' loops (D86844). Note that I added a function call with side-effects to an outer loop in `llvm/test/Transforms/LoopDeletion/update-scev.ll` to preserve the original spirit of the test. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D93716	2021-01-06 14:49:00 +00:00
Florian Hahn	816dba48af	[VPlan] Keep start value in VPWidenIntOrFpInductionRecipe (NFC). This patch updates VPWidenIntOrFpInductionRecipe to hold the start value for the induction variable. This makes the start value explicit and allows for adjusting the start value for a VPlan. The flexibility will be used in further patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92129	2021-01-06 11:47:33 +00:00
Florian Hahn	0ce5f402e0	[VPlan] Add getLiveInIRValue accessor to VPValue. This patch adds a new getLiveInIRValue accessor to VPValue, which returns the underlying value, if the VPValue is defined outside of VPlan. This is required to handle scalars in VPTransformState, which requires dealing with scalars defined outside of VPlan. We can simply check VPValue::Def to determine if the value is defined inside a VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92281	2021-01-06 11:20:42 +00:00
Florian Hahn	f73c09caa2	[VPlan] Use public VPValue constructor in VPPredInstPHIRecipe (NFC). VPPredInstPHIRecipe does not need access to VPValue via friendship. It can just use the public constructor. Discussed as part of D92281.	2021-01-06 10:47:09 +00:00
Juneyoung Lee	29f8628d1f	[Constant] Add containsPoisonElement This patch - Adds containsPoisonElement that checks existence of poison in constant vector elements, - Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved. Thanks! Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94053	2021-01-06 12:10:33 +09:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that are currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Roman Lebedev	a14945c1db	[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): really don't delete DomTree edges multiple times	2021-01-06 01:52:39 +03:00
Roman Lebedev	2b437fcd47	[SimplifyCFG] SwitchToLookupTable(): switch to non-permissive DomTree updates ... which requires not deleting a DomTree edge that we just deleted.	2021-01-06 01:52:38 +03:00
Roman Lebedev	fa5447aa3f	[NFC][SimplifyCFG] SwitchToLookupTable(): pull out SI->getParent() into a variable	2021-01-06 01:52:38 +03:00
Roman Lebedev	d15d81ce15	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): deal with each predecessor only once If the predecessor is a switch, and BB is not the default destination, multiple cases could have the same destination, and it doesn't make sense to re-process the predecessor, because we won't make any changes; once is enough. I'm not sure this can be really tested, other than via the assertion being added here, which fires without the fix.	2021-01-06 01:52:37 +03:00
Roman Lebedev	fc96cb2dad	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): switch to non-permissive DomTree updates ... which requires not adding a DomTree edge that we just added.	2021-01-06 01:52:37 +03:00
Roman Lebedev	29ca7d5a1a	[SimplifyCFG] simplifyUnreachable(): fix handling of degenerate same-destination conditional branch One would hope that it would have been already canonicalized into an unconditional branch, but that isn't really guaranteed to happen with SimplifyCFG's visitation order.	2021-01-06 01:52:36 +03:00
Roman Lebedev	3460719f58	[NFC][SimplifyCFG] Add a test with same-destination conditional branch Reported by Mikael Holmén as post-commit feedback on https://reviews.llvm.org/rG2d07414ee5f74a09fb89723b4a9bb0818bdc2e18#968162	2021-01-06 01:52:36 +03:00
Roman Lebedev	f98535686e	[SimplifyCFG] simplifyUnreachable(): switch to non-permissive DomTree updates ... which requires not removing a DomTree edge if the switch's default still points at that destination, because it can't be removed; ... and not processing the same predecessor more than once.	2021-01-06 01:52:36 +03:00
Sanjay Patel	6a03f8ab62	[SLP] reduce code for finding reduction costs; NFC We can get both (vector/scalar) costs in a single switch instead of sequentially.	2021-01-05 17:35:54 -05:00
Arthur Eubanks	8cf1cc578d	[FuncAttrs] Infer noreturn A function is noreturn if all blocks terminating with a ReturnInst contain a call to a noreturn function. Skip looking at naked functions since there may be asm that returns. This can be further refined in the future by checking unreachable blocks and taking into account recursion. It looks like the attributor pass does this, but that is not yet enabled by default. This seems to help with code size under the new PM since PruneEH does not run under the new PM, missing opportunities to mark some functions noreturn, which in turn doesn't allow simplifycfg to clean up dead code. https://bugs.llvm.org/show_bug.cgi?id=46858. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93946	2021-01-05 13:25:42 -08:00
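The inference rule stated above can be illustrated with a toy model. This sketch does not use LLVM's IR classes: `Block`, `Function`, and `inferNoReturn` are hypothetical names, and treating a body-less declaration as "not inferable" is an assumption.

```cpp
#include <vector>

// Hypothetical, simplified view of a basic block.
struct Block {
  bool endsInReturn;   // terminator is a return instruction
  bool callsNoReturn;  // block contains a call to a noreturn function
};

// Hypothetical, simplified view of a function.
struct Function {
  bool isNaked;               // naked functions may return via inline asm
  std::vector<Block> blocks;  // empty for a declaration
};

// A function is inferred noreturn if every block that ends in a return
// also contains a call to a noreturn function.
bool inferNoReturn(const Function &f) {
  if (f.isNaked || f.blocks.empty())
    return false;  // skip naked functions and (by assumption) declarations
  for (const Block &b : f.blocks)
    if (b.endsInReturn && !b.callsNoReturn)
      return false;  // this return can actually be reached normally
  return true;
}
```

As the message notes, a rule of this shape is conservative: it does not account for unreachable blocks or recursion.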
Sanjay Patel	5a1d31a284	[SLP] use reduction kind's opcode for cost model queries; NFC This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-05 15:12:40 -05:00
Sanjay Patel	d4a999b453	[SLP] reduce code duplication; NFC	2021-01-05 15:12:40 -05:00
Atmn Patel	f88a797521	[LoopDeletion] Allows deletion of possibly infinite side-effect free loops From C11 and C++11 onwards, a forward-progress requirement has been introduced for both languages. In the case of C, loops with non-constant conditionals that do not have any observable side-effects (as defined by 6.8.5p6) can be assumed by the implementation to terminate, and in the case of C++, this assumption extends to all functions. The clang frontend will emit the `mustprogress` function attribute for C++ functions (D86233, D85393, D86841) and emit the loop metadata `llvm.loop.mustprogress` for every loop in C11 or later that has a non-constant conditional. This patch modifies LoopDeletion so that only loops with the `llvm.loop.mustprogress` metadata or loops contained in functions that are required to make progress (`mustprogress` or `willreturn`) are checked for observable side-effects. If these loops do not have an observable side-effect, then we delete them. Loops without observable side-effects that do not satisfy the above conditions will not be deleted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86844	2021-01-05 09:56:16 -05:00
Sanjay Patel	3b8b2c7da2	[SLP] delete unused pairwise reduction option SLP tries to model 2 forms of vector reductions: pairwise and splitting. From the cost model code comments, those are defined using an example as: /// Pairwise: /// (v0, v1, v2, v3) /// ((v0+v1), (v2+v3), undef, undef) /// Split: /// (v0, v1, v2, v3) /// ((v0+v2), (v1+v3), undef, undef) I don't know the full history of this functionality, but it was partly added back in D29402. There are apparently no users at this point (no regression tests change). X86 might have managed to work-around the need for this through cost model and codegen improvements. Removing this code makes it easier to continue the work that was started in D87416 / D88193. The alternative -- if there is some target that is silently using this option -- is to move this logic into LoopUtils. We have related/duplicate functionality there via llvm::createTargetReduction(). Differential Revision: https://reviews.llvm.org/D93860	2021-01-05 13:23:07 -05:00
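To make the two reduction forms quoted above concrete, here is a small scalar sketch of the first step of each on a 4-element vector. The function names are hypothetical, and the `undef` upper lanes from the comment are simply omitted.

```cpp
#include <array>

// Pairwise: (v0, v1, v2, v3) -> ((v0+v1), (v2+v3))
std::array<int, 2> pairwiseStep(const std::array<int, 4> &v) {
  return {v[0] + v[1], v[2] + v[3]};
}

// Split: (v0, v1, v2, v3) -> ((v0+v2), (v1+v3))
std::array<int, 2> splitStep(const std::array<int, 4> &v) {
  return {v[0] + v[2], v[1] + v[3]};
}

// Final step shared by both forms: add the two remaining lanes.
int finish(const std::array<int, 2> &v) { return v[0] + v[1]; }
```

For an associative operation such as integer addition both forms produce the same final value; they differ only in which lanes are combined at each step, and therefore in which shuffles a target would need.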
Florian Hahn	8a47e6252a	[VPlan] Re-add interleave group members to plan. Creating in-loop reductions relies on IR references to map IR values to VPValues after interleave group creation. Make sure we re-add the updated member to the plan, so the look-ups still work as expected. This fixes a crash reported after D90562.	2021-01-05 15:06:47 +00:00
Simon Pilgrim	313d982df6	[IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse.	2021-01-05 11:01:10 +00:00
Florian Hahn	38c6933dcc	[LV] Simplify lambda in all_of to directly return hasVF() result. (NFC) The if in the lambda is not necessary. We can directly return the result of hasVF.	2021-01-05 10:34:06 +00:00
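The kind of simplification described above might look like the following. The `Plan` type and its `hasVF` member here are hypothetical stand-ins, not the actual VPlan classes, and the "before" form is a guess at the shape of the original lambda.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a plan that supports a set of VFs.
struct Plan {
  std::vector<unsigned> vfs;
  bool hasVF(unsigned vf) const {
    return std::find(vfs.begin(), vfs.end(), vf) != vfs.end();
  }
};

// Before: the lambda wraps the predicate in a redundant `if`.
bool allHaveVFVerbose(const std::vector<Plan> &plans, unsigned vf) {
  return std::all_of(plans.begin(), plans.end(), [vf](const Plan &p) {
    if (p.hasVF(vf))
      return true;
    return false;
  });
}

// After: return the predicate's result directly.
bool allHaveVF(const std::vector<Plan> &plans, unsigned vf) {
  return std::all_of(plans.begin(), plans.end(),
                     [vf](const Plan &p) { return p.hasVF(vf); });
}
```

Both functions return the same result for any input; only the lambda body is shorter.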
Simon Pilgrim	a000366d05	[SimplifyIndVar] createWideIV - make WideIVInfo arg a const ref. NFCI. The WideIVInfo arg is only ever used as a const. Fixes cppcheck warning.	2021-01-05 10:31:45 +00:00
Simon Pilgrim	7a97eeb197	[Coroutines] checkAsyncFuncPointer - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI. We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.	2021-01-05 10:31:45 +00:00
Jeremy Morse	914066fe38	[DebugInfo] Avoid LSR crash on large integer inputs Loop strength reduction tries to recover debug variable values by looking for simple offsets from PHI values. In really extreme conditions there may be an offset used that won't fit in an int64_t, hitting an APInt assertion. This patch adds a regression test and adjusts the equivalent value collecting code to filter out any values where the offset can't be represented by an int64_t. This means that for very large integers with very large offsets, the variable location will become undef, which is the same behaviour as before `2a6782bb9f` / D87494. Differential Revision: https://reviews.llvm.org/D94016	2021-01-05 10:25:37 +00:00
Simon Pilgrim	84d5768d97	MemProfiler::insertDynamicShadowAtFunctionEntry - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI. We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.	2021-01-05 09:34:01 +00:00
Arthur Eubanks	e30fbbe9a5	[JumpThreading][NewPM] Skip when target has divergent CF Matches the legacy pass. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94028	2021-01-04 16:08:08 -08:00
Roman Lebedev	32c47ebef1	[SimplifyCFG] SimplifyCondBranchToTwoReturns(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted, because we could be dealing with a block that didn't go through ConstantFoldTerminator() yet, and thus has a degenerate cond br with matching true/false destinations.	2021-01-05 01:26:37 +03:00
Roman Lebedev	110b3d7855	[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted.	2021-01-05 01:26:37 +03:00
Roman Lebedev	a8604e3d5b	[SimplifyCFG] simplifyIndirectBr(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted.	2021-01-05 01:26:36 +03:00
Roman Lebedev	ed9de61cc3	[SimplifyCFGPass] mergeEmptyReturnBlocks(): switch to non-permissive DomTree updates ... which requires not inserting an edge that already exists.	2021-01-05 01:26:36 +03:00
Roman Lebedev	3fb57222c4	[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible Notably, this doesn't switch every case; remaining cases don't actually pass sanity checks in non-permissive mode, and therefore require further analysis. Note that SimplifyCFG still defaults to not preserving DomTree, so this is effectively a NFC change.	2021-01-05 01:26:36 +03:00
Sanjay Patel	36263a7ccc	[LoopUtils] remove redundant opcode parameter; NFC While here, rename the inaccurate getRecurrenceBinOp() because that was also used to get CmpInst opcodes. The recurrence/reduction kind should always refer to the expected opcode for a reduction. SLP appears to be the only direct caller of createSimpleTargetReduction(), and that calling code ideally should not be carrying around both an opcode and a reduction kind. This should allow us to generalize reduction matching to use intrinsics instead of only binops.	2021-01-04 17:05:28 -05:00
Sanjay Patel	9766957524	[LoopUtils] reduce code for creating reduction; NFC We can return from each case instead of creating a temporary variable just to have a common return.	2021-01-04 16:05:03 -05:00
Sanjay Patel	58b6c5d932	[LoopUtils] reorder logic for creating reduction; NFC If we are using a shuffle reduction, we don't need to go through the switch on opcode - return early.	2021-01-04 16:05:02 -05:00
Whitney Tsang	de6d43f16c	Revert "[LoopNest] Allow empty basic blocks without loops" This reverts commit `9a17bff4f7`.	2021-01-04 20:42:21 +00:00
Whitney Tsang	9a17bff4f7	[LoopNest] Allow empty basic blocks without loops Allow loop nests with empty basic blocks without loops in different levels as perfect. Reviewers: Meinersbur Differential Revision: https://reviews.llvm.org/D93665	2021-01-04 19:59:50 +00:00
Philip Reames	7c63aac7bd	Revert "[LoopDeletion] Break backedge of loops when known not taken" This reverts commit `dd6bb367d1`. Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.	2021-01-04 09:50:47 -08:00
Philip Reames	dd6bb367d1	[LoopDeletion] Break backedge of loops when known not taken The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-04 09:19:29 -08:00
Florian Hahn	c367258b5c	[SimplifyCFG] Enabled hoisting late in LTO pipeline. `bb7d3af113` disabled hoisting in SimplifyCFG by default, but enabled it late in the pipeline. But it appears as if the LTO pipelines got missed. This patch adjusts the LTO pipelines to also enable hoisting in the later stages. Unfortunately there's no easy way to add a test for the change I think. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D93684	2021-01-04 16:26:58 +00:00
Florian Hahn	e0905553b4	[ArgPromotion] Delay dead GEP removal until doPromotion. Currently ArgPromotion removes dead GEPs as part of the legality check in isSafeToPromoteArgument. If no promotion happens, this means the pass claims no modifications happened, even though GEPs were removed. This patch fixes the issue by delaying removal of dead GEPs until doPromotion: isSafeToPromoteArgument can simply skip dead GEPs and the code in doPromotion dealing with GEPs is updated to account for dead GEPs. Once we have committed to promotion, it should be safe to remove dead GEPs. Alternatively isSafeToPromoteArgument could return an additional boolean to indicate whether it made changes, but this is quite cumbersome and there should be no real benefit of weeding out some dead GEPs here if we do not perform promotion. I added a test for the case where dead GEPs need to be removed when promotion happens in `578c5a0c6e`. Fixes PR47477. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93991	2021-01-04 09:51:20 +00:00
Andrew Litteken	5c951623bc	[IROutliner] Refactoring errors in the cost model from past patches. A variable was reused where it should not have been, due to confusion while committing patches.	2021-01-04 00:11:18 -06:00
Andrew Litteken	05e6ac4eb8	[IROutliner] Removing a duplicate addition, causing overestimates in IROutliner. There was an extra addition left over from a previous commit for the cost model, this removes it.	2021-01-03 23:36:28 -06:00
Roman Lebedev	98cd1c33e3	[NFC][SimplifyCFG] Hoist 'original' DomTree verification from simplifyOnce() into run() This is NFC since SimplifyCFG still currently defaults to not preserving DomTree. SimplifyCFGOpt::simplifyOnce() is only called from SimplifyCFGOpt::run(), and cannot be called externally, since SimplifyCFGOpt is defined in .cpp. This avoids some needless verifications, and is thus a bit faster without sacrificing precision.	2021-01-04 01:02:02 +03:00
Roman Lebedev	a7684940f0	[SimplifyCFG] SimplifyTerminatorOnSelect(): fix/tune DomTree updates We only need to remove non-TrueBB/non-FalseBB successors, and we only need to do that once. We don't need to insert any new edges, because no new successors will be added.	2021-01-04 01:02:02 +03:00
Roman Lebedev	70935b9595	[NFC][SimplifyCFG] SimplifyTerminatorOnSelect(): pull out OldTerm->getParent() into a variable	2021-01-04 01:02:02 +03:00
Kazu Hirata	ba82c0b315	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-01-03 09:57:47 -08:00
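The pattern behind this cleanup can be shown with the standard library. `std::set` is used here as a stand-in, since LLVM's own set and map types are not assumed to have the same signatures, and the helper names are hypothetical.

```cpp
#include <set>

// Before: look the key up first, then erase it (two lookups).
bool removeChecked(std::set<int> &s, int key) {
  if (s.count(key)) {
    s.erase(key);
    return true;
  }
  return false;
}

// After: erase directly. For std::set, erase(key) returns the number of
// elements removed, so erasing a missing key is a harmless no-op.
bool removeDirect(std::set<int> &s, int key) {
  return s.erase(key) != 0;
}
```

Both helpers leave the set in the same state and report whether the key was present; the direct form simply avoids the redundant membership test.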
Juneyoung Lee	1fc992bd86	[Scalarizer] Use poison as insertelement's placeholder This patch makes Scalarizer use poison as insertelement's placeholder. It contains two changes in Scalarizer.cpp, and neither change alters the semantics of the optimized program, because the placeholder value (poison) is already completely hidden by the following insertelement instructions. The first change at visitBitCastInst() creates poison vector of MidTy and consecutively inserts FanIn times, which is # of elems of MidTy. The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with Count insertelements. The test diffs show that the poison value is never exposed after insertelements. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93989	2021-01-04 00:35:28 +09:00
Roman Lebedev	5fa241a657	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation, take 2	2021-01-03 01:45:48 +03:00
Roman Lebedev	6a3a8d17eb	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation	2021-01-03 01:45:48 +03:00
Roman Lebedev	7c8b8063b6	[SimplifyCFG][AMDGPU] AMDGPUUnifyDivergentExitNodes: SimplifyCFG isn't ready to preserve PostDomTree There are a number of transforms in SimplifyCFG that take DomTree out of DomTreeUpdater, and do updates manually. Until they are fixed, user passes are unable to claim that PDT is preserved. Note that the default for SimplifyCFG is still not to preserve DomTree, so this is still effectively NFC.	2021-01-03 01:45:46 +03:00
Kazu Hirata	530c5af6a4	[Transforms] Construct SmallVector with iterator ranges (NFC)	2021-01-02 09:24:17 -08:00
Florian Hahn	c50f9b2351	[LV] Clean up trailing whitespace (NFC). Clean up some stray whitespace that sneaked in recently.	2021-01-02 16:43:13 +00:00
Roman Lebedev	b9da488ad7	[SimplifyCFG] Don't actually take DomTreeUpdater unless we intend to maintain DomTree validity This guards against unintentional mistakes like the one i just fixed in previous commit.	2021-01-02 14:40:55 +03:00
Roman Lebedev	b4429f3cdd	[SimplifyCFG] Teach removeUndefIntroducingPredecessor to preserve DomTree	2021-01-02 01:01:20 +03:00
Roman Lebedev	657c1e09da	[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 2	2021-01-02 01:01:18 +03:00
Roman Lebedev	f1ce696056	[SimplifyCFG] Teach tryWidenCondBranchToCondBranch() to preserve DomTree	2021-01-02 01:01:17 +03:00
Roman Lebedev	e08fea3b24	[SimplifyCFGPass] Ensure that DominatorTreeWrapperPass is init'd before SimplifyCFG It's probably better than hoping that it will happen to be already initialized.	2021-01-02 01:01:17 +03:00
Kazu Hirata	f43daf1b62	[SSAUpdater] Remove unused code InstrIsPHI (NFC) The last use of this function was removed on Jan 4, 2018 in commit `90ecac01e9`.	2021-01-01 12:44:52 -08:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seems like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Florian Hahn	d9f306aa52	[LV] Fix crash when generating remarks with multi-exit loops. If DoExtraAnalysis is true (e.g. because remarks are enabled), we continue with the analysis rather than exiting. Update code to conditionally check if the ExitBB has phis or not a single predecessor. Otherwise a nullptr is dereferenced with DoExtraAnalysis.	2021-01-01 13:54:41 +00:00
Roman Lebedev	831636b0e6	[SimplifyCFG] SUCCESS! Teach createUnreachableSwitchDefault() to preserve DomTree This pretty much concludes the patch series for updating SimplifyCFG to preserve DomTree. All 318 dedicated `-simplifycfg` tests now pass with `-simplifycfg-require-and-preserve-domtree=1`. There are a few leftovers that apparently don't have good test coverage. I do not yet know what gaps in test coverage the wider-scale testing will reveal, but the default flip might be close.	2021-01-01 03:25:25 +03:00
Roman Lebedev	e1440d43bc	[SimplifyCFG] Teach tryToSimplifyUncondBranchWithICmpInIt() to preserve DomTree	2021-01-01 03:25:25 +03:00
Roman Lebedev	8866583953	[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 2	2021-01-01 03:25:24 +03:00
Roman Lebedev	a815b6b2b2	[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 1	2021-01-01 03:25:24 +03:00
Roman Lebedev	0d2f219d4d	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 3	2021-01-01 03:25:23 +03:00
Roman Lebedev	9f17dab1f4	[SimplifyCFG] Teach simplifyIndirectBr() to preserve DomTree	2021-01-01 03:25:23 +03:00
Roman Lebedev	b7c463d7b8	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 2	2021-01-01 03:25:23 +03:00
Roman Lebedev	c1b825d4b8	[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 1	2021-01-01 03:25:22 +03:00
Andrew Litteken	1a9eb19af9	[IROutliner] Adding consistent function attribute merging When combining extracted functions, they may have different function attributes. We want to make sure that we do not make any assumptions, or lose any information. This attempts to make sure that we consolidate function attributes to their most general case. Tests: llvm/test/Transforms/IROutliner/outlining-compatible-and-attribute-transfer.ll llvm/test/Transforms/IROutliner/outlining-compatible-or-attribute-transfer.ll Reviewers: jdoefert, paquette Differential Revision: https://reviews.llvm.org/D87301	2020-12-31 12:30:23 -06:00
Fangrui Song	a90b42b0fe	[ThinLTO] Default -enable-import-metadata to false The default value is dependent on `-DLLVM_ENABLE_ASSERTIONS={off,on}` (D22167), which is error-prone. The few tests checking `!thinlto_src_module` can specify -enable-import-metadata explicitly. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D93959	2020-12-31 10:04:21 -08:00
Dávid Bolvanský	ae69fa9b9f	[InstCombine] Transform (A + B) - (A & B) to A \| B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = and i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = or i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/2fhW6r	2020-12-31 15:04:32 +01:00
Dávid Bolvanský	742ea77ca4	[InstCombine] Transform (A + B) - (A \| B) to A & B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = or i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = and i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/aQRh2j	2020-12-31 14:03:20 +01:00
Bogdan Graur	8bee4d4e8f	Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops" Test clang/test/Misc/loop-opt-setup.c fails when executed in Release. This reverts commit `6f1503d598`. Reviewed By: SureYeaah Differential Revision: https://reviews.llvm.org/D93956	2020-12-31 11:47:49 +00:00
Atmn Patel	6f1503d598	[LoopDeletion] Allows deletion of possibly infinite side-effect free loops From C11 and C++11 onwards, a forward-progress requirement has been introduced for both languages. In the case of C, loops with non-constant conditionals that do not have any observable side-effects (as defined by 6.8.5p6) can be assumed by the implementation to terminate, and in the case of C++, this assumption extends to all functions. The clang frontend will emit the `mustprogress` function attribute for C++ functions (D86233, D85393, D86841) and emit the loop metadata `llvm.loop.mustprogress` for every loop in C11 or later that has a non-constant conditional. This patch modifies LoopDeletion so that only loops with the `llvm.loop.mustprogress` metadata or loops contained in functions that are required to make progress (`mustprogress` or `willreturn`) are checked for observable side-effects. If these loops do not have an observable side-effect, then we delete them. Loops without observable side-effects that do not satisfy the above conditions will not be deleted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86844	2020-12-30 21:43:01 -05:00
Kazu Hirata	95ea86587c	[PGO] Use isa instead of dyn_cast (NFC)	2020-12-30 17:45:38 -08:00
Roman Lebedev	51879a5256	[LoopIdiom] 'left-shift until bittest': don't forget to check that PHI node is in loop header Fixes an issue reported by Peter Collingbourne in https://reviews.llvm.org/D91726#2475301	2020-12-30 23:58:41 +03:00
Roman Lebedev	7f221c9196	[SimplifyCFG] Teach SwitchToLookupTable() to preserve DomTree	2020-12-30 23:58:41 +03:00
Roman Lebedev	a17025aa61	[SimplifyCFG] Teach switchToSelect() to preserve DomTree	2020-12-30 23:58:40 +03:00
Roman Lebedev	c45f765c0d	[SimplifyCFG] Teach SimplifyBranchOnICmpChain() to preserve DomTree	2020-12-30 23:58:40 +03:00
Sanjay Patel	8ca60db40b	[LoopUtils] reduce FMF and min/max complexity when forming reductions I don't know if there's some way this changes what the vectorizers may produce for reductions, but I have added test coverage with `3567908` and `5ced712` to show that both passes already have bugs in this area. Hopefully this does not make things worse before we can really fix it.	2020-12-30 15:22:26 -05:00
Yuanfang Chen	277ebe46c6	Fix `LLVM_ENABLE_MODULES=On` build for commit `480936e741`.	2020-12-30 10:54:04 -08:00
Andrew Litteken	fe431103b6	[IROutliner] Adding option to enable outlining from linkonceodr functions There are functions that the linker is able to automatically deduplicate, we do not outline from these functions by default. This allows for outlining from those functions. Tests: llvm/test/Transforms/IROutliner/outlining-odr.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87309	2020-12-30 12:08:04 -06:00
Sanjay Patel	e90ea76380	[IR] remove 'NoNan' param when creating FP reductions This is no-functional-change-intended (AFAIK, we can't isolate this difference in a regression test). That's because the callers should be setting the IRBuilder's FMF field when creating the reduction and/or setting those flags after creating. It doesn't make sense to override this one flag alone. This is part of a multi-step process to clean up the FMF setting/propagation. See PR35538 for an example.	2020-12-30 09:51:23 -05:00
Juneyoung Lee	420d046d6b	clang-format, address warnings	2020-12-30 23:05:07 +09:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Juneyoung Lee	bfedd5d2b6	[ConstraintElimination] Add support for select form of and/or This patch adds support for select form of and/or. Currently there is an ongoing effort for moving towards using `select a, b, false` instead of `and i1 a, b` and `select a, true, b` instead of `or i1 a, b` as well. D93065 has links to relevant changes. Alive2 proof: (undef input was disabled due to timeout :( ) - and: https://alive2.llvm.org/ce/z/AgvFbQ - or: https://alive2.llvm.org/ce/z/KjLJyb Differential Revision: https://reviews.llvm.org/D93935	2020-12-30 21:27:36 +09:00
Andrew Litteken	30feb93036	[IROutliner] Adding support for swift errors in the IROutliner Since some values can be swift errors, we need to make sure that we correctly propagate the parameter attributes. Tests found at: llvm/test/Transforms/IROutliner/outlining-swift-error.ll Reviewers: jroelofs, paquette Recommit of: `71867ed5e6` Differential Revision: https://reviews.llvm.org/D87742	2020-12-30 01:17:27 -06:00
Andrew Litteken	eeb99c2ac2	Revert "[IROutliner] Adding support for swift errors" This reverts commit `71867ed5e6`. Reverting for lack of commit messages.	2020-12-30 01:17:27 -06:00
Andrew Litteken	71867ed5e6	[IROutliner] Adding support for swift errors	2020-12-30 01:14:55 -06:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. It can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depends on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Kazu Hirata	16d20e2554	[Transforms/Utils] Construct SmallVector with iterator ranges (NFC)	2020-12-29 19:23:23 -08:00
Andrew Litteken	df4a931c63	[IROutliner] Adding OptRemarks to the IROutliner Pass This prints OptRemarks at each location where a decision is made to not outline, or to outline a specific section for the IROutliner pass. Test: llvm/test/Transforms/IROutliner/opt-remarks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87300	2020-12-29 15:52:08 -06:00
Roman Lebedev	39a56f7f17	[SimplifyCFG] Teach SimplifyTerminatorOnSelect() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	ec0b671a61	[SimplifyCFG] Teach SimplifyCondBranchToCondBranch() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	307156246f	[SimplifyCFG] Teach mergeConditionalStoreToAddress() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	d4c0abb4a3	[SimplifyCFG] Teach FoldCondBranchOnPHI() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	b8121b2e62	[SimplifyCFG] Teach SinkCommonCodeFromPredecessors() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	18c407bf4c	[SimplifyCFG] Teach HoistThenElseCodeToIf() to preserve DomTree	2020-12-30 00:48:10 +03:00
Roman Lebedev	fe9bdd9621	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 2	2020-12-30 00:48:10 +03:00
Roman Lebedev	6027e05dbf	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 1	2020-12-30 00:48:10 +03:00
Sanjay Patel	8d18bc8e6d	[Utils] reduce code in createTargetReduction(); NFC The switch duplicated the translation in getRecurrenceBinOp(). This code is still weird because it translates to the TTI ReductionFlags for min/max, but then createSimpleTargetReduction() converts that back to RecurrenceDescriptor::MinMaxRecurrenceKind.	2020-12-29 15:56:19 -05:00
Sanjay Patel	21a3a0225d	[SLP] replace local reduction enum with RecurrenceKind; NFCI I'm not sure if the SLP enum was created before the IVDescriptor RecurrenceDescriptor / RecurrenceKind existed, but the code in SLP is now redundant with that class, so it just makes things more complicated to have both. We eventually call LoopUtils createSimpleTargetReduction() to create reduction ops, so we might as well standardize on those enum names. There's still a question of whether we need to use TTI::ReductionFlags vs. MinMaxRecurrenceKind, but that can be another clean-up step. Another option would just be to flatten the enums in RecurrenceDescriptor into a single enum. There isn't much benefit (smaller switches?) to having a min/max subset.	2020-12-29 14:52:11 -05:00
Andrew Litteken	6df161a2fb	[IROutliner] Adding a cost model, and debug option to turn the model off. This adds a cost model that takes into account the total number of machine instructions to be removed from each region, the number of instructions added by adding a new function with a set of instructions, and the instructions added by handling arguments. Tests not adding flags: llvm/test/Transforms/IROutliner/outlining-cost-model.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87299	2020-12-29 12:43:41 -06:00
Roman Lebedev	374ef57f13	[InstCombine] 'hoist xor-by-constant from xor-by-value': completely give up on constant exprs As Mikael Holmén is noting in the post-commit review for the first fix https://reviews.llvm.org/rGd4ccef38d0bb#967466 not hoisting constantexprs is not enough, because if the xor originally was a constantexpr (i.e. X is a constantexpr), `SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately undo this transform, thus again causing an infinite combine loop. This transform has resulted in a surprising number of constantexpr failures.	2020-12-29 16:28:18 +03:00
Arthur Eubanks	c2ef06d3dd	[NewPM] Port infer-address-spaces And add it to the AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93880	2020-12-28 19:58:12 -08:00
Kazu Hirata	5d2529f28f	[Scalar] Construct SmallVector with iterator ranges (NFC)	2020-12-28 19:55:18 -08:00
Andrew Litteken	1e23802507	[IROutliner] Merging identical output blocks for extracted functions. Many of the sets of output stores will be the same. When a block is created, we check if there is an output block with the same set of store instructions. If there is, we map the output block of the region back to the block, so that the extra argument controlling the switch statement can be set to the appropriate block value. Tests: - llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87298	2020-12-28 21:01:48 -06:00
Andrew Litteken	e6ae623314	[IROutliner] Adding support for consolidating functions with different output arguments. Certain regions can have values introduced inside the region that are used outside of the region. These may not be the same for each similar region, so we must create one over arching set of arguments for the consolidated function. We do this by iterating over the outputs for each extracted function, and creating as many different arguments to encapsulate the different outputs sets. For each output set, we create a different block with the necessary stores from the value to the output register. There is then one switch statement, controlled by an argument to the function, to differentiate which block to use. Changed Tests for consistency: llvm/test/Transforms/IROutliner/extraction.ll llvm/test/Transforms/IROutliner/illegal-assumes.ll llvm/test/Transforms/IROutliner/illegal-memcpy.ll llvm/test/Transforms/IROutliner/illegal-memmove.ll llvm/test/Transforms/IROutliner/illegal-vaarg.ll Tests to test new functionality: llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87296	2020-12-28 16:17:07 -06:00
Nikita Popov	4a16c507cb	[InstCombine] Disable unsafe select transform behind a flag This disables the poison-unsafe select -> and/or transform behind a flag (we continue to perform the fold by default). This is intended to simplify evaluation and testing while we teach various passes to directly recognize the select pattern. This only disables the main select -> and/or transform. A number of related ones are instead changed to canonicalize to the a ? b : false and a ? true : b forms which represent and/or respectively. This requires a bit of care to avoid infinite loops, as we do not want !a ? b : false to be converted into a ? false : b. The basic idea here is the same as D93065, but keeps the change behind a flag for now. Differential Revision: https://reviews.llvm.org/D93840	2020-12-28 22:43:52 +01:00
Roman Lebedev	ef93f7a11c	[SimplifyCFG] FoldBranchToCommonDest: gracefully handle unreachable code () We might be dealing with unreachable code, so the bonus instruction we clone might be self-referencing. There is a sanity check that all uses of bonus instructions that are not in the original block with said bonus instructions are PHI nodes, and that is obviously not the case for self-referencing instructions. So if we find such a use, just rewrite it. Thanks to Mikael Holmén for the reproducer! Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8	2020-12-28 23:31:19 +03:00
Philip Reames	4b33b23877	Reapply "[LV] Vectorize (some) early and multiple exit loops" w/fix for builder This reverts commit `4ffcd4fe9a` thus restoring `e4df6a40da`. The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.	2020-12-28 10:13:28 -08:00
Arthur Eubanks	4ffcd4fe9a	Revert "[LV] Vectorize (some) early and multiple exit loops" This reverts commit `e4df6a40da`. Breaks Windows bots, e.g. http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio	2020-12-28 10:05:41 -08:00
Philip Reames	e4df6a40da	[LV] Vectorize (some) early and multiple exit loops This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on its own extends the loop forms allowed in two ways: (1) single exit loops which are not bottom tested; (2) multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later). The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts. The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration. The existing code already had the notion of requiring one iteration in the scalar epilogue, this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical). Differential Revision: https://reviews.llvm.org/D93317	2020-12-28 09:40:42 -08:00
Roman Lebedev	d4ccef38d0	[InstCombine] 'hoist xor-by-constant from xor-by-value': ignore constantexprs As it is being reported (in post-commit review) in https://reviews.llvm.org/D93857 this fold (as i expected, but failed to come up with test coverage despite trying) has issues with constant expressions. Since we only care about true constants, which constantexprs are not, don't perform such hoisting for constant expressions.	2020-12-28 20:15:20 +03:00
Yevgeny Rouban	d76c1d2247	[RS4GC] Lazily set changed flag when folding single entry phis The function FoldSingleEntryPHINodes() is changed to return whether it has changed IR. This return value is used by RS4GC to set the MadeChange flag accordingly. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D93810	2020-12-28 10:54:21 +07:00
Juneyoung Lee	9d70dbdc2b	[InstCombine] use poison as placeholder for undemanded elems Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement. However, this is problematic because undef isn’t undefined enough. Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison. This makes a few straightforward optimizations incorrect, such as: ``` ; https://bugs.llvm.org/show_bug.cgi?id=44185 define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %xv = insertelement <4 x float> %q, float %x, i32 2 %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef } ret <4 x float> %r ; %r[3] is undef } => define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %r = insertelement <4 x float> %y, float %x, i32 1 ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison } Transformation doesn't verify! ERROR: Target is more poisonous than source ``` I’d like to suggest 1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too) 2. Updating shufflevector’s semantics to return poison element if mask is undef Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay. m_Undef() matches PoisonValue as well, so existing optimizations will still fire. The only concern is hidden miscompilations that will go incorrect when poison constant is given. A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :( Instead, I’ll simply locally maintain the tests and run Alive2. If there is any bug found, I’ll report it. Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93586	2020-12-28 08:58:15 +09:00
Florian Hahn	4ad41902e8	[GVN] Correctly set modified status when doing PRE on indices. This patch updates GVN to correctly return the modified status, if PRE is performed on indices. It fixes a crash when building the test-suite with EXPENSIVE_CHECKS and LTO.	2020-12-27 21:58:31 +00:00
Juneyoung Lee	d3f1f7b6bc	[EarlyCSE] Use m_LogicalAnd/Or matchers to handle branch conditions EarlyCSE's handleBranchCondition says: ``` // If the condition is AND operation, we can propagate its operands into the // true branch. If it is OR operation, we can propagate them into the false // branch. ``` This holds for the corresponding select patterns as well. This is a part of an ongoing work for disabling buggy select->and/or transformations. See llvm.org/pr48353 and D93065 for more context Proof: and: https://alive2.llvm.org/ce/z/MQWodU or: https://alive2.llvm.org/ce/z/9GLbB_ Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93842	2020-12-28 05:36:26 +09:00
Juneyoung Lee	f1d648b973	[GVN] Use m_LogicalAnd/Or to propagate equality from branch conditions This patch makes GVN recognize `select c1, c2, false` as well as `select c1, true, c2` branch condition and propagate equality from these. See llvm.org/pr48353, D93065 Differential Revision: https://reviews.llvm.org/D93841	2020-12-28 05:28:38 +09:00
Florian Hahn	0ea3749b3c	[LV] Set up branch from middle block earlier. Previously the branch from the middle block to the scalar preheader & exit was being set-up at the end of skeleton creation in completeLoopSkeleton. Inserting SCEV or runtime checks may result in LCSSA phis being created, if they are required. Adjusting branches afterwards may break those PHIs. To avoid this, we can instead create the branch from the middle block to the exit after we created the middle block, so we have the final CFG before potentially adjusting/creating PHIs. This fixes a crash for the included test case. For the non-crashing case, this is almost a NFC with respect to the generated code. The only change is the order of the predecessors of the involved branch targets. Note an assertion was moved from LoopVersioning() to LoopVersioning::versionLoop. Adjusting the branches means loop-simplify form may be broken before constructing LoopVersioning. But LV only uses LoopVersioning to annotate the loop instructions with !noalias metadata, which does not require loop-simplify form. This is a fix for an existing issue uncovered by D93317.	2020-12-27 18:21:12 +00:00
Kazu Hirata	8299fb8f25	[Transforms] Use llvm::append_range (NFC)	2020-12-27 09:57:29 -08:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00
Sanjay Patel	badf0f20f3	[SLP] rename reduction variables for readability; NFC I am hoping to extend the reduction matching code, and it is hard to distinguish "ReductionData" from "ReducedValueData". So extend the tree/root metaphor to include leaves. Another problem is that the name "OperationData" does not provide insight into its purpose. I'm not sure if we can alter that underlying data structure to make the code clearer.	2020-12-26 11:20:25 -05:00
Sanjay Patel	c4ca108966	[SLP] use switch to improve readability; NFC This will get more complicated when we handle intrinsics like maxnum.	2020-12-26 10:59:45 -05:00
Kazu Hirata	46bea9b297	[Local] Remove unused function RemovePredecessorAndSimplify (NFC) The last use of the function was removed on Sep 29, 2010 in commit `99c985c37d`.	2020-12-25 09:35:20 -08:00
Roman Lebedev	25aebe2ccf	[LoopIdiom] 'left-shift-until-bittest': keep no-wrap flags on shift, fix edge-case miscompilation for %x.next While `%x.curr` is always safe to compute, because `LoopBackedgeTakenCount` will always be smaller than `bitwidth(X)`, i.e. we never get poison, rewriting `%x.next` is more complicated, because `X << LoopTripCount` will be poison iff `LoopTripCount == bitwidth(X)` (which will happen iff `BitPos` is `bitwidth(x) - 1` and `X` is `1`). So unless we know that isn't the case (as alive2 notes, we know it's safe to do iff shift had no-wrap flags, or bitpos does not indicate signbit, or we know that %x is never `1`), we'll need to emit an alternative, safe IR, by either just shifting the `%x.curr`, or conditionally selecting between the computed `%x.next` and `0`. The former IR looks better, so let's do that. While there, ensure that we don't drop no-wrap flags from said shift.	2020-12-24 21:20:52 +03:00
Roman Lebedev	d9ebaeeb46	[InstCombine] Hoist xor-by-constant from xor-by-value This is one of the deficiencies that can be observed in https://godbolt.org/z/YPczsG after D91038 patch set. This exposed two missing folds, one was fixed by the previous commit, another one is `(A ^ B) \| ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`. `-early-cse` will catch it: https://godbolt.org/z/4n1T1v, but it isn't meaningful to fix it in InstCombine, because we'd need to essentially do our own CSE, and we can't even rely on `Instruction::isIdenticalTo()`, because there are no guarantees that the order of operands matches. So let's just accept it as a loss.	2020-12-24 21:20:50 +03:00
Roman Lebedev	5b78303433	[InstCombine] Fold `a & ~(a ^ b)` to `a & b` ``` ---------------------------------------- define i32 @and_xor_not_common_op(i32 %a, i32 %b) { %0: %b2 = xor i32 %b, 4294967295 %t2 = xor i32 %a, %b2 %t4 = and i32 %t2, %a ret i32 %t4 } => define i32 @and_xor_not_common_op(i32 %a, i32 %b) { %0: %t4 = and i32 %a, %b ret i32 %t4 } Transformation seems to be correct! ```	2020-12-24 21:20:49 +03:00
Roman Lebedev	b3021a72a6	[IR][InstCombine] Add m_ImmConstant(), that matches on non-ConstantExpr constants, and use it A pattern to ignore ConstantExpr's is quite common, since they frequently lead into infinite combine loops, so let's make writing it easier.	2020-12-24 21:20:47 +03:00
Roman Lebedev	ff3749fc79	[NFC] SimplifyCFGOpt::simplifyUnreachable(): pacify unused variable warning Thanks to Luke Benes for pointing it out.	2020-12-24 21:20:46 +03:00
Kazu Hirata	df812115e3	[CodeGen, Transforms] Use llvm::any_of (NFC)	2020-12-24 09:08:36 -08:00
Simon Pilgrim	89abe1cf83	[InstCombine] foldICmpUsingKnownBits - use KnownBits signed/unsigned getMin/MaxValue helpers. NFCI. Replace the local compute*SignedMinMaxValuesFromKnownBits methods with the equivalent KnownBits helpers to determine the min/max value ranges.	2020-12-24 14:22:26 +00:00
Nikita Popov	ef2f843347	Revert "[InstCombine] Check inbounds in load/store of gep null transform (PR48577)" This reverts commit `899faa50f2`. Upon further consideration, this does not fix the right issue. Doing this fold for non-inbounds GEPs is legal, because the resulting pointer is still based-on null, which has no associated address range, and as such any access to it is UB. https://bugs.llvm.org/show_bug.cgi?id=48577#c3	2020-12-24 12:36:56 +01:00
Nikita Popov	90177912a4	Revert "[InstCombine] Fold gep inbounds of null to null" This reverts commit `eb79fd3c92`. This causes stage2 crashes, possibly due to StringMap being miscompiled. Reverting for now.	2020-12-24 10:20:31 +01:00
Roman Lebedev	f8079355c6	[InstCombine] canonicalizeAbsNabs(): don't propagate NSW flag for NABS pattern As Nuno notes in post-commit review in https://reviews.llvm.org/D87188#2467915, it is not correct to keep NSW for the negated abs pattern, so don't do that.	2020-12-24 00:06:09 +03:00
Nikita Popov	759b8c11c3	[InstCombine] Handle different pointer types when folding gep of null The source pointer type is not necessarily the same as the result pointer type, so we can't simply return the original null pointer, it might be a different one.	2020-12-23 21:58:26 +01:00
Nikita Popov	eb79fd3c92	[InstCombine] Fold gep inbounds of null to null Effectively, this is what we were previously already doing when the GEP was used in conjunction with a load or store, but this fold can also be applied more generally: > The only in bounds address for a null pointer in the default > address-space is the null pointer itself.	2020-12-23 21:41:53 +01:00
Nikita Popov	899faa50f2	[InstCombine] Check inbounds in load/store of gep null transform (PR48577) If the GEP isn't inbounds, then accessing a GEP of null location is generally not UB. While this is a minimal fix, the GEP of null handling should probably be its own fold.	2020-12-23 21:03:22 +01:00
Craig Topper	897990e614	[IROutliner] Use isa instead of dyn_cast where the casted value isn't used. NFC Fixes unused variable warnings.	2020-12-23 11:40:15 -08:00
Roman Lebedev	2b61e7c68c	[LoopIdiom] 'left-shift until bittest' idiom: support rewriting loop as countable, allow extra cruft The current state of the transform is still not enough to support my motivational pattern, because it has one more "induction variable". I have delayed posting this patch, because originally even just rewriting the loop as countable wasn't enough to nicely transform my motivational pattern, because i expected that extra IV to be rewritten afterwards, but it wasn't happening until i fixed that in D91800. So, this patch allows the 'left-shift until bittest' loop idiom as long as the inserted ops are cheap, and lifts any and all extra use checks on the instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92754	2020-12-23 22:28:10 +03:00
Roman Lebedev	a0ddc61c5b	[LoopIdiom] 'left-shift until bittest' idiom: support canonical sign bit mask If the bitmask is for sign bit, instcombine would have canonicalized the pattern into a proper sign bit check. Supporting that is still simple, but requires a bit of a roundtrip - we first have to use `decomposeBitTestICmp()`, and the rest again just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91726	2020-12-23 22:28:09 +03:00
Roman Lebedev	cb2e5980ba	[LoopIdiom] 'left-shift until bittest' idiom: support constant bit mask The handling of the case where the mask is a constant is trivial: if said constant is a power of two, the bit in question is log2(mask), and the rest just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91725	2020-12-23 22:28:09 +03:00
Roman Lebedev	e124844709	[LoopIdiom] Introduce 'left-shift until bittest' idiom The motivation here is the following inner loop in fp16/fp24 -> fp32 expander, that runs as part of the floating-point DNG decompression in RawSpeed library: `cd380bb9a2/src/librawspeed/decompressors/DeflateDecompressor.cpp (L112-L115)` ``` while (!(fp32_fraction & (1 << 23))) { fp32_exponent -= 1; fp32_fraction <<= 1; } ``` (https://godbolt.org/z/r13YMh) As one might notice, that loop is currently uncountable, and that whole code stays scalar. Yet, it is rather trivial to make that loop countable: https://godbolt.org/z/do8WMz and we can prove that via alive2: https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?) ... and that allows the whole fp16->fp32 code to vectorize: https://godbolt.org/z/7hYr13 Now, while i'd love to get there, i feel like i should take it in steps. For now, this introduces support for the most basic case, where the bit position is known as a variable, and the loop will go away (has no live-outs other than the recurrence, no extra instructions in the loop). I have added sufficient (i believe) test coverage, and alive2 is happy with those transforms. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91038	2020-12-23 22:28:09 +03:00
Andrew Litteken	b1191c8438	[IROutliner] Adding support for elevating constants that are not the same in each region to arguments When there are constants that have the same structural location, but not the same value, between different regions, we cannot simply outline the region. Instead, we find the constants that are not the same in each location, and promote them to arguments to be passed into the respective functions. At each call site, we pass the constant in as an argument regardless of type. Added/Edited Tests: llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll llvm/test/Transforms/IROutliner/outlining-different-constants.ll llvm/test/Transforms/IROutliner/outlining-different-globals.ll Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D87294	2020-12-23 13:03:05 -06:00
Evgeniy Brevnov	9fb074e7bb	[BPI] Improve static heuristics for "cold" paths. Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignore relative hotness/coldness of other paths. New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights. One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50\50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominate and are post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together. In addition new approach processes loops in a uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers. Reviewed By: yrouban Differential Revision: https://reviews.llvm.org/D79485	2020-12-23 22:47:36 +07:00
Kazu Hirata	3c707d73f2	[NewGVN] Remove for_each_found (NFC) The last use of the function was removed on Sep 30, 2017 in commit `9b926e90d3`.	2020-12-22 20:13:27 -08:00
Sanjay Patel	0d15d4b6f4	[SLP] use operand index abstraction for number of operands I think this is NFC currently, but the bug would be exposed when we allow binary intrinsics (maxnum, etc) as candidates for reductions. The code in matchAssociativeReduction() is using OperationData::getNumberOfOperands() when comparing whether the "EdgeToVisit" iterator is in-bounds, so this code must use the same (potentially offset) operand value to set the "EdgeToVisit".	2020-12-22 16:05:39 -05:00
Arnold Schwaighofer	333108e8be	Add a llvm.coro.end.async intrinsic The llvm.coro.end.async intrinsic allows specifying a function that is to be called as the last action before returning. This function will be inlined after coroutine splitting. This function can contain a 'musttail' call to allow for guaranteed tail calling as the last action. Differential Revision: https://reviews.llvm.org/D93568	2020-12-22 10:52:28 -08:00
Florian Hahn	ef4dbb2b7a	[LV] Use ScalarEvolution::getURemExpr to reduce duplication. ScalarEvolution should be able to handle both constant and variable trip counts using getURemExpr, so we do not have to handle them separately. This is a small simplification of `a56280094e`. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93677	2020-12-22 14:48:42 +00:00
Florian Hahn	c0c0ae16c3	[VPlan] Make VPInstruction a VPDef This patch updates VPInstruction to manage the value it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90565	2020-12-22 09:53:47 +00:00
Gil Rapaport	a56280094e	[LV] Avoid needless fold tail When the trip-count is provably divisible by the maximal/chosen VF, folding the loop's tail during vectorization is redundant. This commit extends the existing test for constant trip-counts to any trip-count known to be divisible by maximal/selected VF by SCEV. Differential Revision: https://reviews.llvm.org/D93615	2020-12-22 10:25:20 +02:00
Ta-Wei Tu	d7a6f3a105	[LoopNest] Extend `LPMUpdater` and adaptor to handle loop-nest passes This is a follow-up patch of D87045. The patch implements "loop-nest mode" for `LPMUpdater` and `FunctionToLoopPassAdaptor` in which only top-level loops are operated. `createFunctionToLoopPassAdaptor` decides whether the returned adaptor is in loop-nest mode or not based on the given pass. If the pass is a loop-nest pass or the pass is a `LoopPassManager` which contains only loop-nest passes, the loop-nest version of adaptor is returned; otherwise, the normal (loop) version of adaptor is returned. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D87531	2020-12-22 08:47:38 +08:00
Congzhe Cao	c60a58f8d4	[InstCombine] Add check of i1 types in select-to-zext/sext transformation When doing select-to-zext/sext transformations, we should not handle TrueVal and FalseVal of i1 type otherwise it would result in zext/sext i1 to i1. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93272	2020-12-21 18:46:24 -05:00
Michael Forster	d56982b6f5	Remove unused variables. Differential Revision: https://reviews.llvm.org/D93635	2020-12-21 16:24:43 +01:00
Simon Pilgrim	88c5b50060	[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts (REAPPLIED) The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns. This should allow us to begin resolving PR46896 et al. Ensure we block poison in a funnel shift value - similar to rG0fe91ad463fea9d08cbcd640a62aa9ca2d8d05e0 Reapplied with fix for PR48068 - we weren't checking that the shift values could be hoisted from their basicblocks. Differential Revision: https://reviews.llvm.org/D90625	2020-12-21 15:22:27 +00:00
Florian Hahn	f250892373	[VPlan] Make VPRecipeBase inherit from VPDef. This patch makes VPRecipeBase a direct subclass of VPDef, moving the SubclassID to VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90564	2020-12-21 13:34:00 +00:00
Florian Hahn	cd608dc8d3	[VPlan] Use VPDef for VPInterleaveRecipe. This patch updates VPInterleaveRecipe to manage the values it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90562	2020-12-21 10:56:53 +00:00
David Sherwood	3bf7d47a97	[NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp An earlier patch introduced asserts that the InstructionCost is valid because at that time the ReuseShuffleCost variable was an unsigned. However, now that the variable is an InstructionCost instance the asserts can be removed. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174	2020-12-21 09:12:28 +00:00
Kazu Hirata	5d24935f22	[PGO] Remove dead member variable InstrumentFuncEntry (NFC) This patch removes InstrumentFuncEntry as it is dead. The constructor of FuncPGOInstrumentation passes InstrumentFuncEntry to MST, but it doesn't make a local copy as a member variable.	2020-12-20 09:57:05 -08:00
Andrew Litteken	7c6f28a438	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to be elevated to arguments. This patch deduplicates extracted functions that only have inputs and none of the special cases. We test that we correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D86978	2020-12-19 17:34:34 -06:00
Andrew Litteken	b8a2b6af37	Revert "[IROutliner] Deduplicating functions that only require inputs." Missing reviewers and differential revision in commit message. This reverts commit `5cdc4f57e5`.	2020-12-19 17:33:49 -06:00
Andrew Litteken	5cdc4f57e5	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to be elevated to arguments. This patch deduplicates extracted functions that only have inputs and none of the special cases. We test that we correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll	2020-12-19 17:26:29 -06:00
Roman Lebedev	c043f5055e	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1 ... for conditional branch case	2020-12-20 00:18:36 +03:00
Roman Lebedev	262ff9c23e	[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree	2020-12-20 00:18:36 +03:00
Roman Lebedev	6a1617d67c	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2 ... for the custom case returning void.	2020-12-20 00:18:36 +03:00
Roman Lebedev	b94520c9ee	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1 ... for the general case of returning a value.	2020-12-20 00:18:35 +03:00
Roman Lebedev	4d87a6ad13	[NFCI][SimplifyCFG] SimplifyCondBranchToTwoReturns(): pull out BI->getParent() into a variable	2020-12-20 00:18:35 +03:00
Roman Lebedev	83659c7076	[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-20 00:18:34 +03:00
Roman Lebedev	b7d00e29b7	[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	c209b88dd4	[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	76e74d9395	[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree	2020-12-20 00:18:33 +03:00
Roman Lebedev	4be8707e64	[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree Still boring, simply drop all edges to successors of DomBlock, and add an edge to BB instead.	2020-12-20 00:18:33 +03:00
Roman Lebedev	b43b77ff9b	[NFCI][SimplifyCFG] simplifyOnce(): also perform DomTree validation And that exposes that a number of tests don't actually manage to maintain DomTree validity, which is in line with my observations. Once again, SimplifyCFG pass currently does not require/preserve DomTree by default, so this is effectively NFC.	2020-12-20 00:18:32 +03:00
Andrew Litteken	c52bcf3a9b	[IRSim][IROutliner] Limit to extracting regions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to be elevated to arguments. This patch limits the extracted regions to those that only require inputs, and do not have any other special cases. We test that we do not outline the wrong constants in: test/Transforms/IROutliner/outliner-different-constants.ll test/Transforms/IROutliner/outliner-different-globals.ll test/Transforms/IROutliner/outliner-constant-vs-registers.ll We test that we correctly outline in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: paquette, plofti Differential Revision: https://reviews.llvm.org/D86977	2020-12-19 13:33:54 -06:00
Kazu Hirata	56edfcada9	[Target, Transforms] Use contains (NFC)	2020-12-19 10:43:19 -08:00
Aditya Kumar	1ab4db0f84	[HotColdSplit] Reflect full cost of parameters in split penalty Make the penalty for splitting a region more accurately reflect the cost of materializing all of the inputs/outputs to/from the region. This almost entirely eliminates code growth within functions which undergo splitting in key internal frameworks, and reduces the size of those frameworks by between 2.6% and 3%. rdar://49167240 Patch by: Vedant Kumar(@vsk) Reviewers: hiraditya,rjf,t.p.northover Reviewed By: hiraditya,rjf Differential Revision: https://reviews.llvm.org/D59715	2020-12-18 17:06:17 -08:00
Akira Hatanaka	ffd982f7db	[ObjC][ARC] Fix a bug where the inline-asm retain/claim RV marker wasn't inserted when the original call had a 'returned' argument The code is testing whether the instruction BBI points to is the call that is paired up with the retainRV/claimRV call, but it doesn't work when the call has a 'returned' argument since GetArgRCIdentityRoot looks through 'returned' arguments. rdar://72485383	2020-12-18 16:59:06 -08:00
Sanjay Patel	37d0dda739	[SLP] fix typo; NFC	2020-12-18 16:55:52 -05:00
Nikita Popov	1f1145006b	[DSE] Use correct memory location for read clobber check MSSA DSE starts at a killing store, finds an earlier store and then checks that the earlier store is not read along any paths (without being killed first). However, it uses the memory location of the killing store for that, not the earlier store that we're attempting to eliminate. This has a number of problems: * Mismatches between what BasicAA considers aliasing and what DSE considers an overwrite (even though both are correct in isolation) can result in miscompiles. This is PR48279, which D92045 tries to fix in a different way. The problem is that we're using a location from a store that is potentially not executed and thus may be UB, in which case analysis results can be arbitrary. * Metadata on the killing store may be used to determine aliasing, but there is no guarantee that the metadata is valid, as the specific killing store may not be executed. Using the metadata on the earlier store is valid (it is the store we're removing, so on any execution where its removal may be observed, it must be executed). * The location is imprecise. For full overwrites the killing store will always have a location that is larger than or equal to the earlier access location, so it's beneficial to use the earlier access location. This is not the case for partial overwrites, in which case either location might be smaller. There is some room for improvement here. Using the earlier access location means that we can no longer cache which accesses are read for a given killing store, as we may be querying different locations. However, it turns out that simply dropping the cache has no notable impact on compile-time. Differential Revision: https://reviews.llvm.org/D93523	2020-12-18 20:26:53 +01:00
Kazu Hirata	5ac37725df	[GVNHoist] Remove successorDominate (NFC) The function was introduced on Aug 25, 2016 in commit `5f0d0e60d1`. Its last use was removed on Sep 13, 2017 in commit `dfa8741c96`.	2020-12-18 10:29:52 -08:00
Roman Lebedev	897c985e1e	[InstCombine] Canonicalize SPF to abs intrinsic This patch enables canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic. This is a recommit, the original try was `05d4c4ebc2`, but it was reverted due to an apparent miscompile, which since then has just been fixed by the previous commit. Differential Revision: https://reviews.llvm.org/D87188	2020-12-18 21:18:14 +03:00
Whitney Tsang	2a814cd9e1	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-18 17:37:17 +00:00
Arnamoy Bhattacharyya	06d5b1c9ad	[SROA] Remove Dead Instructions while creating speculative instructions The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list. However it does not remove instructions from the dead list while running eraseFromParent() on those instructions. This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() function, in the following code snippet: ``` while (!PN.use_empty()) { LoadInst *LI = cast<LoadInst>(PN.user_back()); LI->replaceAllUsesWith(NewPN); LI->eraseFromParent(); } ``` If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line: ``` LoadInst *Load = PredBuilder.CreateAlignedLoad( LoadTy, InVal, Alignment, (PN.getName() + ".sroa.speculate.load." + Pred->getName())); ``` This new LoadInst object takes the same memory address as the LI just deleted by eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D92431	2020-12-18 11:47:02 -05:00
Sanjay Patel	47aaa99c0e	[VectorCombine] allow peeking through GEPs when creating a vector load This is an enhancement motivated by https://llvm.org/PR16739 (see D92858 for another). We can look through a GEP to find a base pointer that may be safe to use for a vector load. If so, then we shuffle (shift) the necessary vector element over to index 0. Alive2 proof based on 1 of the regression tests: https://alive2.llvm.org/ce/z/yPJLkh The vector translation is independent of endian (verify by changing to leading 'E' in the datalayout string). Differential Revision: https://reviews.llvm.org/D93229	2020-12-18 09:25:03 -05:00
Yevgeny Rouban	f0e3d1d6ca	[IndVars] Fix adding trunc instructions to unwind blocks Truncate instruction must not be inserted before landing pads. The insertion point is fixed.	2020-12-18 12:52:23 +07:00
Kazu Hirata	b621116716	[Transforms] Use llvm::erase_if (NFC)	2020-12-17 19:53:10 -08:00
Rong Xu	31c0b8700b	Fix clang-ppc64le-rhel buildbot build error Fix buildbot build error due to commit 3733463d: [IR][PGO] Add hot func attribute and use hot/cold attribute in func section	2020-12-17 19:14:43 -08:00
Rong Xu	3733463dbb	[IR][PGO] Add hot func attribute and use hot/cold attribute in func section Clang FE currently has hot/cold function attribute. But we only have cold function attribute in LLVM IR. This patch adds support of hot function attribute to LLVM IR. This attribute will be used in setting function section prefix/suffix. Currently .hot and .unlikely suffixes are only added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) user annotated function attribute will be used in setting function section prefix/suffix. (2) hot attribute overwrites profile count based hotness. (3) profile count based hotness overwrites user annotated cold attribute. The intention for these changes is to provide the user a way to mark certain function as hot in cases where training input is hard to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493	2020-12-17 18:41:12 -08:00
Andrew Litteken	cea807602a	[IRSim][IROutliner] Adding InstVisitor to disallow certain operations. This adds a custom InstVisitor to return false on instructions that should not be allowed to be outlined. These match the illegal instructions in the IRInstructionMapper with exception of the addition of the llvm.assume intrinsic. Tests all the tests marked: illegal-*-.ll with a test for each kind of instruction that has been marked as illegal. Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D86976	2020-12-17 19:33:57 -06:00
Roman Lebedev	2d07414ee5	[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree Pretty boring, removeUnwindEdge() already knows how to update DomTree, so if we are to call it, we must first flush our own pending updates; otherwise, we just stop predecessors from branching to us, and for certain predecessors, stop their predecessors from branching to them also.	2020-12-18 00:37:22 +03:00
Roman Lebedev	2ee724863e	[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:22 +03:00
Roman Lebedev	164e0847a5	[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:21 +03:00
Bangtian Liu	511cfe9441	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `d20e0c3444`.	2020-12-17 21:00:37 +00:00
Johannes Doerfert	994bb6eb7d	[OpenMP][NFC] Provide a new remark and documentation If a GPU function is externally reachable we give up trying to find the (unique) kernel it is called from. This can hinder optimizations. Emit a remark and explain mitigation strategies. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D93439	2020-12-17 14:38:26 -06:00
Andrew Litteken	dae34463e3	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that we correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Recommit of `bf899e8913` fixing memory leaks. Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-12-17 11:27:26 -06:00
Nabeel Omer	df2b9a3e02	[DebugInfo] Avoid re-ordering assignments in LCSSA The LCSSA pass makes use of a function insertDebugValuesForPHIs() to propagate dbg.value() intrinsics to newly inserted PHI instructions. Faulty behaviour occurs when the parent PHI of a newly inserted PHI is not the most recent assignment to a source variable. insertDebugValuesForPHIs ends up propagating a value that isn't the most recent assignment. This change removes the call to insertDebugValuesForPHIs() from LCSSA, preventing incorrect dbg.value intrinsics from being propagated. Propagating variable locations between blocks will occur later, during LiveDebugValues. Differential Revision: https://reviews.llvm.org/D92576	2020-12-17 16:17:32 +00:00
Bangtian Liu	d20e0c3444	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
Florian Hahn	01089c876b	[InstCombine] Preserve !annotation on newly created instructions. If the source instruction has !annotation metadata, all instructions created during combining should also have it. Tell the builder to add it. The !annotation system was discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) This patch is based on an earlier patch by Francis Visoiu Mistrih. Reviewed By: thegameg, lebedev.ri Differential Revision: https://reviews.llvm.org/D91444	2020-12-17 15:20:23 +00:00
Florian Hahn	75c04bfc61	[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest. When folding a branch to a common destination, preserve !annotation on the created instruction, if the terminator of the BB that is going to be removed has !annotation. This should ensure that !annotation is attached to the instructions that 'replace' the original terminator. Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D93410	2020-12-17 14:06:58 +00:00
Jun Ma	0138399903	[InstCombine] Remove scalable vector restriction in InstCombineCasts Differential Revision: https://reviews.llvm.org/D93389	2020-12-17 22:02:33 +08:00
Florian Hahn	29077ae860	[IRBuilder] Generalize debug loc handling for arbitrary metadata. This patch extends IRBuilder to allow adding/preserving arbitrary metadata on created instructions. Instead of using references to specific metadata nodes (like DebugLoc), IRBuilder now keeps a vector of (metadata kind, MDNode *) pairs, which are added to each created instruction. The patch itself is NFC and only moves the existing debug location handling over to the new system. In a follow-up patch it will be used to preserve !annotation metadata besides !dbg. The current approach requires iterating over MetadataToCopy to avoid adding duplicates, but given that the number of metadata kinds to copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2 (!dbg and !annotation)) that should not matter. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D93400	2020-12-17 13:27:43 +00:00
Cullen Rhodes	1fd3a04775	[LV] Disable epilogue vectorization for scalable VFs Epilogue vectorization doesn't support scalable vectorization factors yet, disable it for now. Reviewed By: sdesmalen, bmahjour Differential Revision: https://reviews.llvm.org/D93063	2020-12-17 12:14:03 +00:00
dfukalov	9ed8e0caab	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00
Barry Revzin	92310454bf	Make LLVM build in C++20 mode Part of the <=> changes in C++20 make certain patterns of writing equality operators ambiguous with themselves (sorry!). This patch goes through and adjusts all the comparison operators such that they should work in both C++17 and C++20 modes. It also makes two other small C++20-specific changes (adding a constructor to a type that ceases to be an aggregate, and adding casts from u8 literals which no longer have type const char*). There were four categories of errors that this review fixes. Here are canonical examples of them, ordered from most to least common: // 1) Missing const namespace missing_const { struct A { #ifndef FIXED bool operator==(A const&); #else bool operator==(A const&) const; #endif }; bool a = A{} == A{}; // error } // 2) Type mismatch on CRTP namespace crtp_mismatch { template <typename Derived> struct Base { #ifndef FIXED bool operator==(Derived const&) const; #else // in one case changed to taking Base const& friend bool operator==(Derived const&, Derived const&); #endif }; struct D : Base<D> { }; bool b = D{} == D{}; // error } // 3) iterator/const_iterator with only mixed comparison namespace iter_const_iter { template <bool Const> struct iterator { using const_iterator = iterator<true>; iterator(); template <bool B, std::enable_if_t<(Const && !B), int> = 0> iterator(iterator<B> const&); #ifndef FIXED bool operator==(const_iterator const&) const; #else friend bool operator==(iterator const&, iterator const&); #endif }; bool c = iterator<false>{} == iterator<false>{} // error \|\| iterator<false>{} == iterator<true>{} \|\| iterator<true>{} == iterator<false>{} \|\| iterator<true>{} == iterator<true>{}; } // 4) Same-type comparison but only have mixed-type operator namespace ambiguous_choice { enum Color { Red }; struct C { C(); C(Color); operator Color() const; bool operator==(Color) const; friend bool operator==(C, C); }; bool c = C{} == C{}; // error bool d = C{} == Red; } Differential revision: https://reviews.llvm.org/D78938	2020-12-17 10:44:10 +00:00
Florian Hahn	eba09a2db9	[InstCombine] Preserve !annotation for newly created instructions. When replacing an instruction with !annotation with a newly created replacement, add the !annotation metadata to the replacement. This mostly covers cases where the new instructions are created using the ::Create helpers. Instructions created by IRBuilder will be handled by D91444. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D93399	2020-12-17 09:06:51 +00:00
Kazu Hirata	4ad5b634f6	[GCN] Remove unused function handleNewInstruction (NFC) The function was added without a user on Dec 22, 2016 in commit `7e274e02ae`. It seems to be unused since then.	2020-12-16 21:57:48 -08:00
Hongtao Yu	ac068e014b	[CSSPGO] Consume pseudo-probe-based AutoFDO profile This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be skipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`. Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites). Adjustment to profile format A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like ``` !CFGChecksum: 12345 ``` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums. Differential Revision: https://reviews.llvm.org/D92347	2020-12-16 15:57:18 -08:00
alex-t	35ec3ff76d	Disable Jump Threading for the targets with divergent control flow Details: Jump Threading does not make sense for the targets with divergent CF since they do not use branch prediction for speculative execution. Also in the high level IR there is not enough information to conclude that the branch is divergent or uniform. This may cause errors in further CF lowering. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93302	2020-12-17 02:40:54 +03:00
Roman Lebedev	d22a47e9ff	[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree A first real transformation that didn't already know how to do that, but it's pretty tame - either change successor of all the predecessors of a block and carefully delay deletion of the block until after the DomTree updates are applied, or add a successor to the block. There wasn't great test coverage for this, so i added extra, to be sure.	2020-12-17 01:03:50 +03:00
Roman Lebedev	5cce4aff18	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	49dac4aca0	[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	4fc169f664	[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-17 01:03:49 +03:00
Rong Xu	0abd744597	[PGO] Use the sum of profile counts to fix the function entry count Raw profile count values for each BB are not kept after profile annotation. We record function entry count and branch weights and use them to compute the count when needed. This mechanism works well in a perfect world, but often breaks in real programs, because of number precision, inconsistent profile, or bugs in BFI. This patch uses sum of profile count values to fix function entry count to make the BFI count close to real profile counts. Differential Revision: https://reviews.llvm.org/D61540	2020-12-16 13:37:43 -08:00
Nikita Popov	e728024808	[DSE] Pass MemoryLocation by const ref (NFC)	2020-12-16 21:47:46 +01:00
Sanjay Patel	38ebc1a13d	[VectorCombine] optimize alignment for load transform Here's another minimal step suggested by D93229 / D93397 . (I'm trying to be extra careful in these changes because load transforms are easy to get wrong.) We can optimistically choose the greater alignment of a load and its pointer operand. As the test diffs show, this can improve what would have been unaligned vector loads into aligned loads. When we enhance with gep offsets, we will need to adjust the alignment calculation to include that offset. Differential Revision: https://reviews.llvm.org/D93406	2020-12-16 15:25:45 -05:00
Sanjay Patel	aaaf0ec72b	[VectorCombine] loosen alignment constraint for load transform As discussed in D93229, we only need a minimal alignment constraint when querying whether a hypothetical vector load is safe. We still pass/use the potentially stronger alignment attribute when checking costs and creating the new load. There's already a test that changes with the minimum code change, so splitting this off as a preliminary commit independent of any gep/offset enhancements. Differential Revision: https://reviews.llvm.org/D93397	2020-12-16 12:25:18 -05:00
Whitney Tsang	fa3693ad0b	[LoopNest] Handle loop-nest passes in LoopPassManager Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this patch (and other following patches) is to create facilities that allow implementing loop nest passes that run on top-level loop nests for the New Pass Manager. This patch extends the functionality of LoopPassManager to handle loop-nest passes by specializing the definition of LoopPassManager that accepts both kinds of passes in addPass. Only loop passes are executed if L is not a top-level one, and both kinds of passes are executed if L is top-level. Currently, loop nest passes should have the following run method: PreservedAnalyses run(LoopNest &, LoopAnalysisManager &, LoopStandardAnalysisResults &, LPMUpdater &); Reviewed By: Whitney, ychen Differential Revision: https://reviews.llvm.org/D87045	2020-12-16 17:07:14 +00:00
Caroline Concatto	be9184bc55	[SLPVectorizer]Migrate getEntryCost to return InstructionCost This patch also changes: the return type of getGatherCost and the signature of the debug function dumpTreeCosts to use InstructionCost. This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 Depends on D93049 Differential Revision: https://reviews.llvm.org/D93127	2020-12-16 14:18:40 +00:00
Caroline Concatto	07217e0a1b	[CostModel]Migrate getTreeCost() to use InstructionCost This patch changes the type of cost variables (for instance: Cost, ExtractCost, SpillCost) to use InstructionCost. This patch also changes the type of cost variables to InstructionCost in other functions that use the result of getTreeCost() This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D91174 Differential Revision: https://reviews.llvm.org/D93049	2020-12-16 13:08:37 +00:00
Bangtian Liu	c10757200d	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `cf638d793c`.	2020-12-16 11:52:30 +00:00
Philip Reames	1f6e15566f	[LV] Weaken an unnecessarily strong assert [NFC] Account for the fact that (in the future) the latch might be a switch not a branch. The existing code is correct, minus the assert.	2020-12-15 19:07:53 -08:00
Philip Reames	af7ef895d4	[LV] Extend dead instruction detection to multiple exiting blocks Given we haven't yet enabled multiple exiting blocks, this is currently non functional, but it's an obvious extension which cleans up a later patch. I don't think this is worth review (as it's pretty obvious), if anyone disagrees, feel free to revert or comment and I will.	2020-12-15 18:46:32 -08:00
Bangtian Liu	cf638d793c	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-15 23:32:29 +00:00
Johannes Doerfert	dcaec81211	[OpenMP] Use assumptions during ICV tracking The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us to ignore calls that would otherwise prevent ICV tracking. Once we track more ICVs we might need to distinguish the ones that could be impacted even with `no_openmp_routines`. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D92050	2020-12-15 16:51:34 -06:00
Johannes Doerfert	d08d490a4c	[OpenMPOpt][NFC] Clang format	2020-12-15 16:51:34 -06:00
Roman Lebedev	e113317958	[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware Two observations: 1. Unavailability of DomTree makes it impossible to make `FoldBranchToCommonDest()` transform in certain cases, where the successor is dominated by predecessor, because we then don't have PHI's, and can't recreate them, well, without handrolling 'is dominated by' check, which doesn't really look like a great solution to me. 2. Avoiding invalidating DomTree in SimplifyCFG will decrease the number of `Dominator Tree Construction` by 5 (from 28 now, i.e. -18%) in `-O3` old-pm pipeline (as per `llvm/test/Other/opt-O3-pipeline.ll`) This might or might not be beneficial for compile time. So the plan is to make SimplifyCFG preserve DomTree, and then eventually make DomTree fully required and preserved by the pass. Now, SimplifyCFG is ~7KLOC. I don't think it will be nice to do all this uplifting in a single mega-commit, nor would it be possible to review it in any meaningful way. But, i believe, it should be possible to do this in smaller steps, introducing the new behavior, in an optional way, off-by-default, opt-in option, and gradually fixing transforms one-by-one and adding the flag to appropriate test coverage. Then, eventually, the default should be flipped, and eventually^2 the flag removed. And that is what is happening here - when the new off-by-default option is specified, DomTree is required and is claimed to be preserved, and SimplifyCFG-internal assertions verify that the DomTree is still OK.	2020-12-16 00:38:00 +03:00
Philip Reames	a81db8b315	[LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC] This should be purely non-functional. When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing. Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing.	2020-12-15 12:38:13 -08:00
Simon Pilgrim	a3bd67f222	SeparateConstOffsetFromGEP::lowerToSingleIndexGEPs - don't use dyn_cast_or_null. NFCI. ResultPtr is guaranteed to be non-null - and using dyn_cast_or_null causes unnecessary static analyzer warnings. We can't say the same for FirstResult AFAICT, so keep dyn_cast_or_null for that.	2020-12-15 17:27:25 +00:00
Florian Hahn	7ea3932ab1	[AnnotationRemarks] Also generate annotation remarks when using -O0. The AnnotationRemarks pass is already run at the end of the module pipeline. This patch also adds it before bailing out for -O0, so remarks are also generated with -O0.	2020-12-15 14:46:52 +00:00
Florian Hahn	7186a3965a	[VPlan] Use VPDef for VPWidenSelectRecipe. This patch updates VPWidenSelectRecipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90560	2020-12-15 14:15:01 +00:00
Jun Ma	52a3267ffa	[InstCombine] Remove scalable vector restriction in foldVectorBinop Differential Revision: https://reviews.llvm.org/D93289	2020-12-15 21:14:59 +08:00
Jun Ma	ffe84d90e9	[InstCombine][NFC] Change cast of FixedVectorType to dyn_cast.	2020-12-15 20:36:57 +08:00
Jun Ma	e12f584578	[InstCombine] Remove scalable vector restriction in InstCombineCompares Differential Revision: https://reviews.llvm.org/D93269	2020-12-15 20:36:57 +08:00
Jun Ma	2ac58e21a1	[InstCombine] Remove scalable vector restriction when fold SelectInst Differential Revision: https://reviews.llvm.org/D93083	2020-12-15 20:36:57 +08:00
Florian Hahn	318f5798d8	[VPlan] Use VPDef for VPWidenGEPRecipe. This patch updates VPWidenGEPRecipe to manage the value it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90561	2020-12-15 09:30:14 +00:00
Florian Hahn	ad1161f9b5	[VPlan] Use VPDef for VPWidenCall. This patch updates VPWidenCallRecipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90559	2020-12-15 09:20:07 +00:00
Nico Weber	a852ee199c	Reland "[MachineDebugify] Insert synthetic DBG_VALUE instructions" This reverts commit `841f9c937f`. The change landed many months ago; something else broke those tests.	2020-12-14 22:34:23 -05:00
Nico Weber	841f9c937f	Revert "[MachineDebugify] Insert synthetic DBG_VALUE instructions" This reverts commit `2a5675f11d`. The tests it adds fail: https://reviews.llvm.org/D78135#2453736	2020-12-14 22:14:48 -05:00
Reid Kleckner	d2ed9d6b7e	Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC" We determined that the MSVC implementation of std::aligned* isn't suited to our needs. It doesn't support 16 byte alignment or higher, and it doesn't really guarantee 8 byte alignment. See https://github.com/microsoft/STL/issues/1533 Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC" Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back AlignedCharArrayUnion. This reverts commit `4d8bf870a8`. This reverts commit `d10f9863a5`. This reverts commit `4b5dc150b9`.	2020-12-14 17:04:06 -08:00
Rong Xu	54e03d03a7	[PGO] Verify BFI counts after loading profile data This patch adds the functionality to compare BFI counts with real profile counts right after reading the profile. It will print remarks under -Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo. Differential Revision: https://reviews.llvm.org/D91813	2020-12-14 15:56:10 -08:00
Gulfem Savrun Yeniceri	7c0e3a77bc	[clang][IR] Add support for leaf attribute This patch adds support for leaf attribute as an optimization hint in Clang/LLVM. Differential Revision: https://reviews.llvm.org/D90275	2020-12-14 14:48:17 -08:00
Sanjay Patel	d399f870b5	[VectorCombine] make load transform poison-safe As noted in D93229, the transform from scalar load to vector load potentially leaks poison from the extra vector elements that are being loaded. We could use freeze here (and x86 codegen at least appears to be the same either way), but we already have a shuffle in this logic to optionally change the vector size, so let's allow that instruction to serve both purposes. Differential Revision: https://reviews.llvm.org/D93238	2020-12-14 17:42:01 -05:00
Craig Topper	25067f179f	[LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing. This adds support for loops like unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT; while (x) { w--; x >>= 1; } return w; } and unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT - 1; while (x >>= 1) { w--; } return w; } To support these we look for add x, -1 as well as add x, 1 that we already matched. If the value was -1 we need to subtract from the initial counter value instead of adding to it. Fixes PR48404. Differential Revision: https://reviews.llvm.org/D92745	2020-12-14 14:25:05 -08:00
Philip Reames	f5fe8493e5	[LAA] Relax restrictions on early exits in loop structure This is a preparation patch for supporting multiple exits in the loop vectorizer; by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist). Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression. Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so. Differential Revision: https://reviews.llvm.org/D92066	2020-12-14 12:44:01 -08:00
Roman Lebedev	59560e8589	[SimplifyCFG] FoldBranchToCommonDest(): temporarily put back restrictions on liveout uses of bonus instructions (PR48450) Even though `d38205144f` was mostly a correct fix for the external non-PHI users, it's not a generally correct fix, because the 'placeholder' values in those trivial PHI's we create shouldn't be always 'undef', but the PHI itself for the backedges, else we end up with wrong value, as the `@pr48450_2` test shows. But we can't just do that, because we can't check that the PHI can be its own incoming value when coming from certain predecessor, because we don't have a dominator tree. So until we can address this correctness problem properly, ensure that we don't perform the transformation if there are such problematic external uses. Making dominator tree available there is going to be involved, since `-simplifycfg` pass currently does not preserve/update domtree...	2020-12-14 20:14:31 +03:00
Roman Lebedev	e8360a8e1e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): pull out 'common successor' into a variable Makes it easier to use it elsewhere	2020-12-14 20:14:31 +03:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Markus Lavin	2a6782bb9f	Reland [DebugInfo] Improve dbg preservation in LSR. Use SCEV to salvage additional @llvm.dbg.value that have turned into referencing undef after transformation (and traditional salvageDebugInfo). Before rewrite (but after introduction of new induction variables) use SCEV to compute an equivalent set of values for each @llvm.dbg.value in the loop body (among the loop header PHI-nodes). After rewrite (and dead PHI elimination) update those @llvm.dbg.value now referencing undef by picking a remaining value from its equivalence set. Allow match with offset by inserting compensation code in the DIExpression. Fixes: PR38815 Differential Revision: https://reviews.llvm.org/D87494	2020-12-14 16:15:18 +01:00
Florian Hahn	e42e5263bd	[VPlan] Make VPWidenMemoryInstructionRecipe a VPDef. This patch updates VPWidenMemoryInstructionRecipe to use VPDef to manage the value it produces instead of inheriting from VPValue. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90563	2020-12-14 14:13:59 +00:00
Anton Afanasyev	fac7c7ec3c	[SLP] Fix vector element size for the store chains Vector element size could be different for different store chains. This patch prevents wrong computation of maximum number of elements for that case. Differential Revision: https://reviews.llvm.org/D93192	2020-12-14 15:51:43 +03:00
Kazu Hirata	5891ad4e22	[Transforms] Use llvm::erase_value (NFC)	2020-12-13 09:48:47 -08:00
Florian Hahn	533f85767c	[VPlan] Use interleaveComma in printOperands() (NFC).	2020-12-13 16:29:16 +00:00
Roman Lebedev	d38205144f	[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450) In particular, if the successor block, which is about to get a new predecessor block, currently only has a single predecessor, then the bonus instructions will be directly used within said successor, which is fine, since the block with bonus instructions dominates that successor. But once there's a new predecessor, the IR is no longer valid, and we don't fix it, because we only update PHI nodes. Which means, the live-out bonus instructions must be exclusively used by the PHI nodes in successor blocks. So we have to form trivial PHI nodes, which will then be successfully updated to receive cloned bonus instns. This all works fine, except for the fact that we don't have access to the dominator tree, and we don't ignore unreachable code, so we sometimes do end up having to deal with some weird IR. Fixes https://bugs.llvm.org/show_bug.cgi?id=48450	2020-12-13 00:06:57 +03:00
Nikita Popov	afbb6d97b5	[CVP] Simplify and generalize switch handling CVP currently handles switches by checking an equality predicate on all edges from predecessor blocks. Of course, this can only work if the value being switched over is defined in a different block. Replace this implementation with a call to getPredicateAt(), which also does the predecessor edge predicate check (if not defined in the same block), but can also do quite a bit more: It can reason about phi-nodes by checking edge predicates for incoming values, it can reason about assumes, and it can reason about block values. As such, this makes the implementation both simpler and more powerful. The compile-time impact on CTMark is in the noise.	2020-12-12 21:12:27 +01:00
Kazu Hirata	215c1b1935	[Transforms] Use is_contained (NFC)	2020-12-12 09:37:49 -08:00
David Green	ab97c9bdb7	[LV] Fix scalar cost for tail predicated loops When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop. Original patch by Anna Welker Differential Revision: https://reviews.llvm.org/D86452	2020-12-12 14:21:40 +00:00
Fangrui Song	b5ad32ef5c	Migrate deprecated DebugLoc::get to DILocation::get This migrates all LLVM (except Kaleidoscope and CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get. The CodeGen/StackProtector.cpp usage may have a nullptr Scope and can trigger an assertion failure, so I don't migrate it. Reviewed By: #debug-info, dblaikie Differential Revision: https://reviews.llvm.org/D93087	2020-12-11 12:45:22 -08:00
Marco Elver	c28b18af19	[KernelAddressSanitizer] Fix globals exclusion for indirect aliases GlobalAlias::getAliasee() may not always point directly to a GlobalVariable. In such cases, try to find the canonical GlobalVariable that the alias refers to. Link: https://github.com/ClangBuiltLinux/linux/issues/1208 Reviewed By: dvyukov, nickdesaulniers Differential Revision: https://reviews.llvm.org/D92846	2020-12-11 12:20:40 +01:00
David Sherwood	9b76160e53	[Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. Differential Revision: https://reviews.llvm.org/D91174	2020-12-11 08:12:54 +00:00
Hongtao Yu	705a4c149d	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each of which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. 
The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. 
Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. 
A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 17:29:28 -08:00
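The address encoding described in the commit message above (first probe address stored in full, later ones as LEB128 deltas) can be sketched as follows. This is a minimal illustration, not the emitter's actual code: `encodeULEB128` is a local helper written for this sketch, `encodeProbeAddresses` is a hypothetical name, and the sketch assumes probe addresses are non-decreasing so that every delta fits an unsigned encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append Value to Out as unsigned LEB128: 7 payload bits per byte,
// with the high bit set on every byte except the last one.
void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Hypothetical helper: the first address is written as 8 raw
// little-endian bytes, every later address as a ULEB128 delta from
// the previous one. Assumes Addrs is sorted in non-decreasing order.
std::vector<uint8_t> encodeProbeAddresses(const std::vector<uint64_t> &Addrs) {
  std::vector<uint8_t> Out;
  for (std::size_t I = 0; I < Addrs.size(); ++I) {
    if (I == 0) {
      for (int Shift = 0; Shift < 64; Shift += 8)
        Out.push_back(static_cast<uint8_t>(Addrs[0] >> Shift));
    } else {
      assert(Addrs[I] >= Addrs[I - 1] && "sketch assumes sorted addresses");
      encodeULEB128(Addrs[I] - Addrs[I - 1], Out);
    }
  }
  return Out;
}
```

For the three block-probe addresses in the disassembly example (0x2011a0, 0x2011a3, 0x2011a7), this sketch produces 8 bytes for the first address followed by one byte each for the deltas 3 and 4.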
Mitch Phillips	7ead5f5aa3	Revert "[CSSPGO] Pseudo probe encoding and emission." This reverts commit `b035513c06`. Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269	2020-12-10 15:53:39 -08:00
Zequan Wu	b5216b2950	[PGO] Enable preinline and cleanup when optimize for size Differential Revision: https://reviews.llvm.org/D91673	2020-12-10 12:29:17 -08:00
Sanjay Patel	4f051fe374	[InstCombine] avoid crash sinking to unreachable block The test is reduced from the example in D82005. Similar to `94f6d365e`, the test here would assert in the DomTree when we tried to convert a select to a phi with an unreachable block operand. We may want to add some kind of guard code in DomTree itself to avoid this sort of problem.	2020-12-10 13:10:26 -05:00
Sanjay Patel	12b684ae02	[VectorCombine] improve readability; NFC If we are going to allow adjusting the pointer for GEPs, rearranging the code a bit will make it easier to follow.	2020-12-10 13:10:26 -05:00
Hongtao Yu	b035513c06	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections do not need to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission: The binary data to emit are organized as two ELF sections, i.e., the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each of which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed of too. In contrast, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of the `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e., a function with a code entry in the `.text` section). A function record has the following format:
```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the
        inlined callees. Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree, in the form of a trie, based on the debug metadata for each outlined function. A tree root is the outlined function.
Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted together with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variable-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling: Alternatively, pseudo probes can be printed as assembly directives. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, which is especially helpful for diff-time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.
An example assembly looks like:
```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```
Disassembling: We have a disassembling tool (llvm-profgen) that can display disassembly alongside pseudo probes. So far it only supports ELF executable files. An example disassembly looks like:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 09:50:08 -08:00
Jun Ma	137674f882	[TruncInstCombine] Remove scalable vector restriction Differential Revision: https://reviews.llvm.org/D92819	2020-12-10 18:00:19 +08:00
Jianzhou Zhao	ea981165a4	[dfsan] Track field/index-level shadow values in variables The problem: See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16-bit shadow value for a variable of any type by combining the shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, and the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int are tainted. If the parser keeps on using get_token for the rest of the parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. The proposed improvement: Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primitive type still have shadow values i16. A potential implementation plan: The idea is to adopt the change incrementally. 1) This CL: Support field-level accuracy at variables/args/ret in TLS mode; load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca. 3) Support field-level accuracy at load/store. These two should make O0 and real memory access work.
4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. About this CL: The CL did the following: 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const and its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level propagation. 4) for other instructions, propagation rules combine inputs by or. The CL converts between aggregate and primitive shadow values in these cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear yet whether custom functions need more accurate shadow propagation. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261	2020-12-09 19:38:35 +00:00
Sanjay Patel	b2ef264096	[VectorCombine] allow peeking through an extractelt when creating a vector load This is an enhancement to load vectorization that is motivated by a pattern in https://llvm.org/PR16739. Unfortunately, it's still not enough to make a difference there. We will have to handle multi-use cases in some better way to avoid creating multiple overlapping loads. Differential Revision: https://reviews.llvm.org/D92858	2020-12-09 10:36:14 -05:00
Roman Lebedev	e6f2a79d7a	[InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390) We could create uadd.sat under incorrect circumstances if a select with -1 as the false value was canonicalized by swapping the T/F values. Unlike the other transforms in the same function, it is not invariant to equality. Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL Based on original patch by David Green! Fixes https://bugs.llvm.org/show_bug.cgi?id=48390 Differential Revision: https://reviews.llvm.org/D92717	2020-12-09 18:19:09 +03:00
Anton Afanasyev	e5bf2e8989	[SLP] Use the width of value truncated just before storing For stores chain vectorization we choose the size of vector elements to ensure we fit to minimum and maximum vector register size for the number of elements given. This patch corrects vector element size choosing the width of value truncated just before storing instead of the width of value stored. Fixes PR46983 Differential Revision: https://reviews.llvm.org/D92824	2020-12-09 16:38:45 +03:00
Sander de Smalen	d568cff696	[LoopVectorizer][SVE] Vectorize a simple loop with a scalable VF. * Steps are scaled by `vscale`, a runtime value. * Changes to circumvent the cost-model for now (temporary) so that the cost-model can be implemented separately. This can vectorize the following loop [1]: void loop(int N, double *a, double *b) { #pragma clang loop vectorize_width(4, scalable) for (int i = 0; i < N; i++) { a[i] = b[i] + 1.0; } } [1] This source-level example is based on the pragma proposed separately in D89031. This patch only implements the LLVM part. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91077	2020-12-09 11:25:21 +00:00
Sander de Smalen	adc37145de	[LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable. This patch removes a number of asserts that VF is not scalable, even though the code where this assert lives does nothing that prevents VF being scalable. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91060	2020-12-09 11:25:21 +00:00
Joe Ellis	80c33de2d3	[SelectionDAG] Add llvm.vector.{extract,insert} intrinsics This commit adds two new intrinsics. - llvm.experimental.vector.insert: used to insert a vector into another vector starting at a given index. - llvm.experimental.vector.extract: used to extract a subvector from a larger vector starting from a given index. The codegen work for these intrinsics has already been completed; this commit is simply exposing the existing ISD nodes to LLVM IR. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D91362	2020-12-09 11:08:41 +00:00
Philip Reames	5171b7b40e	[indvars] Common a bit of code [NFC]	2020-12-08 15:25:48 -08:00
Anna Thomas	29356e3279	[ScalarizeMaskedMemIntrin] Add new PM support This patch adds new PM support for the pass, so the pass can now be used during middle-end transforms. The old pass is renamed to ScalarizeMaskedMemIntrinLegacyPass. Reviewed-By: skatkov, aeubanks Differential Revision: https://reviews.llvm.org/D92743	2020-12-08 17:15:22 -05:00
Benjamin Kramer	5f18e2f31e	Move createScalarizeMaskedMemIntrinPass to Scalar.h	2020-12-08 19:08:09 +01:00
Benjamin Kramer	10987e30be	Remove unused include. NFC. This is also a layering violation.	2020-12-08 19:03:56 +01:00
Anna Thomas	09f2f9605f	[ScalarizeMaskedMemIntrinsic] Move from CodeGen into Transforms ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass is actually operating on IR level and does not use any code gen specific passes. It is useful to move it into transforms directory so that it can be more widely used as a mid-level transform as well (apart from usage in codegen pipeline). In particular, we have a usecase downstream where we would like to use this pass in our mid-level pipeline which operates on IR level. The next change will be to add support for new PM. Reviewers: craig.topper, apilipenko, skatkov Reviewed-By: skatkov Differential Revision: https://reviews.llvm.org/D92407	2020-12-08 12:25:58 -05:00
Xun Li	31e60b9133	[coroutine] should disable inline before calling coro split This is a rework of D85812, which didn't land. When a callee coroutine function is inlined into a caller coroutine function before the coro-split pass, LLVM emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that the coro-early pass cannot handle this quite well, so we believe that an unsplit coroutine function should not be inlined. This patch fixes the issue by not inlining a function if it has the attribute "coroutine.presplit" (which means the function has not been split yet). Test plan: check-llvm, check-clang. In D85812, there were suggestions to move the macros to Attributes.td to avoid a circular header dependency issue. I believe it's not worth doing just to be able to use one constant string in one place. Today, there are already 3 possible attribute values for "coroutine.presplit": `c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)` If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr just to support this, which I think is overkill. Instead, I think the best way to do this is to add an API in the Function class that checks whether the function is a coroutine by checking the attribute by name directly. Differential Revision: https://reviews.llvm.org/D92706	2020-12-08 08:53:08 -08:00
Teresa Johnson	77b509710c	[ICP] Don't promote when target not defined in module This guards against cases where the symbol was dead code eliminated in the binary by ThinLTO, and we have a sample profile collected for one binary but used to optimize another. Most of the benefit from ICP comes from inlining the target, which we can't do with only a declaration anyway. If this is in the pre-ThinLTO link step (e.g. for instrumentation based PGO), we will attempt the promotion again in the ThinLTO backend after importing anyway, and we don't need the early promotion to facilitate that. Differential Revision: https://reviews.llvm.org/D92804	2020-12-08 07:45:36 -08:00
Sjoerd Meijer	1e260f955d	[LICM][docs] Document that LICM is also a canonicalization transform. NFC. This documents that LICM is a canonicalization transform, which we discussed recently in: http://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html but which was also discussed earlier, e.g. in: http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html	2020-12-08 11:56:35 +00:00
Evgeniy Brevnov	2d1b024d06	[DSE][NFC] Need to be careful mixing signed and unsigned types Currently in some places we use a signed type to represent the size of an access and put explicit casts from unsigned to signed. For example: int64_t EarlierSize = int64_t(Loc.Size.getValue()); Even though it doesn't lose bits (immediately), it may overflow and we end up with a negative size. Potentially that causes later code to work incorrectly. A simple example is a check that the size is not negative. I think it would be safer and clearer if we use an unsigned type for the size and handle it appropriately. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D92648	2020-12-08 16:53:37 +07:00
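The hazard described in the commit message above can be illustrated with a small sketch. The function names here are hypothetical and are not taken from DSE; the signed variant only mirrors the `int64_t(...)` cast pattern quoted in the message.

```cpp
#include <cassert>
#include <cstdint>

// Signed variant: cast the unsigned size to int64_t, then run a
// "size is not negative" check. A size above INT64_MAX wraps to a
// negative value (guaranteed modular since C++20, and the behavior of
// mainstream compilers before that), so the check rejects it for the
// wrong reason.
bool nonNegativeAfterSignedCast(uint64_t AccessSize) {
  int64_t Size = static_cast<int64_t>(AccessSize);
  return Size >= 0;
}

// Unsigned variant: keep the size unsigned and compare it against an
// explicit bound, so no sign reinterpretation is involved.
bool fitsWithin(uint64_t AccessSize, uint64_t Limit) {
  return AccessSize <= Limit;
}
```

With an input of `UINT64_MAX`, the signed variant sees the value -1, whereas the unsigned comparison simply reports that the size exceeds the limit.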
Valentin Churavy	700cf7dcc9	[VNCoercion] Disallow coercion between different ni addrspaces I'm not sure if it would be legal by the IR reference to introduce an addrspacecast here, since the IR reference is a bit vague on the exact semantics, but at least for our usage of it (and I suspect for many others' usage) it is not. For us, addrspacecasts between non-integral address spaces carry frontend information that the optimizer cannot deduce afterwards in a generic way (though we have frontend-specific passes in our pipeline that do propagate these). In any case, I'm sure nobody is using it this way at the moment, since it would have introduced inttoptrs, which are definitely illegal. Fixes PR38375 Co-authored-by: Keno Fischer <keno@alumni.harvard.edu> Reviewed By: reames Differential Revision: https://reviews.llvm.org/D50010	2020-12-07 20:19:48 -05:00
Sanjay Patel	5fe1a49f96	[SLP] fix typo in debug string; NFC	2020-12-07 15:09:21 -05:00
Bardia Mahjour	4db9b78c81	[LV] Epilogue Vectorization with Optimal Control Flow - Default Enablement This patch enables epilogue vectorization by default per reviewer requests. Differential Revision: https://reviews.llvm.org/D89566	2020-12-07 14:29:36 -05:00
Florian Hahn	32825e8636	[ConstraintElimination] Tweak placement in pipeline. This patch adds the ConstraintElimination pass to the LTO pipeline and also runs it after SCCP in the function simplification pipeline. This increases the number of cases we can eliminate. Pending further tuning.	2020-12-07 19:08:40 +00:00
Simon Pilgrim	50dd1dba6e	[IPO] Fix operator precedence warning. NFCI. Check the entire assertion condition before && with the message.	2020-12-07 18:23:54 +00:00
Alexey Bataev	438682de6a	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D92668	2020-12-07 07:50:00 -08:00
Jun Ma	216689ace7	[Coroutines] Add DW_OP_deref for transformed dbg.value intrinsic. Differential Revision: https://reviews.llvm.org/D92462	2020-12-07 10:24:44 +08:00
Craig Topper	305fcc9122	[LoopIdiomRecognize] Merge a conditional operator with an earlier if and remove an extra temporary variable. NFC The CountPrev variable was only used to forward a value from the if statement to the conditional operator under the same condition. While there move some variable declarations to their first assignment.	2020-12-06 15:23:18 -08:00
Fangrui Song	2832f3528c	[Transforms] Delete unused declarations from NewGVN/CoroSplit/ValueMapper	2020-12-06 13:04:01 -08:00
Wenlei He	6b989a1710	[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining This change adds the context-sensitive sample PGO infrastructure described in the CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduces an abstraction between the input profile and the profile loader that queries the input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with the top-down profile-guided inliner in the profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to the CGSCC inliner in order for it to take advantage of context-sensitive profiles. This change is the consumption part of the context-sensitive profile (the generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Patches for integration with Pseudo-probe are coming up soon. Currently the new infrastructure kicks in when the input profile contains the new context-sensitive profile; otherwise it's a no-op and does not affect existing AutoFDO. Interface: There are two sets of interfaces, for query and tracking respectively, exposed from SampleContextTracker. For query, instead of simply getting a profile from input for a function, we can now explicitly query the base profile or the context profile for a given call path of a function. For tracking, there are separate APIs for marking a context profile as inlined, or promoting and merging a not-inlined context profile. - Query base profile (`getBaseSamplesFor`) Base profile is the merged synthetic profile for a function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.
- Query context profile (`getContextSamplesFor`) Context profile is a function's CFG profile for a given calling context. We can query context profile by context string. - Track inlined context profile (`markContextSamplesInlined`) When a function is inlined for a given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include the inlined context profile when synthesizing the base profile for that inlined function. - Track not-inlined context profile (`promoteMergeContextSamplesTree`) When a function is not inlined for a given calling context, we need to promote the context profile tree so the not-inlined context becomes a top-level context. This preserves the sub-context under that function so a later inline decision for that not-inlined function will still have a context profile for its call tree. Note that profiles will be merged if needed when promoting a context profile tree if any of the nodes already exists at its promoted destination. Implementation: Implementation-wise, `SampleContext` is created as an abstraction for context. Currently it's a string for the call path, and we can later optimize it to something more efficient, e.g. a context id. Each `SampleContext` also has a `ContextState` indicating whether it's a raw context profile from input, whether it's inlined or merged, and whether it's a synthetic profile created by the compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's a base profile or a context profile, and for a context profile what the context and state are. On top of the above context representation, a custom trie is implemented to track and manage context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie with `ContextTrieNode` as node. Each node of the trie represents a frame in a calling context, thus the path from the root to a node represents a valid calling context.
We also track `FunctionSamples` for each node, so this trie can serve efficient queries for context profiles. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of the entire tree, and merging nodes of the subtree if this move encounters existing nodes. Integration: `SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detect that the input profile contains a context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile queries will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`. Differential Revision: https://reviews.llvm.org/D90125	2020-12-06 11:49:18 -08:00
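The trie-based tracking and the promote-and-merge step described in the commit message above can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the `SampleContextTracker` implementation: `TrieNode`, `childOf`, `getOrCreate`, `mergeInto` and `promote` are hypothetical names, and a single sample counter stands in for `FunctionSamples`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node: one frame of a calling context.
struct TrieNode {
  uint64_t Samples = 0; // stand-in for the per-context FunctionSamples
  std::map<std::string, std::unique_ptr<TrieNode>> Children;
};

// Return the child for Frame, creating it if it does not exist yet.
TrieNode &childOf(TrieNode &N, const std::string &Frame) {
  std::unique_ptr<TrieNode> &Slot = N.Children[Frame];
  if (!Slot)
    Slot = std::make_unique<TrieNode>();
  return *Slot;
}

// Walk (and create) the path for a calling context, outermost frame first.
TrieNode &getOrCreate(TrieNode &Root, const std::vector<std::string> &Context) {
  TrieNode *N = &Root;
  for (const std::string &Frame : Context)
    N = &childOf(*N, Frame);
  return *N;
}

// Recursively add Src's samples and children into Dst.
void mergeInto(TrieNode &Dst, const TrieNode &Src) {
  Dst.Samples += Src.Samples;
  for (const auto &KV : Src.Children)
    mergeInto(childOf(Dst, KV.first), *KV.second);
}

// Promote the not-inlined Callee subtree under Parent to a top-level
// context, merging with any node already present at the destination.
void promote(TrieNode &Root, TrieNode &Parent, const std::string &Callee) {
  auto It = Parent.Children.find(Callee);
  if (It == Parent.Children.end())
    return;
  std::unique_ptr<TrieNode> Sub = std::move(It->second);
  Parent.Children.erase(It);
  mergeInto(childOf(Root, Callee), *Sub);
}
```

In this sketch, promoting `foo` out of the context `main -> foo` moves its whole subtree under the root, so a sub-context such as `foo -> bar` keeps its samples and is summed with any existing top-level `foo -> bar` entry.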
Kazu Hirata	ddb002d7c7	[InstCombine] Remove replacePointer (NFC) The declaration was introduced on Feb 10, 2017 in commit `ba01ed00fe` without a corresponding definition.	2020-12-06 10:24:08 -08:00
Sanjay Patel	94f6d365e4	[InstCombine] avoid crash on phi with unreachable incoming block (PR48369)	2020-12-06 09:31:47 -05:00
Fangrui Song	204d0d51b3	[MemProf] Make __memprof_shadow_memory_dynamic_address dso_local in static relocation model The x86-64 backend currently has a bug which uses the wrong register for the GOTPCREL reference. The program will crash without the dso_local specifier.	2020-12-05 21:36:31 -08:00
Florian Hahn	4ceecc820b	[ConstraintElimination] Handle constraints with all zero var coeffs. Constraints where all variable coefficients are 0 do not add any useful information. When checking, we can check if they are always true/false.	2020-12-05 12:06:53 +00:00
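The check described in the commit message above can be sketched as follows. The row encoding is an assumption made for this illustration and may not match the pass's internal representation: a row `[c0, a1, ..., an]` is read as `a1*x1 + ... + an*xn <= c0`, and `evaluateTrivial` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed encoding: Row = [c0, a1, ..., an] means
//   a1*x1 + ... + an*xn <= c0.
// If every variable coefficient is zero, the constraint reduces to
// 0 <= c0, which is either always true or always false. Returns
// std::nullopt when the row is empty or still depends on a variable.
std::optional<bool> evaluateTrivial(const std::vector<int64_t> &Row) {
  if (Row.empty())
    return std::nullopt;
  for (std::size_t I = 1; I < Row.size(); ++I)
    if (Row[I] != 0)
      return std::nullopt;
  return Row[0] >= 0;
}
```

Under this encoding, such a row carries no information about the variables, so it can be answered directly instead of being added to the system.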
Kazu Hirata	8006043b13	[IRCE] Remove unused IsSigned and its accessor (NFC) IsSigned and its accessor, isSigned, were introduced on Oct 25, 2017 in commit `9ac7021a25`. The last use was removed on Nov 20, 2017 in commit `268467869b`.	2020-12-04 21:26:12 -08:00
Jianzhou Zhao	a28db8b27a	[dfsan] Add empty APIs for field-level shadow This is a child diff of D92261. This diff adds APIs that return shadow type/value/zero from origin objects. For the time being these APIs simply return primitive shadow type/value/zero. The following diff will implement the conversion. As D92261 explains, some cases still use primitive shadow during the incremental changes. The cases include 1) alloca/load/store 2) custom function IO 3) vectors. In these cases this diff does not use the new APIs, but uses primitive shadow objects explicitly. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92629	2020-12-04 21:42:07 +00:00
Duncan P. N. Exon Smith	d10f9863a5	ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC Prepare to delete `AlignedCharArrayUnion` by migrating its users over to `std::aligned_union_t`. I will delete `AlignedCharArrayUnion` and its tests in a follow-up commit so that it's easier to revert in isolation in case some downstream wants to keep using it. Differential Revision: https://reviews.llvm.org/D92516	2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith	5b267fb796	ADT: Stop peeking inside AlignedCharArrayUnion, NFC Update all the users of `AlignedCharArrayUnion` to stop peeking inside (to look at `buffer`) so that a follow-up patch can replace it with an alias to `std::aligned_union_t`. This was reviewed as part of https://reviews.llvm.org/D92512, but I'm splitting this bit out to commit first to reduce churn in case the change to `AlignedCharArrayUnion` needs to be reverted for some unexpected reason.	2020-12-04 11:07:42 -08:00
Hiroshi Yamauchi	f9c3954a6e	Fix for Bug 48055. Differential Revision: https://reviews.llvm.org/D92599	2020-12-04 11:05:01 -08:00
Arthur Eubanks	7f6f9f4cf9	[NewPM] Make pass adaptors less templatey Currently PassBuilder.cpp is by far the file that takes longest to compile. This is due to tons of templates being instantiated per pass. Follow PassManager by using wrappers around passes to avoid making the adaptors templated on the pass type. This allows us to move various adaptors' run methods into .cpp files. This reduces the compile time of PassBuilder.cpp on my machine from 66 to 39 seconds. It also reduces the size of opt from 685M to 676M. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D92616	2020-12-04 08:30:50 -08:00
Evgeniy Brevnov	061cebb46f	[NFC][NARY-REASSOCIATE] Restructure code to avoid isPotentiallyReassociatable Currently we have to duplicate the same checks in isPotentiallyReassociatable and tryReassociate. With simple patterns like add/mul this may not be a big deal. But the situation gets much worse when I try to add support for min/max. Min/Max may be represented by several instructions and can take different forms. In order to reduce complexity for the upcoming min/max support we need to restructure the code a bit to avoid the mentioned code duplication. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88286	2020-12-04 16:19:43 +07:00
Evgeniy Brevnov	f61c29b3a7	[NARY-REASSOCIATE] Simplify traversal logic by post deleting dead instructions Currently we delete optimized instructions as we go. That has several negative consequences. First it complicates traversal logic itself. Second if newly generated instruction has been deleted the traversal is repeated from scratch. But real motivation for the change is upcoming change with support for min/max reassociation. Here we employ SCEV expander to generate code. As a result newly generated instructions may be inserted not right before original instruction (because SCEV may do hoisting) and there is no way to know 'next' instruction. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88285	2020-12-04 16:17:50 +07:00
Kazu Hirata	e2fc11cf9f	[JumpThreading] Call eraseBlock when folding a conditional branch This patch teaches the jump threading pass to call BPI->eraseBlock when it folds a conditional branch. Without this patch, BranchProbabilityInfo could end up with stale edge probabilities for the basic block containing the conditional branch -- one edge probability with less than 1.0 and the other for a removed edge. Differential Revision: https://reviews.llvm.org/D92608	2020-12-03 23:50:17 -08:00
Max Kazantsev	12b6c5e682	Return "[IndVars] ICmpInst should not prevent IV widening" This reverts commit `4bd35cdc3a`. The patch was reverted during the investigation. The investigation showed that the patch did not cause any trouble, but just exposed an existing problem that is addressed by the previous patch "[IndVars] Quick fix LHS/RHS bug". Returning without changes.	2020-12-04 12:34:43 +07:00
Max Kazantsev	3df0daceb2	[IndVars] Quick fix LHS/RHS bug The code relies on the fact that LHS is the NarrowDef but never really checks it. Adding a conservative restrictive check; will follow up with handling of the case where RHS is a NarrowDef.	2020-12-04 12:34:42 +07:00
Jianzhou Zhao	80e326a8c4	[dfsan] Support passing non-i16 shadow values in TLS mode This is a child diff of D92261. It extends TLS arg/ret to work with aggregate types. For a function t foo(t1 a1, t2 a2, ... tn an) its argument shadows are saved in TLS args like a1_s, a2_s, ..., an_s. TLS ret simply includes r_s. By calculating the type size of each shadow value, we can get their offsets. This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp. Note that this change does not add test cases for overflowed TLS arg/ret because this is hard to test w/o supporting aggregate shadow types. We will be adding them after supporting that. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92440	2020-12-04 02:45:07 +00:00
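The offset computation described in 80e326a8c4 can be sketched as a running sum over shadow sizes. This is an illustrative Python model rather than the DFSan implementation: the function name `shadow_tls_offsets`, the example sizes, and the `align` parameter are assumptions made for the sketch and are not taken from the commit.

```python
def shadow_tls_offsets(shadow_sizes, align=2):
    """Return the byte offset of each argument shadow in a packed TLS buffer.

    shadow_sizes: size in bytes of each argument's shadow value (hypothetical).
    align: assumed slot alignment in bytes; the real value is not stated here.
    """
    offsets = []
    cursor = 0
    for size in shadow_sizes:
        offsets.append(cursor)
        # Round each slot up to the assumed alignment before placing the next.
        cursor += (size + align - 1) // align * align
    return offsets


# Three arguments whose shadows occupy 2, 6 and 2 bytes.
print(shadow_tls_offsets([2, 6, 2]))
```

Because each offset depends only on the sizes of the preceding shadows, caller and callee can compute the same layout independently from the function type.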
Philip Reames	0c866a3d6a	[LoopVec] Support non-instructions as argument to uniform mem ops The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit the use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists. This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of its operand. Only instructions within the loop have their uses scanned.) In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction. This allows detection of uniform mem ops involving global addresses. Differential Revision: https://reviews.llvm.org/D92056	2020-12-03 14:51:44 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Max Kazantsev	4bd35cdc3a	Revert "[IndVars] ICmpInst should not prevent IV widening" This reverts commit `0c9c6ddf17`. We are seeing some failures with this patch locally. Not clear if it's causing them or just triggering a problem in another place. Reverting while investigating.	2020-12-03 18:01:41 +07:00
modimo	c1ba991e8d	[NFC] Fix typo	2020-12-02 22:23:57 -08:00
Jianzhou Zhao	bd726d2796	[dfsan] Rename ShadowTy/ZeroShadow with prefix Primitive This is a child diff of D92261. After supporting field/index-level shadow, the existing shadow with type i16 works for only primitive types. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92459	2020-12-03 05:31:01 +00:00
Florian Hahn	2304528bb5	[ConstraintElimination] Make sure arguments of std::pow match. This should fix a build failure on some systems, e.g. solaris11-sparcv9 http://lab.llvm.org:8014/#/builders/22	2020-12-02 22:23:26 +00:00
Hongtao Yu	24d4291ca7	[CSSPGO] Pseudo probes for function calls. An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work. One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution. With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. 
Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst. To avoid collision with the baseline AutoFDO in various places that handle DWARF discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reserved for pseudo probe use. Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`. Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91756	2020-12-02 13:45:20 -08:00
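The encoding idea in 24d4291ca7 (a marker in the lowest 3 bits, with the probe ID and type carried in the upper 29 bits of the 32-bit discriminator) can be illustrated with a small bit-packing sketch. Only the 3-bit marker and the 29-bit budget come from the commit message; the field widths `ID_BITS` and `TYPE_BITS`, their order, and all function names below are hypothetical and do not describe LLVM's actual layout.

```python
PROBE_MARKER = 0b111  # lowest 3 bits all set, as described in the commit message
ID_BITS = 16          # hypothetical width of the probe ID field
TYPE_BITS = 4         # hypothetical width of the probe type field


def pack_probe(probe_id, probe_type):
    """Pack a probe ID and type into a 32-bit discriminator value."""
    if not 0 <= probe_id < (1 << ID_BITS):
        raise ValueError("probe_id out of range")
    if not 0 <= probe_type < (1 << TYPE_BITS):
        raise ValueError("probe_type out of range")
    payload = probe_id | (probe_type << ID_BITS)
    return (payload << 3) | PROBE_MARKER


def is_probe_discriminator(discriminator):
    """A discriminator carries a probe only if its low 3 bits are all set."""
    return (discriminator & 0b111) == PROBE_MARKER


def unpack_probe(discriminator):
    """Recover (probe_id, probe_type) from a packed discriminator."""
    payload = discriminator >> 3
    probe_id = payload & ((1 << ID_BITS) - 1)
    probe_type = (payload >> ID_BITS) & ((1 << TYPE_BITS) - 1)
    return probe_id, probe_type


d = pack_probe(5, 2)
print(is_probe_discriminator(d), unpack_probe(d))
```

Checking the marker first lets code that has no access to the `-pseudo-probe-for-profiling` switch decide how to interpret a discriminator from the value alone.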
Jianzhou Zhao	dad5d95883	[dfsan] Rename CachedCombinedShadow to be CachedShadow At D92261, this type will be used to cache both combined shadow and converted shadow values. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92458	2020-12-02 21:39:16 +00:00
jasonliu	a65d8c5d72	[XCOFF][AIX] Generate LSDA data and compact unwind section on AIX Summary: AIX uses the existing EH infrastructure in clang and llvm. The major differences would be 1. AIX does not have CFI instructions. 2. AIX uses a new personality routine, named __xlcxx_personality_v1. It doesn't use the GCC personality routine, because the interoperability is not there yet on AIX. 3. AIX does not use eh_frame sections. Instead, it would use an eh_info section (compact unwind section) to store the information about personality routine and LSDA data address. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D91455	2020-12-02 18:42:44 +00:00
Bardia Mahjour	a7e2c26939	[LV] Epilogue Vectorization with Optimal Control Flow (Recommit) This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieves epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-02 10:09:56 -05:00
Sanjay Patel	56fd29e93b	[SLP] use 'match' for binop/select; NFC This might be a small improvement in readability, but the real motivation is to make it easier to adapt the code to deal with intrinsics like 'maxnum' and/or integer min/max. There is potentially help in doing that with D92086, but we might also just add specialized wrappers here to deal with the expected patterns.	2020-12-02 09:04:08 -05:00
Alex Zinenko	240dd92432	[OpenMPIRBuilder] forward arguments as pointers to outlined function OpenMPIRBuilder::createParallel outlines the body region of the parallel construct into a new function that accepts any value previously defined outside the region as a function argument. This function is called back by OpenMP runtime function __kmpc_fork_call, which expects trailing arguments to be pointers. If the region uses a value that is not of a pointer type, e.g. a struct, the produced code would be invalid. In such cases, make createParallel emit IR that stores the value on stack and pass the pointer to the outlined function instead. The outlined function then loads the value back and uses as normal. Reviewed By: jdoerfert, llitchev Differential Revision: https://reviews.llvm.org/D92189	2020-12-02 14:59:41 +01:00
David Sherwood	71bd59f0cb	[SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute In this patch I have added support for a new loop hint called vectorize.scalable.enable that says whether we should enable scalable vectorization or not. If a user wants to instruct the compiler to vectorize a loop with scalable vectors they can now do this as follows: br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2 ... !2 = !{!2, !3, !4} !3 = !{!"llvm.loop.vectorize.width", i32 8} !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true} Setting the hint to false simply reverts the behaviour back to the default, using fixed width vectors. Differential Revision: https://reviews.llvm.org/D88962	2020-12-02 13:23:43 +00:00
Chen Zheng	3cb7d62452	[LSR][NFC] don't collect chains when isNumRegsMajorCostOfLSR is false. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D92159	2020-12-01 22:29:33 -05:00
Jianzhou Zhao	405ea2b93d	[msan] Replace 8 by kShadowTLSAlignment Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D92275	2020-12-02 01:09:49 +00:00
Fangrui Song	a5309438fe	static const char *const foo => const char foo[] By default, a non-template variable of non-volatile const-qualified type having namespace-scope has internal linkage, so no need for `static`.	2020-12-01 10:33:18 -08:00
Bardia Mahjour	c94af03f7f	Revert "[LV] Epilogue Vectorization with Optimal Control Flow" This reverts commit `9c5504adce`. Reverting to investigate build failure in http://lab.llvm.org:8011/#/builders/98/builds/1461/steps/9	2020-12-01 12:50:36 -05:00
Bardia Mahjour	9c5504adce	[LV] Epilogue Vectorization with Optimal Control Flow This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieves epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-01 12:04:29 -05:00
Nikita Popov	624af932a8	[MemCpyOpt] Port to MemorySSA This is a straightforward port of MemCpyOpt to MemorySSA following the approach of D26739. MemDep queries are replaced with MSSA queries without changing the overall structure of the pass. Some care has to be taken to account for differences between these APIs (MemDep also returns reads, MSSA doesn't). Differential Revision: https://reviews.llvm.org/D89207	2020-12-01 17:57:41 +01:00
Clement Courbet	735e6c888e	[MergeICmps] Fix missing split. We were not correctly splitting blocks for chains of length 1. Before that change, additional instructions for blocks in chains of length 1 were not split off from the block before removing (this was done correctly for chains of longer size). If this first block contained an instruction referenced elsewhere, deleting the block would result in invalidation of the produced value. This caused a miscompile which motivated D92297 (before D17993, nonnull and dereferenceable attributes were not added so MergeICmps was not triggered.) The new test gep-references-bb.ll demonstrates the issue. The regression was introduced in rG0efadbbcdeb82f5c14f38fbc2826107063ca48b2. This supersedes D92364. Test case by MaskRay (Fangrui Song). Differential Revision: https://reviews.llvm.org/D92375	2020-12-01 16:50:55 +01:00
Sanjay Patel	9f60b8b3d2	[InstCombine] canonicalize sign-bit-shift of difference to ext(icmp) icmp is the preferred spelling in IR because icmp analysis is expected to be better than any other analysis. This should lead to more follow-on folding potential. It's difficult to say exactly what we should do in codegen to compensate. For example on AArch64, which of these is preferred: sub w8, w0, w1 lsr w0, w8, #31 vs: cmp w0, w1 cset w0, lt If there are perf regressions, then we should deal with those in codegen on a case-by-case basis. A possible motivating example for better optimization is shown in: https://llvm.org/PR43198 but that will require other transforms before anything changes there. Alive proof: https://rise4fun.com/Alive/o4E Name: sign-bit splat Pre: C1 == (width(%x) - 1) %s = sub nsw %x, %y %r = ashr %s, C1 => %c = icmp slt %x, %y %r = sext %c Name: sign-bit LSB Pre: C1 == (width(%x) - 1) %s = sub nsw %x, %y %r = lshr %s, C1 => %c = icmp slt %x, %y %r = zext %c	2020-12-01 09:58:11 -05:00
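The two Alive proofs quoted in 9f60b8b3d2 can be spot-checked by brute force at a small bit width. The sketch below is a Python model written for this listing, not LLVM code: it uses 8-bit values, and the helper names are made up. It checks every pair for which `x - y` does not overflow (the `nsw` precondition) and then shows one overflowing pair where the two forms disagree.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def to_signed(bits):
    """Interpret the low WIDTH bits as a two's-complement signed value."""
    bits &= MASK
    return bits - (1 << WIDTH) if bits >> (WIDTH - 1) else bits


def splat_via_shift(x, y):
    # ashr (sub x, y), WIDTH-1: Python's >> on negative ints is arithmetic.
    return to_signed(x - y) >> (WIDTH - 1)


def splat_via_icmp(x, y):
    # sext (icmp slt x, y): all ones (-1) when true, zero otherwise.
    return -1 if x < y else 0


def lsb_via_shift(x, y):
    # lshr (sub x, y), WIDTH-1 on the wrapped bit pattern.
    return ((x - y) & MASK) >> (WIDTH - 1)


def lsb_via_icmp(x, y):
    # zext (icmp slt x, y)
    return 1 if x < y else 0


def check_all_nsw_pairs():
    """Compare both folds on every signed pair whose difference fits in WIDTH bits."""
    lo, hi = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1
    checked = 0
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            if not lo <= x - y <= hi:
                continue  # sub would overflow, so nsw makes the result poison
            assert splat_via_shift(x, y) == splat_via_icmp(x, y)
            assert lsb_via_shift(x, y) == lsb_via_icmp(x, y)
            checked += 1
    return checked


print(check_all_nsw_pairs() > 0)
# Without nsw the fold would be wrong: -128 - 1 wraps to +127.
print(splat_via_shift(-128, 1), splat_via_icmp(-128, 1))
```

The overflowing pair is why the precondition matters: once the subtraction wraps, the sign bit of the difference no longer reflects the ordering of the operands.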
Florian Hahn	7a4f1d59b8	[ConstraintElimination] Decompose GEP %ptr, ZEXT(SHL()). Add support to decompose a GEP with a ZEXT(SHL()) operand.	2020-12-01 14:23:21 +00:00
Bhramar Vatsa	fd679107d6	[InstCombine] Optimize away the unnecessary multi-use sign-extend C.f. https://bugs.llvm.org/show_bug.cgi?id=47765 Added a case for handling the sign-extend (Shl+AShr) for multiple uses, to optimize it away for an individual use, when the demanded bits aren't affected by sign-extend. https://rise4fun.com/Alive/lgf Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D91343	2020-12-01 16:54:00 +03:00
Roman Lebedev	94ead0190f	[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2 If the shift amount was undef for some lane, the shift amount in opposite shift is irrelevant for that lane, and the new shift amount for that lane can be undef.	2020-12-01 16:54:00 +03:00
Roman Lebedev	52533b52b8	Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold" It seems I have missed checklines, temporarily reverting, will reland momentarily. This reverts commit `aa1aa13509`.	2020-12-01 15:47:04 +03:00
Roman Lebedev	aa1aa13509	[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold If the shift amount was undef for some lane, the shift amount in opposite shift is irrelevant for that lane, and the new shift amount for that lane can be undef.	2020-12-01 15:13:08 +03:00
Roman Lebedev	8e29e20e0d	[InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343) It is not correct to compute that new shift amount in its narrow type and only then extend it into the wide type: ---------------------------------------- Optimization: PR48343 good Precondition: (width(%X) == width(%r)) %o0 = trunc %X %o1 = shl %o0, %Y %o2 = ashr %o1, %Y %r = sext %o2 => %n0 = sext %Y %n1 = sub width(%o0), %n0 %n2 = sub width(%X), %n1 %n3 = shl %X, %n2 %r = ashr %n3, %n2 Done: 2016 Optimization is correct! ---------------------------------------- Optimization: PR48343 bad Precondition: (width(%X) == width(%r)) %o0 = trunc %X %o1 = shl %o0, %Y %o2 = ashr %o1, %Y %r = sext %o2 => %n0 = sub width(%o0), %Y %n1 = sub width(%X), %n0 %n2 = sext %n1 %n3 = shl %X, %n2 %r = ashr %n3, %n2 Done: 1 ERROR: Domain of definedness of Target is smaller than Source's for i9 %r Example: %X i9 = 0x000 (0) %Y i4 = 0x3 (3) %o0 i4 = 0x0 (0) %o1 i4 = 0x0 (0) %o2 i4 = 0x0 (0) %n0 i4 = 0x1 (1) %n1 i4 = 0x8 (8, -8) %n2 i9 = 0x1F8 (504, -8) %n3 i9 = 0x000 (0) Source value: 0x000 (0) Target value: undef I.e. we should be computing it in the wide type from the beginning. Fixes https://bugs.llvm.org/show_bug.cgi?id=48343	2020-12-01 15:13:07 +03:00
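The counterexample in 8e29e20e0d (`%X` is i9, `%Y` is i4 with value 3) can be replayed with plain integer arithmetic. The following Python sketch models fixed-width wraparound by masking; the helper names are invented for illustration and the code is not taken from InstCombine.

```python
def wrap(value, width):
    """Reduce an integer to an unsigned bit pattern of the given width."""
    return value & ((1 << width) - 1)


def sext(bits, from_width, to_width):
    """Sign-extend a from_width-bit pattern to a to_width-bit pattern."""
    bits = wrap(bits, from_width)
    if bits >> (from_width - 1):
        bits -= 1 << from_width
    return wrap(bits, to_width)


def shift_amount_wide(y, narrow_width, wide_width):
    # "good" variant: extend %Y first, then do both subtractions in the wide type.
    n0 = sext(y, narrow_width, wide_width)
    n1 = wrap(narrow_width - n0, wide_width)
    return wrap(wide_width - n1, wide_width)


def shift_amount_narrow(y, narrow_width, wide_width):
    # "bad" variant: subtract in the narrow type, extend only at the end.
    n0 = wrap(narrow_width - y, narrow_width)
    n1 = wrap(wide_width - n0, narrow_width)
    return sext(n1, narrow_width, wide_width)


good = shift_amount_wide(3, 4, 9)
bad = shift_amount_narrow(3, 4, 9)
print(good, hex(bad))
```

In the narrow variant the intermediate value 8 has its i4 sign bit set, so sign extension turns it into 0x1F8 (504). That is larger than the 9-bit width, which is why the shifts in the target produce an undefined value.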
Roman Lebedev	15f8060f6f	[SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction There is no correctness need for that, and since we allow live-out uses, this could theoretically happen, because currently nothing will move the cond to right before the branch in those tests. But regardless, lifting that restriction even makes the transform easier to understand. This makes the transform happen in 81 more cases (+0.55%)	2020-12-01 15:13:06 +03:00
Cullen Rhodes	cba4accda0	[LV] Clamp VF hint when unsafe In the following loop the dependence distance is 2 and the loop can only be vectorized if the vector length is no larger than this. void foo(int *a, int *b, int N) { #pragma clang loop vectorize(enable) vectorize_width(4) for (int i=0; i<N; ++i) { a[i + 2] = a[i] + b[i]; } } However, when specifying a VF of 4 via a loop hint this loop is vectorized. According to [1][2], loop hints are ignored if the optimization is not safe to apply. This patch introduces a check to bail out of vectorization if the user specified VF is greater than the maximum feasible VF, unless explicitly forced with '-force-vector-width=X'. [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave [2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations Reviewed By: sdesmalen, fhahn, Meinersbur Differential Revision: https://reviews.llvm.org/D90687	2020-12-01 11:30:34 +00:00
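Why a VF of 4 is unsafe for the loop in cba4accda0 can be seen by simulating it. The Python sketch below is a simplified model, not the vectorizer's code generation: each vector iteration loads all `vf` lanes before storing any of them, and a scalar remainder loop handles leftover iterations. The function names and input data are chosen only for the example.

```python
def scalar_loop(a, b, n):
    """Reference semantics: a[i + 2] = a[i] + b[i] for i in 0..n-1."""
    a = list(a)
    for i in range(n):
        a[i + 2] = a[i] + b[i]
    return a


def vectorized_loop(a, b, n, vf):
    """Model of a loop vectorized with width vf (loads happen before stores)."""
    a = list(a)
    i = 0
    while i + vf <= n:
        # All lanes read their inputs first, as a wide load would.
        lanes = [a[i + k] + b[i + k] for k in range(vf)]
        # Then all lanes write their results, as a wide store would.
        for k in range(vf):
            a[i + 2 + k] = lanes[k]
        i += vf
    for j in range(i, n):  # scalar remainder iterations
        a[j + 2] = a[j] + b[j]
    return a


n = 8
a = [1] * (n + 2)
b = [1] * n
expected = scalar_loop(a, b, n)
print(vectorized_loop(a, b, n, 2) == expected)  # VF within the dependence distance
print(vectorized_loop(a, b, n, 4) == expected)  # VF larger than the dependence distance
```

With a VF of 4, lanes 2 and 3 of each vector iteration read `a[i + 2]` and `a[i + 3]` before lanes 0 and 1 have written them, so the result diverges from the scalar loop. With a VF of 2 no lane reads a location written in the same vector iteration.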
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00
Florian Hahn	efa9728a50	[ConstraintElimination] Decompose GEP %ptr, SHL(). Add support to decompose a GEP with an SHL operand.	2020-12-01 10:58:36 +00:00
Sjoerd Meijer	f44ba25135	ExtractValue instruction costs Instruction ExtractValue wasn't handled in LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled as a mul which is not really accurate. Since it is free (most of the time), this now gets a cost of 0 using getInstructionCost. This is a follow-up of D92208, that required changing this regression test. In a follow up I will look at InsertValue which also isn't handled yet. Differential Revision: https://reviews.llvm.org/D92317	2020-12-01 10:42:23 +00:00
Greg Parker	bcc802fa36	[DSE] Remove a redundant call to getLocForWriteEx() Differential Revision: https://reviews.llvm.org/D92263	2020-11-30 21:12:24 -08:00
Mircea Trofin	5fe10263ab	[llvm][inliner] Reuse the inliner pass to implement 'always inliner' Enable performing mandatory inlinings upfront, by reusing the same logic as the full inliner, instead of the AlwaysInliner. This has the following benefits: - reduce code duplication - one inliner codebase - open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before the full inliner. Performing the mandatory inlinings first simplifies the problem the full inliner needs to solve: fewer call sites, more contextualization, and, depending on the additional function optimization passes run between the 2 inliners, higher accuracy of cost models / decision policies. Note that this patch does not yet enable much in terms of post-always inline function optimization. Differential Revision: https://reviews.llvm.org/D91567	2020-11-30 12:03:39 -08:00
Hongtao Yu	64fa8cce22	[CSSPGO] Pseudo probe instrumentation pass This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86499	2020-11-30 10:16:54 -08:00
Florian Hahn	fe83adb05a	[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC). VPPredInstPHIRecipe is one of the recipes that was missed during the initial conversion. This patch adjusts the recipe to also manage its operand using VPUser.	2020-11-30 13:09:58 +00:00
Roman Lebedev	b0e9b7c59f	[NFC][SimplifyCFG] Add STATISTIC() to the FoldValueComparisonIntoPredecessors() fold	2020-11-30 12:27:16 +03:00
Max Kazantsev	0c9c6ddf17	[IndVars] ICmpInst should not prevent IV widening If we decided to widen IV with zext, then unsigned comparisons should not prevent widening (same for sext/sign comparisons). The result of comparison in wider type does not change in this case. Differential Revision: https://reviews.llvm.org/D92207 Reviewed By: nikic	2020-11-30 10:51:31 +07:00
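The claim in 0c9c6ddf17 that a comparison keeps its result when both operands are widened with the matching extension can be checked exhaustively for a small width. The Python sketch below is only a model: it treats values as bit patterns, uses 8-bit narrow and 16-bit wide types chosen for the example, and extends both operands the same way, which is a simplification of what IndVars does.

```python
NARROW, WIDE = 8, 16


def as_signed(bits, width):
    """Interpret an unsigned bit pattern as a two's-complement value."""
    return bits - (1 << width) if bits >> (width - 1) else bits


def zext(bits):
    """Zero-extend: the pattern is unchanged, the new high bits are zero."""
    return bits


def sext(bits):
    """Sign-extend an 8-bit pattern to a 16-bit pattern."""
    return as_signed(bits, NARROW) & ((1 << WIDE) - 1)


def icmp_ult(a, b):
    return a < b


def icmp_slt(a, b, width):
    return as_signed(a, width) < as_signed(b, width)


def matching_extension_preserves_compares():
    """zext keeps unsigned compares and sext keeps signed compares, for all pairs."""
    for a in range(1 << NARROW):
        for b in range(1 << NARROW):
            if icmp_ult(a, b) != icmp_ult(zext(a), zext(b)):
                return False
            if icmp_slt(a, b, NARROW) != icmp_slt(sext(a), sext(b), WIDE):
                return False
    return True


print(matching_extension_preserves_compares())
# Mismatched pairing: zext with a signed compare changes the answer.
print(icmp_slt(0x80, 0x01, NARROW), icmp_slt(zext(0x80), zext(0x01), WIDE))
```

The last line shows why the extension kind has to match the predicate: 0x80 is -128 as a signed 8-bit value but 128 after zero extension, so the signed comparison flips.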
Fangrui Song	5408fdcd78	[VPlan] Fix -Wunused-variable after `a813090072`	2020-11-29 10:38:01 -08:00
Florian Hahn	4bc9b909d7	[VPlan] Use VPValue and VPUser ops to print VPReplicateRecipe.	2020-11-29 18:28:27 +00:00
Florian Hahn	a813090072	[VPlan] Manage stored values of interleave groups using VPUser (NFC) Interleave groups also depend on the values they store. Manage the stored values as VPUser operands. This is currently a NFC, but is required to allow VPlan transforms and to manage generated vector values exclusively in VPTransformState.	2020-11-29 17:24:36 +00:00
Andrew Litteken	a8a43b6338	Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner." Reverting commit due to address sanitizer errors. > Extracting the similar regions is the first step in the IROutliner. > > Using the IRSimilarityIdentifier, we collect the SimilarityGroups and > sort them by how many instructions will be removed. Each > IRSimilarityCandidate is used to define an OutlinableRegion. Each > region is ordered by their occurrence in the Module and the regions that > are not compatible with previously outlined regions are discarded. > > Each region is then extracted with the CodeExtractor into its own > function. > > We test that correctly extract in: > test/Transforms/IROutliner/extraction.ll > test/Transforms/IROutliner/address-taken.ll > test/Transforms/IROutliner/outlining-same-globals.ll > test/Transforms/IROutliner/outlining-same-constants.ll > test/Transforms/IROutliner/outlining-different-structure.ll > > Reviewers: paquette, jroelofs, yroux > > Differential Revision: https://reviews.llvm.org/D86975 This reverts commit `bf899e8913`.	2020-11-27 19:55:57 -06:00
Andrew Litteken	bf899e8913	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-11-27 19:08:29 -06:00
Florian Hahn	ae008798a4	[VPlan] Use VPTransformState::set in widenGEP. This patch updates widenGEP to manage the resulting vector values using the VPValue of VPWidenGEP recipe.	2020-11-27 17:01:55 +00:00
Francesco Petrogalli	8e0148dff7	[AllocaInst] Update `getAllocationSizeInBits` to return `TypeSize`. Reviewed By: peterwaller-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D92020	2020-11-27 16:39:10 +00:00
Sjoerd Meijer	10ad64aa3b	[SLP] Dump Tree costs. NFC. This adds LLVM_DEBUG messages to dump the (intermediate) tree cost calculations, which is useful to trace and see how the final cost is calculated.	2020-11-27 11:37:33 +00:00
Roman Lebedev	b33fbbaa34	Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions This was originally committed in `2245fb8aaa`, but was immediately reverted in `f3abd54958` because of a PHI handling issue. Original commit message: 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, I'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-27 12:47:15 +03:00
Wang, Pengfei	8dcf8d1da5	[msan] Fix bugs when instrument x86.avx512_cvt intrinsics. Scalar intrinsics x86.avx512_cvt have an extra rounding mode operand. We can directly ignore it to reuse the SSE/AVX math. This fixes the bug https://bugs.llvm.org/show_bug.cgi?id=48298. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92206	2020-11-27 16:33:14 +08:00
Markus Lavin	808fcfe594	Revert "[DebugInfo] Improve dbg preservation in LSR." This reverts commit `06758c6a61`. Bug: https://bugs.llvm.org/show_bug.cgi?id=48166 Additional discussion in: https://reviews.llvm.org/D91711	2020-11-27 08:52:32 +01:00
Max Kazantsev	faf183874c	[IndVars] LCSSA Phi users should not prevent widening When widening an IndVar that has LCSSA Phi users outside the loop, we can safely widen it as usual and then truncate the result outside the loop without hurting the performance. Differential Revision: https://reviews.llvm.org/D91593 Reviewed By: skatkov	2020-11-27 11:19:54 +07:00
Roman Lebedev	f3abd54958	Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions" Many bots are unhappy, at the very least missed a few codegen tests, and possibly this has a logic hole inducing a miscompile (will be really awesome to have ready reproducer..) Need to investigate. This reverts commit `2245fb8aaa`.	2020-11-26 23:13:43 +03:00
Roman Lebedev	2245fb8aaa	[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, I'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-26 22:51:22 +03:00
Roman Lebedev	65db7d38e0	[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold	2020-11-26 22:51:21 +03:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Florian Hahn	bd0b1311db	[VPlan] Turn VPReplicateRecipe into a VPValue. Update VPReplicateRecipe to inherit from VPValue. This still does not update scalarizeInstruction to set the result for the VPValue of VPReplicateRecipe, because this first requires tracking scalar values in VPTransformState. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D91500	2020-11-26 13:50:24 +00:00
David Stenberg	384996f9e1	[IndVarSimplify] Fix Modified status when handling dead PHI nodes When bailing out in rewriteLoopExitValues() you could be left with PHI nodes in the DeadInsts vector. Those would not be handled by the use of RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This resulted in the IndVarSimplify pass returning an incorrect modified status. This was caught by the expensive check introduced in D86589. This patch changes IndVarSimplify so that it deletes those PHI nodes, using RecursivelyDeleteDeadPHINode(). This fixes PR47486. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D91153	2020-11-26 14:28:21 +01:00
Zhengyang Liu	345fcccb33	Fix use-of-uninitialized-value in rG75f50e15bf8f Differential Revision: https://reviews.llvm.org/D71126	2020-11-26 01:39:22 -07:00
Max Kazantsev	664e1da485	[LoopLoadElim] Make sure all loops are in simplify form. PR48150 LoopLoadElim may end up expanding an AddRec from a loop which is not the current loop. This loop may not be in simplify form. We figure it out after the no-return point, so cannot bail in this case. AddRec requires simplify form to expand. The only way to ensure this does not crash is to simplify all loops beforehand. The issue only exists in new PM. Old PM requests LoopSimplify required pass and it simplifies all loops before the opt begins. Differential Revision: https://reviews.llvm.org/D91525 Reviewed By: asbirlea, aeubanks	2020-11-26 10:51:11 +07:00
Roman Lebedev	a8d74517dc	[PassManager] Run Induction Variable Simplification pass after Recognize loop idioms pass, not before Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does. I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand, but i'm very sure that `-indvars` requires `-loop-idiom` to run afterwards, as it can be seen in the phase-ordering test. LoopIdiom runs on two types of loops: countable ones, and uncountable ones. For uncountable ones, IndVars obviously didn't make any change to them, since they are uncountable, so for them the order should be irrelevant. For countable ones, well, they should have been countable before IndVars for IndVars to make any change to them, and since SCEV is used on them, it shouldn't matter if IndVars have already canonicalized them. So i don't really see why we'd want the current ordering. Should this cause issues, it will give us a reproducer test case that shows flaws in this logic, and we then could adjust accordingly. While this is quite likely beneficial in-the-wild already, it's a required part for the full motivational pattern behind `left-shift-until-bittest` loop idiom (D91038). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91800	2020-11-25 19:20:07 +03:00
Cullen Rhodes	1ba4b82f67	[LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits MaxSafeRegisterWidth is a misnomer since it actually returns the maximum safe vector width. Register suggests it relates directly to a physical register where it could be a vector spanning one or more physical registers. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91727	2020-11-25 13:06:26 +00:00
Florian Hahn	ad5b83ddcf	[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs. This is a follow-up to `00a6601136` to make isa<VPReductionRecipe> work and unifies the VPValue ID names, by making sure they all consistently start with VPV*.	2020-11-25 11:08:25 +00:00
David Green	e0c479cd0e	[VPlan] Switch VPWidenRecipe to be a VPValue Similar to other patches, this makes VPWidenRecipe a VPValue. Because of the way it interacts with the reduction code it also slightly alters the way that VPValues are registered, removing the up front NeedDef and using getOrAddVPValue to create them on-demand if needed instead. Differential Revision: https://reviews.llvm.org/D88447	2020-11-25 08:25:06 +00:00
David Green	00a6601136	[VPlan] Turn VPReductionRecipe into a VPValue This converts the VPReductionRecipe into a VPValue, like other VPRecipe's in preparation for traversing def-use chains. It also makes it a VPUser, now storing the used VPValues as operands. It doesn't yet change how the VPReductionRecipes are created. It will need to call replaceAllUsesWith from the original recipe they replace, but that is not done yet as VPWidenRecipe need to be created first. Differential Revision: https://reviews.llvm.org/D88382	2020-11-25 08:25:05 +00:00
Kazu Hirata	1c82d32089	[CHR] Use pred_size (NFC)	2020-11-24 22:52:30 -08:00
Max Kazantsev	28d7ba1543	[IndVars] Use more precise context when eliminating narrowing When deciding to widen narrow use, we may need to prove some facts about it. For proof, the context is used. Currently we take the instruction being widened as the context. However, we may be more precise here if we take as context the point that dominates all users of instruction being widened. Differential Revision: https://reviews.llvm.org/D90456 Reviewed By: skatkov	2020-11-25 11:47:39 +07:00
Philip Reames	10ddb927c1	[SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC] Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute. Using the isa<SCEVCouldNotCompute> form is both shorter and more readable.	2020-11-24 18:47:49 -08:00
Sanjay Patel	678b9c5dde	[InstCombine] try difference-of-shifts factorization before negator We need to preserve wrapping flags to allow better folds. The cases with geps may be non-intuitive, but that appears to agree with Alive2: https://alive2.llvm.org/ce/z/JQcqw7 We create 'nsw' ops independent from the original wrapping on the sub.	2020-11-24 13:56:30 -05:00
Philip Reames	075468621c	[LoopVec] Add a minor clarifying comment	2020-11-24 10:45:06 -08:00
Teresa Johnson	6e4c1cf293	[ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends Previously this option could be used to skip devirtualizations of the given functions in regular LTO and in the ThinLTO indexing step. This change allows them to be skipped in the backend as well, which is useful when debugging WPD in a distributed ThinLTO backend. Differential Revision: https://reviews.llvm.org/D91812	2020-11-24 09:35:07 -08:00
Ayal Zaks	32d9a386bf	[LV] Keep Primary Induction alive when folding tail by masking Fix PR47390. The primary induction should be considered alive when folding tail by masking, because it will be used by said masking; even when it may otherwise appear useless: feeding only its own 'bump', which is correctly considered dead, and as the 'bump' of another induction variable, which may wrongfully want to consider its bump = the primary induction, dead. Differential Revision: https://reviews.llvm.org/D92017	2020-11-24 15:12:54 +02:00
Arthur Eubanks	932e4f8815	[FunctionAttrs][NPM] Fix handling of convergent The legacy pass didn't properly detect indirect calls. We can still remove the convergent attribute when there are indirect calls. The LangRef says: > When it appears on a call/invoke, the convergent attribute indicates that we should treat the call as though we’re calling a convergent function. This is particularly useful on indirect calls; without this we may treat such calls as though the target is non-convergent. So don't skip handling of convergent when there are unknown calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89826	2020-11-23 21:09:41 -08:00
Philip Reames	1a9c72f8a8	[LoopVec] Reuse a lambda [NFC] Minor code refactor to improve readability.	2020-11-23 21:07:34 -08:00
Philip Reames	b06a2ad94f	[LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE) A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane. This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but it's real purpose is to pave the way for a following change which will generalize our uniformity logic. In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs. The discussion on that item remains unsettled and is pending larger architectural discussion. We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled. Differential Revision: https://reviews.llvm.org/D91398	2020-11-23 15:32:17 -08:00
Sanjay Patel	ab29f091eb	[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps This is a retry of `324a53205`. I cautiously reverted that at `6aa3fc4` because the rules about gep math were not clear. Since then, we have added this line to LangRef for gep inbounds: "The successive addition of offsets (without adding the base address) does not wrap the pointer index type in a signed sense (nsw)." See D90708 and post-commit comments on the revert patch for more details.	2020-11-23 16:50:09 -05:00
Arthur Eubanks	3c811ce4f3	[NPM] Share pass building options with legacy PM We should share options when possible. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91741	2020-11-23 13:04:05 -08:00
Sjoerd Meijer	33b2c88fa8	[LoopFlatten] Widen IV, support ZExt. I disabled the widening in `fa5cb4b` because it ran into an assert, which was related to replacing values with different types. I forgot that an extend could also be a zero-extend, which I have added now. This means that the approach now is to create and insert a trunc value of the outerloop for each user, and use that to replace IV values. Differential Revision: https://reviews.llvm.org/D91690	2020-11-23 08:57:19 +00:00
Kazu Hirata	df73b8c174	[ValueMapper] Remove unused declaration remapFunction (NFC) The function declaration with two parameters was introduced on Apr 16 2016 in commit `f0d73f95c1` without a corresponding definition.	2020-11-22 21:52:03 -08:00
Kazu Hirata	186d129320	[hwasan] Remove unused declaration shadowBase (NFC) The function was introduced on Jan 23, 2019 in commit `73078ecd38`. Its definition was removed on Oct 27, 2020 in commit `0930763b4b`, leaving the declaration unused.	2020-11-22 20:08:51 -08:00
Kazu Hirata	def7cfb7ff	[InstCombine] Use is_contained (NFC)	2020-11-21 15:47:11 -08:00
Alexey Bataev	0b420d674a	[SLP][NFC]Fix assert condition in newTreeEntry, NFC.	2020-11-20 13:25:21 -08:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues: 1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. 
The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR:
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
  br i1 %cmp, label %bb1, label %bb2
bb1:
  br label %bb3
bb2:
  br label %bb3
bb3:
  ret void
}
```
The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
  br i1 %cmp, label %bb1, label %bb2
bb1:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
  br label %bb3
bb2:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
  br label %bb3
bb3:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
  ret void
}
```
Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Jamie Schmeiser	7f6360cdc6	Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug. Summary: Expand existing loopsink testing to also test loopsinking using new pass manager. Enable memoryssa for loopsink with new pass manager. This combination exposed a bug that was previously fixed for loopsink without memoryssa. When sinking an instruction into a loop, the source block may not be part of the loop but still needs to be checked for pointer invalidation. This is the fix for bugzilla #39695 (PR 54659) expanded to also work with memoryssa. Respond to review comments. Enable Memory SSA in legacy Loop Sink pass under EnableMSSALoopDependency option control. Update tests accordingly. Respond to review comments. Add options controlling whether memoryssa is used for loop sink, defaulting to off. Expand testing based on these options. Respond to review comments. Properly indicated preserved analyses. This relanding addresses a compile-time performance problem by moving test for profile data earlier to avoid unnecessary computations. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D90249	2020-11-20 10:26:33 -05:00
Arthur Eubanks	b77436047a	[PGO] Make -disable-preinline work with NPM Fixes cspgo_profile_summary.ll under NPM. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D91826	2020-11-19 22:58:55 -08:00
Arthur Eubanks	513d165b80	Port -lower-matrix-intrinsics-minimal to NPM This reuses the existing lower-matrix-intrinsics pass rather than going the legacy pass route of creating a new pass. Use this new variant in the NPM -O0 pipeline. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91811	2020-11-19 17:42:48 -08:00
Florian Hahn	7fa14a7c69	[ConstraintElimination] Decompose GEP with arbitrary offsets. This patch decomposes `GEP %x, %offset` as 0 + 1 * %x + 1 * %off.	2020-11-19 22:49:21 +00:00
Geoffrey Martin-Noble	b156514f8d	Remove unused private fields Unused since https://reviews.llvm.org/D91762 and triggering -Wunused-private-field ``` llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field] Constant GetArgTLS; ^ llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field] Constant GetRetvalTLS; ``` Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D91820	2020-11-19 13:54:54 -08:00
Roman Lebedev	a91e96702a	[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)` One less instruction and reducing use count of zext. As alive2 confirms, we're fine with all the weird combinations of undef elts in constants, but unless the shift amount was undef for a lane, we must sanitize undef mask to zero, since sign bits are no longer zeros. https://rise4fun.com/Alive/d7r ``` ---------------------------------------- Optimization: zz Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2)) %o0 = zext %x %o1 = shl %o0, C1 %r = and %o1, C2 => %n0 = sext %x %r = and %n0, C2 Done: 2016 Optimization is correct! ```	2020-11-20 00:31:27 +03:00
Jianzhou Zhao	6c1c308c0e	Remove deadcode from DFSanFunction::getTLS() clean more deadcode after D84704 Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D91762	2020-11-19 21:10:37 +00:00
Nikita Popov	393b9e9db3	[MemLoc] Require LocationSize argument (NFC) When constructing a MemoryLocation by hand, require that a LocationSize is explicitly specified. D91649 will split up LocationSize::unknown() into two different states, and callers should make an explicit choice regarding the kind of MemoryLocation they want to have.	2020-11-19 21:45:52 +01:00
Sander de Smalen	41c9f4c1ce	[LoopVectorize] NFC: Fix unused variable warning for MaxSafeDepDist rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused variable for MaxSafeDepDist (written but not used). It seems this variable and assignment can be safely removed.	2020-11-19 17:41:35 +00:00
Joseph Huber	da8bec47ab	[OpenMP] Add Location Fields to Libomptarget Runtime for Debugging Summary: Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values. Reviewers: jdoerfert grokos Differential Revision: https://reviews.llvm.org/D87946	2020-11-19 12:01:53 -05:00
Simon Moll	a1de391dae	[LV][NFC-ish] Allow vector widths over 256 elements The assertion that vector widths are <= 256 elements was hard wired in the LV code. E.g., VE allows for vectors up to 512 elements. Test against the TTI vector register bit width instead - this is an NFC for non-asserting builds. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91518	2020-11-19 10:58:29 +01:00
Max Kazantsev	515105f46b	[NFC] Remove comment (committed ahead of time by mistake)	2020-11-19 16:28:34 +07:00
Max Kazantsev	7c601d09a7	[NFC] Move code earlier as preparation for further changes	2020-11-19 16:27:23 +07:00
Andrew Wei	ea7ab5a42c	[IndVarSimplify] Notify top most loop to drop cached exit counts Some nested loops may share the same ExitingBB, so after we finish FoldExit, we need to notify OuterLoop and SCEV to drop any stored trip count. Patched by: guopeilin Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D91325	2020-11-19 15:37:54 +08:00
Kazu Hirata	43c0e4f665	[Transforms] Use llvm::is_contained (NFC)	2020-11-18 20:42:22 -08:00
Jamie Schmeiser	cff479b145	Revert "Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.""" This reverts commit `e29292969b`. This apparently causes a regression in compile time (ie, it slows down).	2020-11-18 16:07:16 -05:00
Roman Lebedev	7bf89c2174	[NFC][Reassociate] Delay checking isLoadCombineCandidate() until after ShouldConvertOrWithNoCommonBitsToAdd() but before haveNoCommonBitsSet() This appears to improve -O3 compile-time performance somewhat: https://llvm-compile-time-tracker.com/compare.php?from=87369c626114ae17f4c637635c119e6de0856a9a&to=c04b8271e1609b0dfb20609b40844b0c4324517e&stat=instructions It doesn't look like delaying it until after haveNoCommonBitsSet() is better: https://llvm-compile-time-tracker.com/compare.php?from=c04b8271e1609b0dfb20609b40844b0c4324517e&to=b2943d450eaf41b5f76d2dc7350f0a279f64cd99&stat=instructions	2020-11-18 23:57:12 +03:00
Jamie Schmeiser	e29292969b	Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."" This reverts commit `562addba65`. Reverted change too quickly, the failing test cases passed on the next build. So reverting revert (to include the changes).	2020-11-18 15:33:02 -05:00

27111 Commits