llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Lebedev	fe7b3ad8d5	[CVP] LVI: Use in-block values when checking value signedness domain This has a huge positive impact on all the folds that use these helpers, as it can be seen on vanilla test-suite + rawspeed + darktable: correlated-value-propagation.NumSRems +75.68% (+ 28) correlated-value-propagation.NumAShrs +63.87% (+198) correlated-value-propagation.NumSDivs +49.42% (+127) correlated-value-propagation.NumSExt + 8.85% (+593) correlated-value-propagation.NumUDivURemsNarrowed + 8.65% (+34) ... while having pretty minimal compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=e8c7f43e2c2c6f3581ec1c6489ec21ad9f98958a&to=4cd197711e58ee1b2faeee0c35eea54540185569&stat=instructions	2021-04-10 21:10:59 +03:00
Roman Lebedev	257eda0794	[NFC][LVI] getPredicateAt(): drop default value for UseBlockValue The default is likely wrong. Out of all the callees, only a single one needs to pass-in false (JumpThread), everything else either already passes true, or should pass true. Until the default is flipped, at least make it harder to unintentionally add new callees with UseBlockValue=false.	2021-04-10 20:46:01 +03:00
Roman Lebedev	e8c7f43e2c	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check, without explicitly naming the algorithm, cleanup them all.	2021-04-10 19:38:55 +03:00
Roman Lebedev	7b12c8c59d	Revert "[NFC][ConstantRange] Add 'icmp' helper method" This reverts commit `17cf2c9423`.	2021-04-10 19:37:53 +03:00
Roman Lebedev	17cf2c9423	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check, without explicitly naming the algorithm, cleanup them all.	2021-04-10 19:09:52 +03:00
Roman Lebedev	c329a47d9e	[CVP] @llvm.abs() handling Iff we know the sigdness domain of the argument, we can either skip @llvm.abs, or do negation directly. Notably, INT_MIN can belong to either domain: * X u<= INT_MIN --> X is always fine https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS * X s<= 0 --> -X is always fine https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84 If all else fails, try to inferr NSW flag: https://alive2.llvm.org/ce/z/qCJfYm	2021-04-10 16:47:31 +03:00
Adrian Prantl	6ce76ff7eb	Update the linkage name of coro-split functions in the debug info. This patch updates the linkage name in the DISubprogram of coro-split functions, which is particularly important for Swift, where the funclets have a special name mangling. This patch does not affect C++ coroutines, since the DW_AT_specification is expected to hold the (original) linkage name. I believe this is mostly due to limitations in AsmPrinter, so we might be able to relax this restriction in the future. Differential Revision: https://reviews.llvm.org/D99693	2021-04-09 09:50:56 -07:00
Sanjay Patel	84cdccc9dc	[InstCombine] try to eliminate an instruction in min/max -> abs fold As suggested in the review thread for `5094e12` and seen in the motivating example from https://llvm.org/PR49885, it's not clear if we have a way to create the optimal code without this heuristic.	2021-04-09 10:34:03 -04:00
dfukalov	c1a88e007b	[AA][NFC] Convert AliasResult to class containing offset for PartialAlias case. Add an ability to store `Offset` between partially aliased location. Use this storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98718	2021-04-09 13:26:09 +03:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. Main reason is preparation to transform AliasResult to class that contains offset for PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Max Kazantsev	baf17e2cc9	[NFC] Move statictic increment out of helper	2021-04-09 16:32:35 +07:00
Max Kazantsev	275f3a2540	[GVN][NFC] Factor out load elimination logic via PRE for reuse	2021-04-09 16:12:25 +07:00
Arthur Eubanks	4c89bcadf6	[LICM] Hoist loads with invariant.group metadata Previously loading the vtable used in calling a virtual method in a loop was not hoisted out of the loop. This fixes that. canSinkOrHoistInst() itself doesn't check that the load operands are loop invariant, callers also check that separately. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99784	2021-04-08 21:57:37 -07:00
Serguei Katkov	d2e15a83a6	[RS4GC] Cleanup meetBDVState. NFC. meetBDVState looks pretty difficult to read and follow. This is purely NFC but doing several things: 1) Combine meet and meetBDVState 2) Move the function to be a member of BDVState 3) Make BDVState be a mutable object 4) Convert switch to sequence of ifs 5) Adds comments. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99064	2021-04-09 10:20:25 +07:00
Alexey Bataev	ab124bbe2a	[SLP]Fix PR49898: Infinite loop in SLP vectorizer. We should not re-try attempt of finding of the consecutive store chain if it was tried before. Differential Revision: https://reviews.llvm.org/D100131	2021-04-08 14:18:06 -07:00
Philip Reames	35393c865c	[funcattrs] Infer nosync from instruction walk Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync. A couple points of interest: * I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere. * The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately. * I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising. Differential Revision: https://reviews.llvm.org/D99769	2021-04-08 14:05:00 -07:00
Arthur Eubanks	c5d1ccbcdf	[GVN] Properly invalidate ICF cache when we simplify a value This fixes a "Cached first special instruction is wrong!" assert. The assert fires because replacing a value with another can cause an instruction to no longer be "special" to ICF. In this case, devirtualization happened, turning an indirect call to a call to a willreturn function which is no longer special. Reviewed By: nikic, rnk Differential Revision: https://reviews.llvm.org/D99977	2021-04-08 14:01:57 -07:00
Nikita Popov	59a2f67011	[LoopRotate] Don't split loop pass manager After D99249 we use three different loop pass managers for LICM, LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI and LazyBPI are not preserved by LoopRotate (note that D74640 is no longer needed). Avoid this by marking them as preserved. My understanding of D86156 is that it is okay to simply preserve them (which LoopUnswitch already does for the same reason) and rely on callbacks to deal with deleted blocks. Differential Revision: https://reviews.llvm.org/D99843	2021-04-08 22:05:18 +02:00
Congzhe Cao	ce2db9005d	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to the `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-08 14:58:13 -04:00
Sanjay Patel	5094e1279e	[InstCombine] fold min/max intrinsic with negated operand to abs The smax case shows up in https://llvm.org/PR49885 . The others seem unlikely, but we might as well try for uniformity (although that could mean an extra instruction to create "nabs"). smax -- https://alive2.llvm.org/ce/z/8yYaGy smin -- https://alive2.llvm.org/ce/z/0_7zc_ umax -- https://alive2.llvm.org/ce/z/EcsZWs umin -- https://alive2.llvm.org/ce/z/Xw6WvB	2021-04-08 14:37:39 -04:00
Florian Hahn	e4de3cdf3d	[LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC). Instead of passing the start value and the defined value to widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be used to get both (and more in future patches).	2021-04-08 14:25:10 +01:00
Stephen Tozer	140757bfaa	[DebugInfo] Prevent invalid debug info being produced during LoopStrengthReduce During LoopStrengthReduce, some of the SSA values that are used by debug values may be lost and/or salvaged. After LSR we attempt to recover any undef debug values, including any that were salvaged but then lost their values afterwards, by replacing the lost values with any live equal values (plus a possible constant offset) that have been gathered prior to running LSR. When we do this we restore the debug value's original DIExpression, to undo any salvaging (as we have gone back to using the original debug value). This process can currently produce invalid debug info if the number of operands has changed by salvaging during LSR. Replacing old values during the applyEqualValues step does not change the number of location operands, which means that when we restore the old DIExpression we may have a mismatch between the number of operands used by the debug value and the number of operands referenced by the DIExpression. This patch fixes this by restoring the full original location metadata at the start of the applyEqualValues step, so that there is no mismatch in operand count between the debug value and its DIExpression. Differential Revision: https://reviews.llvm.org/D98644	2021-04-08 13:04:48 +01:00
David Green	8675ef100f	[LV] Logical and/or select costs D99674 stopped the folding of certain select operations into and/or, due to incorrect folding in the presence of poison. D97360 added some costs to attempt to account for the change, but only worked at the getUserCost level, not the getCmpSelInstrCost that the vectorizer will use directly. This adds similar logic into the vectorizer to handle these logical and/or selects, treating them like and/or directly. This fixes 60% performance regressions from code like the attached test case. Differential Revision: https://reviews.llvm.org/D99884	2021-04-08 10:39:47 +01:00
Congzhe Cao	593cb46550	Revert "[LoopInterchange] Fix transformation bugs in loop interchange" This reverts commit 6ec68bd815d00c1eec2a6b9766452554f0e6cb61.	2021-04-07 21:17:30 -04:00
CongzheUalberta	f5645ea65f	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of (new) outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-07 20:55:44 -04:00
Sanjay Patel	c0bbd0cc35	[InstCombine] fold not ops around min/max intrinsics This is another step towards parity with the existing cmp+select folds (see D98152).	2021-04-07 17:31:36 -04:00
Craig Topper	5fc0e98d9a	[LoopIdiomRecognize] Minor cleanups to the FFS idiom matching. NFC -Make sure of the CreateShl/LShr/AShr methods that take a uint64_t instead of creating a ConstantInt for 1 ourselves. -Use Builder.getInt1 or ConstantInt::getBool instead of a conditional. -Pull out repeated calls to getType.	2021-04-07 10:03:14 -07:00
Roman Lebedev	24f67473dd	[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants All of the code that handles general constant here (other than the more restrictive APInt-dealing code) expects that it is an immediate, because otherwise we won't actually fold the constants, and increase instruction count. And it isn't obvious why we'd be okay with increasing the number of constant expressions, those still will have to be run.. But after `2829094a8e` this could also cause endless combine loops. So actually properly restrict this code to immediates.	2021-04-07 19:50:19 +03:00
Sanjay Patel	1894c6c59e	[InstCombine] avoid infinite loop from partial undef vectors This fixes the examples from D99674 and https://llvm.org/PR49878 The matchers succeed on partial undef/poison vector constants, but the transform creates a full 'not' (-1) constant, so it would undo a demanded vector elements change triggered by the extractelement. Differential Revision: https://reviews.llvm.org/D100044	2021-04-07 12:18:12 -04:00
wlei	6d5132b426	[CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner We see a regression related to low probe factor(0.01) which prevents some callsites being promoted in ICPPass and later cause the missing inline in CGSCC inliner. The root cause is due to redundant(the second) multiplication of the probe factor and this change try to fix it. `Sum` does multiply a factor right after findCallSamples but later when using as the parameter in setProbeDistributionFactor, it multiplies one again. This change could get ~2% perf back on mcf benchmark. In mcf, previously the corresponding factor is 1 and it's the recent feature introducing the <1 factor then trigger this bug. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D99787	2021-04-07 08:48:59 -07:00
Alexey Bataev	a78e86e6be	[SLP]Avoid multiple attempts to vectorize CmpInsts. No need to lookup through and/or try to vectorize operands of the CmpInst instructions during attempts to find/vectorize min/max reductions. Compiler implements postanalysis of the CmpInsts so we can skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save compile time. Differential Revision: https://reviews.llvm.org/D99950	2021-04-07 06:15:42 -07:00
Sanjay Patel	0333ed8e0c	[InstCombine] move abs transform to helper function; NFC The swap of the operands can affect later transforms that are expecting a constant as operand 1. I don't think we can trigger a bug with the current code, but I hit that problem while drafting a new transform for min/max intrinsics.	2021-04-07 08:35:07 -04:00
Roman Lebedev	2829094a8e	Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) This reverts commit `a547b4e26b`, relanding commit `31d219d299`, which was reverted because there was a conflicting inverse transform, which was causing an endless combine loop, which has now been adjusted. Original commit message: https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-07 12:06:25 +03:00
Roman Lebedev	93d1d94b74	[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants I.e., if any/all of the consants is an expression, don't do it. Since those constants won't reduce into an immediate, but would be left as an constant expression, they could cause endless combine loops after `31d219d299` added an inverse transformation.	2021-04-07 12:06:24 +03:00
Petr Hosek	a547b4e26b	Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)" This reverts commit `31d219d299` which causes an infinite loop when compiling the XRay runtime.	2021-04-06 22:30:28 -07:00
Sidharth Baveja	d81d9e8b86	[SplitEdge] Update SplitCriticalEdge to return a nullptr only when the edge is not critical Summary: The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in cases where the edge is a critical. SplitEdge uses SplitCriticalEdge assuming it can always split all critical edges, which is an incorrect assumption. The three cases where the function SplitCriticalEdge will return a nullptr is: 1. DestBB is an exception block 2. Options.IgnoreUnreachableDests is set to true and isa(DestBB->getFirstNonPHIOrDbgOrLifetime()) is not equal to a nullptr 3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true) and it cannot be maintained for a loop due to indirect branches For each of these situations they are handled in the following way: 1. Modified the function ehAwareSplitEdge originally from llvm/lib/Transforms/Coroutines/CoroFrame.cpp to handle the cases when the DestBB is an exception block. This function is called directly in SplitEdge. SplitEdge does not call SplitCriticalEdge in this case 2. Options.IgnoreUnreachableDests is set to false by default, so this situation does not apply. 3. Return a nullptr in this situation since the SplitCriticalEdge also returned nullptr. Nothing we can do in this case. Reviewed By: asbirlea Differential Revision:https://reviews.llvm.org/D94619	2021-04-06 21:24:40 +00:00
Philip Reames	4bf8985f4f	Replace calls to IntrinsicInst::Create with CallInst::Create [nfc] There is no IntrinsicInst::Create. These are binding to the method in the super type. Be explicitly about which method is being called.	2021-04-06 13:23:58 -07:00
Philip Reames	908215b346	Use AssumeInst in a few more places [nfc] Follow up to `a6d2a8d6f5`. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.	2021-04-06 13:18:53 -07:00
Philip Reames	9ef6aa020b	Plumb AssumeInst through operand bundle apis [nfc] Follow up to `a6d2a8d6f5`. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.	2021-04-06 12:53:53 -07:00
Luís Marques	0c3bc1f3a4	[ASan][RISCV] Fix RISC-V memory mapping Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and D87581). This should be an improvement both in terms of first principles soundness and observed test failures --- test failures would occur non-deterministically depending on the ASLR random offset. On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as `PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor makes the result be the not very round number 0x1555556000. That address had to be further rounded to ensure page alignment after the shadow scale shifting is applied. Still, that value explains why the mapping table may look less regular than expected. Further cleanups: - Moved the mapping table comment, to ensure that the two Linux/AArch64 tables stayed together; - Removed mention of Sv48. Neither the original mapping nor this one are compatible with an actual Linux Sv48 address space (mainline Linux still operates Sv48 in Sv39 mode). A future patch can improve this; - Removed the additional comments, for consistency. Differential Revision: https://reviews.llvm.org/D97646	2021-04-06 20:46:17 +01:00
Philip Reames	a6d2a8d6f5	Add a subclass of IntrinsicInst for llvm.assume [nfc] Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.	2021-04-06 11:16:22 -07:00
Arthur Eubanks	4e83e59eb8	[GVN] Add missing ICF update performScalarPREInsertion() inserts instructions into blocks that we need to tell ImplicitControlFlowTracking about, otherwise the ICF cache may be invalid. Fixes PR49193. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99909	2021-04-06 10:13:42 -07:00
Philip Reames	21d4839948	Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc] These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.	2021-04-06 08:33:15 -07:00
Philip Reames	52ecd94cfb	Remove last remnants of PR49607 migration [NFC] The key change (`4f5e92c`) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.	2021-04-06 07:56:55 -07:00
Jan Svoboda	fb6a5237aa	Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction" This reverts commit `167ea67d` This causes a bunch of build failures: * http://lab.llvm.org:8011/#/builders/121/builds/6287 * http://green.lab.llvm.org/green/job/clang-stage1-RA/19915	2021-04-06 16:33:28 +02:00
Benjamin Kramer	ce4acb01b3	Avoid unused variable warning in Release builds	2021-04-06 16:25:19 +02:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Roman Lebedev	31d219d299	[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-06 15:58:14 +03:00
Simon Pilgrim	b8aba76a4e	LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI. Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.	2021-04-06 12:24:20 +01:00
Abhina Sreeskantharajan	82b3e28e83	[SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text Problem: On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable. Solution: This patch adds two new flags - OF_CRLF which indicates that CRLF translation is used. - OF_TextWithCRLF = OF_Text \| OF_CRLF indicates that the file is text and uses CRLF translation. Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF. So this is the behaviour per platform with my patch: z/OS: OF_None: open in binary mode OF_Text : open in text mode OF_TextWithCRLF: open in text mode Windows: OF_None: open file with no carriage return OF_Text: open file with no carriage return OF_TextWithCRLF: open file with carriage return The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set. ``` if (Flags & OF_CRLF) CrtOpenFlags \|= _O_TEXT; ``` These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows. ./llvm/lib/Support/raw_ostream.cpp ./llvm/lib/TableGen/Main.cpp ./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp ./llvm/unittests/Support/Path.cpp ./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp ./clang/lib/Frontend/CompilerInstance.cpp ./clang/lib/Driver/Driver.cpp ./clang/lib/Driver/ToolChains/Clang.cpp Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99426	2021-04-06 07:23:31 -04:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Florian Hahn	a6b06b785c	[VPlan] Print VPValue operands for VPWidenPHI if possible. For VPWidenPHIRecipes that model all incoming values as VPValue operands, print those operands instead of printing the original PHI. D99294 updates recipes of reduction PHIs to use the VPValue for the incoming value from the loop backedge, making use of this new printing.	2021-04-06 12:11:21 +01:00
madhur13490	167ea67d76	[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction This patch enhances hasAddressTaken() to ignore bitcasts as a callee in callbase instruction. Such bitcast usage doesn't really take the address in a useful meaningful way. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D98884	2021-04-06 09:23:46 +00:00
Arthur Eubanks	ea0e2ca1ac	[SROA] Allow SROA on pointers with invariant group intrinsic uses When we are able to SROA an alloca, we know all uses of it, meaning we don't have to preserve the invariant group intrinsics and metadata. It's possible that we could lose information regarding redundant loads/stores, but that's unlikely to have any real impact since right now the only user is Clang and vtables. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99760	2021-04-05 19:53:40 -07:00
Ta-Wei Tu	6a82ace5f2	[LoopFusion] Bails out if only the second candidate is guarded (PR48060) If only the second candidate loop is guarded while the first one is not, fusioning two loops might not be valid but this check is currently missing. Fixes https://bugs.llvm.org/show_bug.cgi?id=48060 Reviewed By: sidbav Differential Revision: https://reviews.llvm.org/D99716	2021-04-06 01:08:56 +08:00
Sanjay Patel	c590a9880d	[InstCombine] fix potential miscompile in select value equivalence As shown in the example based on: https://llvm.org/PR49832 ...and the existing test, we can't substitute a vector value because the equality compare replacement that we are attempting requires that the comparison is true for the entire value. Vector select can be partly true/false.	2021-04-05 12:25:40 -04:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization better to postpone the vectorization of the CmpInst instructions till the end of the basic block. Otherwise we may vectorize it too early and may miss some vectorization patterns, like reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Roman Lebedev	2760a808b9	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778) This is identical to `781d077afb`, but for the other function. For certain shift amount bit widths, we must first ensure that adding shift amounts is safe, that the sum won't have an unsigned overflow. Fixes https://bugs.llvm.org/show_bug.cgi?id=49778	2021-04-04 23:26:41 +03:00
Roman Lebedev	dceb3e5996	[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper	2021-04-04 23:26:41 +03:00
Sanjay Patel	c0645f1324	[InstCombine] fold popcount of exactly one bit to shift This is discussed in https://llvm.org/PR48999 , but it does not solve that request. The difference in the vector test shows that some other logic transform is limited to scalar types.	2021-04-04 11:43:49 -04:00
Nikita Popov	9bad7de9a3	[SimplifyCFG] Handle two equal cases in switch to select When converting a switch with two cases and a default into a select, also handle the denegerate case where two cases have the same value. Generate this case directly as %or = or i1 %cmp1, %cmp2 %res = select i1 %or, i32 %val, i32 %default rather than %sel1 = select i1 %cmp1, i32 %val, i32 %default %res = select i1 %cmp2, i32 %val, i32 %sel1 as InstCombine is going to canonicalize to the former anyway.	2021-04-04 17:27:28 +02:00
Juneyoung Lee	5207cde5cb	[InstCombine] Conditionally fold select i1 into and/or This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or: ``` select cond, cond2, false -> and cond, cond2 ``` This is not safe if cond2 is poison whereas cond isn’t. Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s. To minimize its impact, this patch conservatively checks whether cond2 is an instruction that creates a poison or its operand creates a poison. This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99674	2021-04-04 14:11:28 +09:00
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Sanjay Patel	412fc74140	[InstCombine] fold not+or+neg ~((-X) \| Y) --> (X - 1) & (~Y) We generally prefer 'add' over 'sub', this reduces the dependency chain, and this looks better for codegen on x86, ARM, and AArch64 targets. https://llvm.org/PR45755 https://alive2.llvm.org/ce/z/cxZDSp	2021-04-02 13:16:36 -04:00
Dimitry Andric	6abb92f210	[SCCP] Avoid modifying AdditionalUsers while iterating over it When run under valgrind, or with a malloc that poisons freed memory, this can lead to segfaults or other problems. To avoid modifying the AdditionalUsers DenseMap while still iterating, save the instructions to be notified in a separate SmallPtrSet, and use this to later call OperandChangedState on each instruction. Fixes PR49582. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98602	2021-04-02 19:05:59 +02:00
Florian Hahn	8867fc69f0	[LV] Hoist mapping of IR operands to VPValues (NFC). This patch moves mapping of IR operands to VPValues out of tryToCreateWidenRecipe. This allows using existing VPValue operands when widening recipes directly, which will be introduced in future patches.	2021-04-02 17:57:20 +01:00
Philip Reames	2c4548e18e	[rs4gc] Use loops instead of straightline code for attribute stripping [nfc] Mostly because I'm about to add more attributes and the straightline copies get much uglier. What's currently there isn't too bad.	2021-04-02 09:25:15 -07:00
Philip Reames	a505801e2b	[rs4gc] Strip nofree and nosync attributes when lowering from abstract model The safepoints being inserted exists to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering. I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.	2021-04-02 09:12:24 -07:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is min/max reduction and based on SelectInstruction, the condition of this select instruction must have only single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Jeroen Dobbelaere	b82b305cf9	[InstCombine] Fix out-of-bounds ashr(shl) optimization This fixes a crash found by the oss fuzzer and reported by @fhahn. The suggestion of @RKSimon seems to be the correct fix here. (See D91343). The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D99792	2021-04-02 13:45:11 +02:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extracelement instructions in cases were they are going to be free, because the source vector can be used directly. A simple example is %v1.lane.0 = extractelement <2 x double> %v.1, i32 0 %v1.lane.1 = extractelement <2 x double> %v.1, i32 1 %a.lane.0 = fmul double %v1.lane.0, %x %a.lane.1 = fmul double %v1.lane.1, %y Currently we only consider the extracts free, if there are no other users. In this particular case, on AArch64 which can fit <2 x double> in a vector register, the extracts should be free, independently of other users, because the source vector of the extracts will be in a vector register directly, so it should be free to use the vector directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Evgeniy Brevnov	2388aae401	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev, lebedev.ri Differential Revision: https://reviews.llvm.org/D88287	2021-04-02 15:30:13 +07:00
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation. We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular: \| statistic name \| LoopRotate-LICM \| LICM-LoopRotate \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015930 \| 9015799 \| -131 \| 0.00% \| 0.00% \| \| indvars.NumElimCmp \| 3536 \| 3544 \| 8 \| 0.23% \| 0.23% \| \| indvars.NumElimExt \| 36725 \| 36580 \| -145 \| -0.39% \| 0.39% \| \| indvars.NumElimIV \| 1197 \| 1187 \| -10 \| -0.84% \| 0.84% \| \| indvars.NumElimIdentity \| 143 \| 136 \| -7 \| -4.90% \| 4.90% \| \| indvars.NumElimRem \| 4 \| 5 \| 1 \| 25.00% \| 25.00% \| \| indvars.NumLFTR \| 29842 \| 29890 \| 48 \| 0.16% \| 0.16% \| \| indvars.NumReplaced \| 2293 \| 2227 \| -66 \| -2.88% \| 2.88% \| \| indvars.NumSimplifiedSDiv \| 6 \| 8 \| 2 \| 33.33% \| 33.33% \| \| indvars.NumWidened \| 26438 \| 26329 \| -109 \| -0.41% \| 0.41% \| \| instcount.TotalBlocks \| 1178338 \| 1173840 \| -4498 \| -0.38% \| 0.38% \| \| instcount.TotalFuncs \| 111825 \| 111829 \| 4 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9905442 \| 9896139 \| -9303 \| -0.09% \| 0.09% \| \| lcssa.NumLCSSA \| 425871 \| 423961 \| -1910 \| -0.45% \| 0.45% \| \| licm.NumHoisted \| 378357 \| 378753 \| 396 \| 0.10% \| 0.10% \| \| licm.NumMovedCalls \| 2193 \| 2208 \| 15 \| 0.68% \| 0.68% \| \| licm.NumMovedLoads \| 35899 \| 31821 \| -4078 \| -11.36% \| 11.36% \| \| licm.NumPromoted \| 11178 \| 11154 \| -24 \| -0.21% \| 0.21% \| \| licm.NumSunk \| 13359 \| 13587 \| 228 \| 1.71% \| 1.71% \| \| loop-delete.NumDeleted \| 8547 \| 8402 \| -145 \| -1.70% \| 1.70% \| \| loop-instsimplify.NumSimplified \| 12876 \| 11890 \| -986 \| -7.66% \| 7.66% \| \| loop-peel.NumPeeled \| 1008 \| 925 \| -83 \| -8.23% \| 8.23% \| \| loop-rotate.NumNotRotatedDueToHeaderSize \| 368 \| 365 \| -3 \| -0.82% \| 0.82% \| \| loop-rotate.NumRotated \| 42015 \| 42003 \| -12 \| -0.03% \| 0.03% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 240 \| 242 \| 2 \| 0.83% \| 0.83% \| \| loop-simplifycfg.NumLoopExitsDeleted \| 497 \| 20 \| -477 \| -95.98% \| 95.98% \| \| loop-simplifycfg.NumTerminatorsFolded \| 618 \| 336 \| -282 \| -45.63% \| 45.63% \| \| loop-unroll.NumCompletelyUnrolled \| 11028 \| 11032 \| 4 \| 0.04% \| 0.04% \| \| loop-unroll.NumUnrolled \| 12608 \| 12529 \| -79 \| -0.63% \| 0.63% \| \| mem2reg.NumDeadAlloca \| 10222 \| 10221 \| -1 \| -0.01% \| 0.01% \| \| mem2reg.NumPHIInsert \| 192110 \| 192106 \| -4 \| 0.00% \| 0.00% \| \| mem2reg.NumSingleStore \| 637650 \| 637643 \| -7 \| 0.00% \| 0.00% \| \| scalar-evolution.NumBruteForceTripCountsComputed \| 814 \| 812 \| -2 \| -0.25% \| 0.25% \| \| scalar-evolution.NumTripCountsComputed \| 283108 \| 282934 \| -174 \| -0.06% \| 0.06% \| \| scalar-evolution.NumTripCountsNotComputed \| 106712 \| 106718 \| 6 \| 0.01% \| 0.01% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 4752 \| -426 \| -8.23% \| 8.23% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 503 \| -411 \| -44.97% \| 44.97% \| \| simple-loop-unswitch.NumSwitches \| 20 \| 18 \| -2 \| -10.00% \| 10.00% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 95 \| -88 \| -48.09% \| 48.09% \| ... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate? \| statistic name \| LoopRotate-LICM \| LICM-LoopRotate-LICM \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015930 \| 9014474 \| -1456 \| -0.02% \| 0.02% \| \| indvars.NumElimCmp \| 3536 \| 3546 \| 10 \| 0.28% \| 0.28% \| \| indvars.NumElimExt \| 36725 \| 36681 \| -44 \| -0.12% \| 0.12% \| \| indvars.NumElimIV \| 1197 \| 1185 \| -12 \| -1.00% \| 1.00% \| \| indvars.NumElimIdentity \| 143 \| 146 \| 3 \| 2.10% \| 2.10% \| \| indvars.NumElimRem \| 4 \| 5 \| 1 \| 25.00% \| 25.00% \| \| indvars.NumLFTR \| 29842 \| 29899 \| 57 \| 0.19% \| 0.19% \| \| indvars.NumReplaced \| 2293 \| 2299 \| 6 \| 0.26% \| 0.26% \| \| indvars.NumSimplifiedSDiv \| 6 \| 8 \| 2 \| 33.33% \| 33.33% \| \| indvars.NumWidened \| 26438 \| 26404 \| -34 \| -0.13% \| 0.13% \| \| instcount.TotalBlocks \| 1178338 \| 1173652 \| -4686 \| -0.40% \| 0.40% \| \| instcount.TotalFuncs \| 111825 \| 111829 \| 4 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9905442 \| 9895452 \| -9990 \| -0.10% \| 0.10% \| \| lcssa.NumLCSSA \| 425871 \| 425373 \| -498 \| -0.12% \| 0.12% \| \| licm.NumHoisted \| 378357 \| 383352 \| 4995 \| 1.32% \| 1.32% \| \| licm.NumMovedCalls \| 2193 \| 2204 \| 11 \| 0.50% \| 0.50% \| \| licm.NumMovedLoads \| 35899 \| 35755 \| -144 \| -0.40% \| 0.40% \| \| licm.NumPromoted \| 11178 \| 11163 \| -15 \| -0.13% \| 0.13% \| \| licm.NumSunk \| 13359 \| 14321 \| 962 \| 7.20% \| 7.20% \| \| loop-delete.NumDeleted \| 8547 \| 8538 \| -9 \| -0.11% \| 0.11% \| \| loop-instsimplify.NumSimplified \| 12876 \| 12041 \| -835 \| -6.48% \| 6.48% \| \| loop-peel.NumPeeled \| 1008 \| 924 \| -84 \| -8.33% \| 8.33% \| \| loop-rotate.NumNotRotatedDueToHeaderSize \| 368 \| 365 \| -3 \| -0.82% \| 0.82% \| \| loop-rotate.NumRotated \| 42015 \| 42005 \| -10 \| -0.02% \| 0.02% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 240 \| 241 \| 1 \| 0.42% \| 0.42% \| \| loop-simplifycfg.NumTerminatorsFolded \| 618 \| 619 \| 1 \| 0.16% \| 0.16% \| \| loop-unroll.NumCompletelyUnrolled \| 11028 \| 11029 \| 1 \| 0.01% \| 0.01% \| \| loop-unroll.NumUnrolled \| 12608 \| 12525 \| -83 \| -0.66% \| 0.66% \| \| mem2reg.NumPHIInsert \| 192110 \| 192073 \| -37 \| -0.02% \| 0.02% \| \| mem2reg.NumSingleStore \| 637650 \| 637652 \| 2 \| 0.00% \| 0.00% \| \| scalar-evolution.NumTripCountsComputed \| 283108 \| 282998 \| -110 \| -0.04% \| 0.04% \| \| scalar-evolution.NumTripCountsNotComputed \| 106712 \| 106691 \| -21 \| -0.02% \| 0.02% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 5185 \| 7 \| 0.14% \| 0.14% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 925 \| 11 \| 1.20% \| 1.20% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 179 \| -4 \| -2.19% \| 2.19% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 4752 \| -426 \| -8.23% \| 8.23% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 503 \| -411 \| -44.97% \| 44.97% \| \| simple-loop-unswitch.NumSwitches \| 20 \| 18 \| -2 \| -10.00% \| 10.00% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 95 \| -88 \| -48.09% \| 48.09% \| I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely: \| statistic name \| LICM-LoopRotate \| LICM-LoopRotate-LICM \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015799 \| 9014474 \| -1325 \| -0.01% \| 0.01% \| \| indvars.NumElimCmp \| 3544 \| 3546 \| 2 \| 0.06% \| 0.06% \| \| indvars.NumElimExt \| 36580 \| 36681 \| 101 \| 0.28% \| 0.28% \| \| indvars.NumElimIV \| 1187 \| 1185 \| -2 \| -0.17% \| 0.17% \| \| indvars.NumElimIdentity \| 136 \| 146 \| 10 \| 7.35% \| 7.35% \| \| indvars.NumLFTR \| 29890 \| 29899 \| 9 \| 0.03% \| 0.03% \| \| indvars.NumReplaced \| 2227 \| 2299 \| 72 \| 3.23% \| 3.23% \| \| indvars.NumWidened \| 26329 \| 26404 \| 75 \| 0.28% \| 0.28% \| \| instcount.TotalBlocks \| 1173840 \| 1173652 \| -188 \| -0.02% \| 0.02% \| \| instcount.TotalInsts \| 9896139 \| 9895452 \| -687 \| -0.01% \| 0.01% \| \| lcssa.NumLCSSA \| 423961 \| 425373 \| 1412 \| 0.33% \| 0.33% \| \| licm.NumHoisted \| 378753 \| 383352 \| 4599 \| 1.21% \| 1.21% \| \| licm.NumMovedCalls \| 2208 \| 2204 \| -4 \| -0.18% \| 0.18% \| \| licm.NumMovedLoads \| 31821 \| 35755 \| 3934 \| 12.36% \| 12.36% \| \| licm.NumPromoted \| 11154 \| 11163 \| 9 \| 0.08% \| 0.08% \| \| licm.NumSunk \| 13587 \| 14321 \| 734 \| 5.40% \| 5.40% \| \| loop-delete.NumDeleted \| 8402 \| 8538 \| 136 \| 1.62% \| 1.62% \| \| loop-instsimplify.NumSimplified \| 11890 \| 12041 \| 151 \| 1.27% \| 1.27% \| \| loop-peel.NumPeeled \| 925 \| 924 \| -1 \| -0.11% \| 0.11% \| \| loop-rotate.NumRotated \| 42003 \| 42005 \| 2 \| 0.00% \| 0.00% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 242 \| 241 \| -1 \| -0.41% \| 0.41% \| \| loop-simplifycfg.NumLoopExitsDeleted \| 20 \| 497 \| 477 \| 2385.00% \| 2385.00% \| \| loop-simplifycfg.NumTerminatorsFolded \| 336 \| 619 \| 283 \| 84.23% \| 84.23% \| \| loop-unroll.NumCompletelyUnrolled \| 11032 \| 11029 \| -3 \| -0.03% \| 0.03% \| \| loop-unroll.NumUnrolled \| 12529 \| 12525 \| -4 \| -0.03% \| 0.03% \| \| mem2reg.NumDeadAlloca \| 10221 \| 10222 \| 1 \| 0.01% \| 0.01% \| \| mem2reg.NumPHIInsert \| 192106 \| 192073 \| -33 \| -0.02% \| 0.02% \| \| mem2reg.NumSingleStore \| 637643 \| 637652 \| 9 \| 0.00% \| 0.00% \| \| scalar-evolution.NumBruteForceTripCountsComputed \| 812 \| 814 \| 2 \| 0.25% \| 0.25% \| \| scalar-evolution.NumTripCountsComputed \| 282934 \| 282998 \| 64 \| 0.02% \| 0.02% \| \| scalar-evolution.NumTripCountsNotComputed \| 106718 \| 106691 \| -27 \| -0.03% \| 0.03% \| \| simple-loop-unswitch.NumBranches \| 4752 \| 5185 \| 433 \| 9.11% \| 9.11% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 503 \| 925 \| 422 \| 83.90% \| 83.90% \| \| simple-loop-unswitch.NumSwitches \| 18 \| 20 \| 2 \| 11.11% \| 11.11% \| \| simple-loop-unswitch.NumTrivial \| 95 \| 179 \| 84 \| 88.42% \| 88.42% \| {F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable) As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded. Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Juneyoung Lee	c664769330	[AssumeBundles] offset should be added to correctly calculate align This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492). Consider this code: ``` call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)] %arrayidx = getelementptr inbounds i32, i32* %a, i64 -1 ; aligment of %arrayidx? ``` The llvm.assume guarantees that `%a - 28` is 32-bytes aligned, meaning that `%a` is 32k + 28 for some k. Therefore `a - 4` cannot be 32-bytes aligned but the existing code was calculating the pointer as 32-bytes aligned. The reason why this happened is as follows. `DiffSCEV` stores `%arrayidx - %a` which is -4. `OffSCEV` stores the offset value of “align”, which is 28. `DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = 32 was being used instead. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D98759	2021-04-02 12:32:05 +09:00
Philip Reames	91790c6785	[indvars[ Fix pr49802 by checking for SCEVCouldNotCompute The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits non trivially dead have exit counts. We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler. (Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...) p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly inprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.	2021-04-01 17:53:44 -07:00
Philip Reames	b23a314146	[funcattrs] Respect nofree attribute on callsites (not just callee)	2021-04-01 14:45:49 -07:00
Philip Reames	1e69a5af92	[Attributor] Cleanup detection of non-relaxed atomics in nosync inference The code was checking for cases which are disallowed by the verifier. Delete dead code and adjust style.	2021-04-01 12:01:29 -07:00
Philip Reames	8e596f7e27	[Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC] Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic. By using the matcher class, we now do.	2021-04-01 11:49:59 -07:00
Philip Reames	6ef4505298	[funcattrs] Infer nosync from readnone and non-convergent This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (`0626367202`). This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive. Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D99749	2021-04-01 11:37:34 -07:00
Philip Reames	ffa15e9463	Extract isVolatile helper on Instruction [NFCI] We have this logic duplicated in several cases, none of which were exhaustive. Consolidate it in one place. I don't believe this actually impacts behavior of the callers. I think they all filter their inputs such that their partial implementations were correct. If not, this might be fixing a cornercase bug.	2021-04-01 11:24:02 -07:00
Philip Reames	6b05d753e0	Mark unordered memset/memmove/memcpy as nosync Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.	2021-04-01 10:38:54 -07:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to cleanup InstrElementSize map for each new tree, otherwise might use sizes from the previous run of the vectorization attempt. 2. No need to include into analysis the instructions from the different basic blocks to save compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.`	2021-04-01 05:33:01 -07:00
Yevgeny Rouban	1ed53d44d8	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Max Kazantsev	a1d83776bf	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	630818a850	[NFC] Disambiguate LI in GVN Name GVN uses name 'LI' for two different unrelated things: LoadInst and LoopInfo. This patch relates the variables with former meaning into 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro	5fac7c6046	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
qixingxue	62b74f7564	[GVN][NFC] Refactor analyzeLoadFromClobberingWrite This commit adjusts the order of two swappable if statements to make code cleaner. Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D99648	2021-04-01 08:35:35 +08:00
Roman Lebedev	43ded90094	[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader	2021-03-31 23:27:36 +03:00
Huihui Zhang	fe5c4a06a4	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turning on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test consecutive-ptr-uniforms.ll . Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
Sanjay Patel	1462bdf1b9	[InstCombine] fold abs(srem X, 2) This is a missing optimization based on an example in: https://llvm.org/PR49763 As noted there and the test here, we could add a more general fold if that is shown useful. https://alive2.llvm.org/ce/z/xEHdTv https://alive2.llvm.org/ce/z/97dcY5	2021-03-31 11:29:20 -04:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Chuanqi Xu	eb51dd719f	[Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2 Summary: Try to insert dbg.declare to entry.resume basic block in resume function. In this way, we could print alloca such as __promise in gdb/lldb under O2, which would be beneficial to debug coroutine program. Test Plan: check-llvm Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D96938	2021-03-31 10:37:06 +08:00
Fangrui Song	3e5ee194c0	[SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `431a40e1e2`	2021-03-30 19:27:10 -07:00
Juneyoung Lee	431a40e1e2	[LoopUnswitch] Assert that branch condition is either and/or but not both as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321	2021-03-31 10:35:22 +09:00
Sanjay Patel	c2ebad8d55	[InstCombine] add fold for demand of low bit of abs() This is one problem shown in https://llvm.org/PR49763 https://alive2.llvm.org/ce/z/cV6-4K https://alive2.llvm.org/ce/z/9_3g-L	2021-03-30 15:14:37 -04:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turning on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
spupyrev	22998738e8	[SamplePGO] Keeping prof metadata for IndirectBrInst Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot. This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99550	2021-03-30 10:44:48 -07:00
Hongtao Yu	3e3fc431df	[CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example: 1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them. 2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining. 3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph. 4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. while this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions. Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case 4. I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmark such as Xalanbmk and GCC, the win is bigger, above 2%. The change is an enhancement to https://reviews.llvm.org/D95988. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99351	2021-03-30 10:42:22 -07:00
Krasimir Georgiev	c51e91e046	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5178ffc7cf`. Compiling `llvm-profdata` with a compiler build from this produces a crashing binary.	2021-03-30 14:13:37 +02:00
Juneyoung Lee	6b4b1dc6ec	[LoopUnswitch] Simplify branch condition if it is select with constant operands This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 . `select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later transformations confused. Simplify the branch condition to not have the form.	2021-03-30 20:09:42 +09:00
Sander de Smalen	f71ed5dfe2	NFC: Migrate PartialInlining to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D97382	2021-03-30 11:59:45 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Huihui Zhang	ca721042f1	[IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism. Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turning on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test profile-context-tracker-debug.ll . Reviewed By: MaskRay, wenlei Differential Revision: https://reviews.llvm.org/D99547	2021-03-29 16:37:10 -07:00
Gulfem Savrun Yeniceri	5178ffc7cf	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-29 21:53:32 +00:00
Wenlei He	30b0232336	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inined, we can't merge functions profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow up patches for a complete implementation: 1) We need to retrieve size for funciton//inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as place holder for size estimation. 2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR. Differential Revision: https://reviews.llvm.org/D99146	2021-03-29 09:46:14 -07:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Matt Arsenault	9a0c9402fa	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `07e46367ba`.	2021-03-29 08:55:30 -04:00
Jingu Kang	e4abb64100	[LoopUnswitch] Use reference variables instead of pointer one Differential Revision: https://reviews.llvm.org/D99496	2021-03-29 13:08:46 +01:00
Hans Wennborg	c6e5c4654b	Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places Using $ breaks demangling of the symbols. For example, $ c++filt _Z3foov\$123 _Z3foov$123 This causes problems for developers who would like to see nice stack traces etc., but also for automatic crash tracking systems which try to organize crashes based on the stack traces. Instead, use the period as suffix separator, since Itanium demanglers normally ignore such suffixes: $ c++filt _Z3foov.123 foo() [clone .123] This is already done in some places; try to do it everywhere. Differential revision: https://reviews.llvm.org/D97484	2021-03-29 13:03:52 +02:00
Oliver Stannard	07e46367ba	Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute"" Reverting because test 'Bindings/Go/go.test' is failing on most buildbots. This reverts commit `fc9df30991`.	2021-03-29 11:32:22 +01:00
Jingu Kang	cfe87d4edd	[NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils Differential revision: https://reviews.llvm.org/D99490	2021-03-29 10:29:45 +01:00
Matt Arsenault	fc9df30991	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `20d5c42e0e`.	2021-03-28 13:35:21 -04:00
Sanjay Patel	01ae6e5ead	[InstCombine] sink min/max intrinsics with common op after select This is another step towards parity with cmp+select min/max idioms. See D98152.	2021-03-28 13:13:04 -04:00
Nico Weber	20d5c42e0e	Revert "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `4fefed6563`. Broke check-clang everywhere.	2021-03-28 13:02:52 -04:00
Matt Arsenault	4fefed6563	OpaquePtr: Turn inalloca into a type attribute I think byval/sret and the others are close to being able to rip out the code to support the missing type case. A lot of this code is shared with inalloca, so catch this up to the others so that can happen.	2021-03-28 11:12:23 -04:00
Florian Hahn	8c6c357897	[LV] Mark a few more cost-model members as const (NFC).	2021-03-28 14:59:48 +01:00
Florian Hahn	d2855eba81	[LV] Fix formatting from `2f9d68c3f1`.	2021-03-27 21:29:56 +00:00
Florian Hahn	2f9d68c3f1	[LV] Mark some methods as const (NFC). Mark a few methods as const, as they do not modify any state.	2021-03-27 21:27:53 +00:00
Juneyoung Lee	05884d3b52	Make FoldBranchToCommonDest poison-safe by default This is a small patch to make FoldBranchToCommonDest poison-safe by default. After `fc3f0c9c`, only two syntactic changes are needed to fix unit tests. This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99452	2021-03-27 19:05:12 +09:00
Juneyoung Lee	fc3f0c9cc0	[IRCE] Use m_LogicalAnd This is a minor fix to use m_LogicalAnd. This allows IRCE to recognize select form of and conditions as well.	2021-03-27 15:23:18 +09:00
Hongtao Yu	12ac0403b1	[CSSPGO][NFC] Fix a debug dump issue. During context promotion, intermediate nodes that are on a call path but do not come with a profile can be promoted together with their parent nodes. Do not print sample context string for such nodes since they do not have profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D99441	2021-03-26 16:06:56 -07:00
Chris Lattner	62c41cfba1	Add a missing file header comment, NFC.	2021-03-26 15:34:04 -07:00
Nikita Popov	4622648a06	Revert "[ArgPromotion] Copy additional metadata for loads." This reverts commit `166620a4f0`. A miscompile has been reported in https://reviews.llvm.org/D93927#2653480 and following.	2021-03-26 21:34:54 +01:00
Florian Hahn	4858e081d7	[ConstraintElimination] Only strip casts preserving the representation. Things like addrspacecast may not be no-ops, so we should not look through them.	2021-03-26 20:07:41 +00:00
Sanjay Patel	b0797e0c12	[SLP] use dyn_cast instead of isa + cast; NFC	2021-03-26 13:52:31 -04:00
Sanjay Patel	a26312f9d4	Revert "[SLP] allow matching integer min/max intrinsics as reduction ops" This reverts commit `3c8473ba53` and includes test diffs to maintain testing status. There's at least 1 place that was not updated with `7202f47508` , so we can crash mismatching select and intrinsics as shown in PR49730.	2021-03-26 09:59:14 -04:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Wenlei He	5f59f407f5	[CSSPGO] Minor tweak for inline candidate priority tie breaker When prioritize call site to consider for inlining in sample loader, use number of samples as a first tier breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important. Differential Revision: https://reviews.llvm.org/D99370	2021-03-25 21:15:36 -07:00
Leonard Chan	36eaeaf728	[llvm][hwasan] Add Fuchsia shadow mapping configuration Ensure that Fuchsia shadow memory starts at zero. Differential Revision: https://reviews.llvm.org/D99380	2021-03-25 15:28:59 -07:00
Guozhi Wei	3240910f00	[DAE] Adjust param/arg attributes when changing parameter to undef In DeadArgumentElimination pass, if a function's argument is never used, corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code. To avoid this undefined behavior when change caller's parameter to undef, this patch removes noundef attribute and other attributes imply noundef on param/arg. Differential Revision: https://reviews.llvm.org/D98899	2021-03-25 14:53:22 -07:00
Roman Lebedev	1c55dcbca7	[NFCI][SimplifyCFG] Don't pay for a Small{Map,Set}Vector when plain SmallSet will suffice This only changes the cases where we really don't care about the iteration order of the underlying contained, namely when we will use the values from it to form DTU updates.	2021-03-25 23:25:40 +03:00
Yevgeny Rouban	f7ef26ef0b	[SLP] Fix crash in reduction for integer min/max The SCEV commit `b46c085d2b` [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions seems to reveal a new crash in SLPVectorizer. SLP crashes expecting a SelectInst as an externally used value but umin() call is found. The patch relaxes the assumption to make the IR flag propagation safe. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D99328	2021-03-25 21:44:21 +07:00
Matt Morehouse	96a4167b4c	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-25 07:04:14 -07:00
Alexey Bataev	568c874117	[SLP]Improve and simplify extendSchedulingRegion. We do not need to scan further if the upper end or lower end of the basic block is reached already and the instruction is not found. It means that the instruction is definitely in the lower part of basic block or in the upper block relatively. This should improve compile time for the very big basic blocks. Differential Revision: https://reviews.llvm.org/D99266	2021-03-25 05:31:58 -07:00
Sameer Sahasrabuddhe	b92c8c22b9	[NewPM] Disable non-trivial loop-unswitch on targets with divergence Unswitching a loop on a non-trivial divergent branch is expensive since it serializes the execution of both version of the loop. But identifying a divergent branch needs divergence analysis, which is a function level analysis. The legacy pass manager handles this dependency by isolating such a loop transform and rerunning the required function analyses. This functionality is currently missing in the new pass manager, and there is no safe way for the SimpleLoopUnswitch pass to depend on DivergenceAnalysis. So we conservatively assume that all non-trivial branches are divergent if the target has divergence. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D98958	2021-03-25 11:27:10 +00:00
Philip Reames	9a82f42d12	Plumb TLI through isSafeToExecuteUnconditionally [NFC] Split from D95815 to reduce patch size. Isn't (yet) used for anything, only the client side is wired up.	2021-03-24 17:52:04 -07:00
Matt Morehouse	c8ef98e5de	Revert "[HWASan] Use page aliasing on x86_64." This reverts commit `63f73c3eb9` due to breakage on aarch64 without TBI.	2021-03-24 16:18:29 -07:00
Roman Lebedev	2070fe7144	[NFCI][SimplifyCFG] Don't form DTU updates if we aren't going to apply them I think we may want to have a thin wrapper over a vector to deduplicate those `if(DTU)` predicates, and instead do them in the `insert()` itself.	2021-03-25 00:02:37 +03:00
Congzhe Cao	829c1b6443	[LoopInterchange] fix tightlyNested() in LoopInterchange legality This is yet another attempt to fix tightlyNested(). Add checks in tightlyNested() for the inner loop exit block, such that 1) if there is control-flow divergence in between the inner loop exit block and the outer loop latch, or 2) if the inner loop exit block contains unsafe instructions, tightlyNested() returns false. The reasoning behind is that after interchange, the original inner loop exit block, which was part of the outer loop, would be put into the new inner loop, and will be executed different number of times before and after interchange. Thus it should be dealt with appropriately. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D98263	2021-03-24 15:49:25 -04:00
Florian Hahn	9d45579279	[LV] Factor out phi type access to variable (NFC). A slight simplification of the code to reduce future diffs.	2021-03-24 19:25:22 +00:00
Florian Hahn	8d1342f79d	[LV] Remove redundant access to Legal::getReductionVars() (NFC). The reduction descriptor is retrieved earlier and stored in a variable RdxDesc already.	2021-03-24 19:15:14 +00:00

1 2 3 4 5 ...

27111 Commits