llvm-project

Commit Graph

Author	SHA1	Message	Date
Rong Xu	19ac75364f	[PGO] Improve hash-mismatch warning message This patch improves FDO hash-mismatch handling: (1) filter out warnings for weak functions. A weak function's definition will be overridden by a strong definition by the linker, so a hash mismatch in the profile-use compilation is expected. Make the profile hash-mismatch warning for these functions subject to the existing option (default true). (2) add an option to trace the hash of functions whose names match the specified string. Note that an empty string parameter will trace all functions. Differential Revision: https://reviews.llvm.org/D129002	2022-07-15 13:44:55 -07:00
Philip Reames	6ab686eb86	[LSR] Allow already invariant operand for ICmpZero matching [try 2] Changes since initial commit: * Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in `d767b392`. * isLoopInvariant returns true for instructions outside the loop, but not necessarily above the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in `2aed3cdb`. Original commit message follows... The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 13:29:43 -07:00
Warren Ristow	c650793049	[Reassociate] Enable FP reassociation via 'reassoc' and 'nsz' Compiling with '-ffast-math' turns on all the FastMathFlags (FMF), as expected, and that enables FP reassociation. Only the two FMF flags 'reassoc' and 'nsz' are technically required to perform reassociation, but disabling other unrelated FMF bits needlessly suppresses the optimization. This patch fixes that needless suppression, and makes appropriate adjustments to test-cases, fixing some outstanding TODOs in the process. Fixes: #56483 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129523	2022-07-15 11:44:35 -07:00
Philip Reames	6fe766beba	Revert "[LSR] Allow already invariant operand for ICmpZero matching" This reverts commit `9153515a7b`. A buildbot crash was reported in the commit thread; reverting while investigating.	2022-07-15 10:47:57 -07:00
Florian Hahn	aa00fb02c9	[LV] Use umax(VF * UF, MinProfTC) for scalable vectors. For scalable vectors, it is not sufficient to only check MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because this property may not hold for larger values of vscale. In those cases, compute umax(VF * UF, MinProfTC) instead. This should fix https://lab.llvm.org/buildbot/#/builders/197/builds/2262	2022-07-15 10:23:14 -07:00
Philip Reames	9153515a7b	[LSR] Allow already invariant operand for ICmpZero matching The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 09:51:00 -07:00
Nikita Popov	8a519b3c21	[InstCombine] Ensure constant folding in binop of select fold When folding a binop into a select, we need to ensure that one of the select arms actually does constant fold, otherwise we'll create two binop instructions and perform the reverse transform. Ensure this by performing an explicit constant folding attempt, and failing the transform if neither side simplifies. A simple alternative here would have been to limit the fold to ImmConstants, but given the current representation of scalable vector splats, this wouldn't be ideal.	2022-07-15 11:03:10 +02:00
Mel Chen	bd404fbcc8	[LV][NFC] Fix the condition for printing debug messages Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D128523	2022-07-15 01:47:33 -07:00
Nikita Popov	f75ccadcdd	[LSR] Create SCEVExpander earlier, use member isSafeToExpand() (NFC) This is a followup to D129630, which switches LSR to the member isSafeToExpand() variant, and removes the freestanding function. This is done by creating the SCEVExpander early (already during the analysis phase). Because the SCEVExpander is now available for the whole lifetime of LSRInstance, I've also made it into a member variable, rather than passing it around in even more places. Differential Revision: https://reviews.llvm.org/D129769	2022-07-15 09:41:23 +02:00
Craig Topper	0e718443c7	[SimplifyIndVar] Use enum class for ExtendKind. NFC I happened to notice two places where the enum was being passed directly to the bool IsSigned argument of createExtendInst. This was functionally ok since SignExtended in the enum has the value 1, but the code shouldn't rely on that. Using an enum class prevents the enum from being convertible to bool, but does make writing the enum values more verbose. Since we now have to write ExtendKind:: in front of them, I've shortened the names of ZeroExtended and SignExtended. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129733	2022-07-14 10:03:58 -07:00
Philip Reames	3bc09c7da5	[SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case Motivation here is to unblock LSR's ability to use ICmpZero uses - the major effect of which is to enable count-down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all. SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values. vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes. (*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of every udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference. Differential Revision: https://reviews.llvm.org/D129710	2022-07-14 08:56:58 -07:00
Brendon Cahoon	58fec78231	Revert "[UnifyLoopExits] Reduce number of guard blocks" This reverts commit `e13248ab0e`. Need to revert because the transformation cannot occur for basic blocks that contain convergent instructions.	2022-07-14 10:33:52 -05:00
Warren Ristow	230c8c56f2	[Reassociate] Cleanup minor missed optimizations In analyzing issue #56483, it was noticed that running `opt` with `-reassociate` was missing some minor optimizations. For example, there were cases where running `opt` on IR with floating-point instructions that have the `fast` flags applied sometimes resulted in less efficient code than the input IR (things like dead instructions left behind, and missed reassociations). These were sometimes noted in the test-files with TODOs, to investigate further. This commit fixes some of these problems, removing some TODOs in the process. FTR, I refer to these as "minor" missed optimizations, because when running a full clang/llvm compilation, these inefficiencies do not occur, as other passes clean that residue up. Regardless, having cleaner IR produced by `opt` makes assessing the quality of fixes done in `opt` easier.	2022-07-14 08:21:04 -07:00
Brendon Cahoon	c945d88d2b	Revert "[StructurizeCFG] Improve basic block ordering" This reverts commit `f1b05a0a2b`. Need to revert due to issues identified with testing. The transformation is incorrect for blocks that contain convergent instructions.	2022-07-14 09:40:51 -05:00
Nikita Popov	9e6e631b38	[LoopPredication] Use isSafeToExpandAt() member function (NFC) As a followup to D129630, this switches a usage of the freestanding function in LoopPredication to use the member variant instead. This was the last use of the freestanding function, so drop it entirely.	2022-07-14 14:49:07 +02:00
Nikita Popov	dcf4b733ef	[SCEVExpander] Make CanonicalMode handling in isSafeToExpand() more robust (PR50506) isSafeToExpand() for addrecs depends on whether the SCEVExpander will be used in CanonicalMode. At least one caller currently gets this wrong, resulting in PR50506. Fix this by a) making the CanonicalMode argument on the freestanding functions required and b) adding member functions on SCEVExpander that automatically take the SCEVExpander mode into account. We can use the latter variant nearly everywhere, and thus make sure that there is no chance of CanonicalMode mismatch. Fixes https://github.com/llvm/llvm-project/issues/50506. Differential Revision: https://reviews.llvm.org/D129630	2022-07-14 14:41:51 +02:00
zhongyunde	fc6092fd4d	[IndVars] Eliminate redundant type cast between unsigned integer and float Extend the transform to unsigned integers, following the review comment on D129191. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129358	2022-07-14 19:41:07 +08:00
Nikita Popov	7a43b382ce	[IndVars] Make sure header phi simplification preserves LCSSA form When simplifying instructions, make sure that the replacement preserves LCSSA form. This fixes the issue reported at: https://reviews.llvm.org/D129293#3650851	2022-07-14 11:46:48 +02:00
Nikita Popov	ebc54e0cd4	[SCCP] Make check for unknown/undef in unary op handling more explicit (NFCI) Make the implementation more similar to other functions, by explicitly skipping an unknown/undef first, and always falling back to overdefined at the end. I don't think it makes a difference now, but could make one once the constant evaluation can fail. In that case we would directly mark the result as overdefined now, rather than keeping it unknown (and later making it overdefined because we think it's undef-based).	2022-07-14 10:56:11 +02:00
Nikita Popov	6db3edc858	[SCCP] Don't check for UndefValue before calling markConstant() The value lattice explicitly represents undef, and markConstant() internally checks for UndefValue and will create an undef rather than constant lattice element in that case. This is mostly a code simplification, it has little practical impact because we usually get undef results from undef operands, and those don't get processed. Only leave the check behind for the CmpInst case, because it currently goes through this incorrect code in the getCompare() implementation: `f98697642c/llvm/include/llvm/Analysis/ValueLattice.h (L456-L457)` Differential Revision: https://reviews.llvm.org/D128330	2022-07-14 10:05:56 +02:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
Florian Hahn	ee37ae91b6	[VPlan] Move VPBB verification to separate function (NFC).	2022-07-13 18:53:40 -07:00
Florian Hahn	6f7347b888	[LV] Use PredRecipe directly instead of getOrAddVPValue (NFC). There is no need to look up the VPValue for Instr, PredRecipe can be used directly.	2022-07-13 17:01:42 -07:00
Alexander Shaposhnikov	c916840539	[SimplifyCFG] Improve SwitchToLookupTable optimization Try to use the original value as an index (in the lookup table) in more cases (to avoid one subtraction and shorten the dependency chain) (https://github.com/llvm/llvm-project/issues/56189). Test plan: 1/ ninja check-all 2/ bootstrapped LLVM + Clang pass tests Differential revision: https://reviews.llvm.org/D128897	2022-07-13 23:21:45 +00:00
Leonard Chan	21f72c05c4	[hwasan] Add __hwasan_add_frame_record to the hwasan interface Hwasan includes instructions in the prologue that mix the PC and SP and store it into the stack ring buffer stored at __hwasan_tls. This is a thread_local global exposed from the hwasan runtime. However, if TLS-mechanisms or the hwasan runtime haven't been set up yet, it will be invalid to access __hwasan_tls. This is the case for Fuchsia where we instrument libc, so some functions that are instrumented but can run before hwasan initialization will incorrectly access this global. Additionally, libc cannot have any TLS variables, so we cannot weakly define __hwasan_tls until the runtime is loaded. A way we can work around this is by moving the instructions into a hwasan function that does the store into the ring buffer and creating a weak definition of that function locally in libc. This way __hwasan_tls will not actually be referenced. This is not our long-term solution, but this will allow us to roll out hwasan in the meantime. This patch includes: - A new llvm flag for choosing to emit a libcall rather than instructions in the prologue (off by default) - The libcall for storing into the ring buffer (__hwasan_add_frame_record) Differential Revision: https://reviews.llvm.org/D128387	2022-07-13 15:15:15 -07:00
Leonard Chan	d843d5c8e6	Revert "[hwasan] Add __hwasan_record_frame_record to the hwasan interface" This reverts commit `4956620387`. This broke a sanitizer builder: https://lab.llvm.org/buildbot/#/builders/77/builds/19597	2022-07-13 15:06:07 -07:00
Florian Hahn	225e3ec622	[LV] Move VPBranchOnMaskRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-13 14:39:59 -07:00
leonardchan	4956620387	[hwasan] Add __hwasan_record_frame_record to the hwasan interface Hwasan includes instructions in the prologue that mix the PC and SP and store it into the stack ring buffer stored at __hwasan_tls. This is a thread_local global exposed from the hwasan runtime. However, if TLS-mechanisms or the hwasan runtime haven't been set up yet, it will be invalid to access __hwasan_tls. This is the case for Fuchsia where we instrument libc, so some functions that are instrumented but can run before hwasan initialization will incorrectly access this global. Additionally, libc cannot have any TLS variables, so we cannot weakly define __hwasan_tls until the runtime is loaded. A way we can work around this is by moving the instructions into a hwasan function that does the store into the ring buffer and creating a weak definition of that function locally in libc. This way __hwasan_tls will not actually be referenced. This is not our long-term solution, but this will allow us to roll out hwasan in the meantime. This patch includes: - A new llvm flag for choosing to emit a libcall rather than instructions in the prologue (off by default) - The libcall for storing into the ring buffer (__hwasan_record_frame_record) Differential Revision: https://reviews.llvm.org/D128387	2022-07-14 05:07:11 +08:00
Martin Sebor	ab7ee3c991	[InstCombine] Enable strtol folding with nonnull endptr Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129593	2022-07-13 09:26:34 -06:00
Nikita Popov	07146a9e64	[SCCP] Fix typo in previous commit Ooops, I tested a build from the wrong checkout.	2022-07-13 16:22:40 +02:00
Nikita Popov	e298dfbc1b	[SCCP] Avoid ConstantExpr::get() call Use ConstantFoldUnaryOpOperand() API instead. This is in preparation for removing fneg constant expressions.	2022-07-13 16:20:34 +02:00
Max Kazantsev	62f4572e45	[IndVars][NFC] Make IVOperand parameter an instruction	2022-07-13 19:07:16 +07:00
Max Kazantsev	30e33b4b81	[SCEV][NFC] Make getStrengthenedNoWrapFlagsFromBinOp return optional	2022-07-13 18:54:25 +07:00
David Sherwood	307ace7f20	[LoopVectorize] Ensure the VPReductionRecipe is placed after all its inputs When vectorising ordered reductions we call a function LoopVectorizationPlanner::adjustRecipesForReductions to replace the existing VPWidenRecipe for the fadd instruction with a new VPReductionRecipe. We attempt to insert the new recipe in the same place, but this is wrong because createBlockInMask may have generated new recipes that VPReductionRecipe now depends upon. I have changed the insertion code to append the recipe to the VPBasicBlock instead. Added a new RUN with tail-folding enabled to the existing test: Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D129550	2022-07-13 09:29:25 +01:00
Nikita Popov	af49bed933	[IndVars] Simplify instructions after replacing header phi with preheader value After replacing a loop phi with the preheader value, it's usually possible to simplify some of the using instructions, so do that as part of replaceLoopPHINodesWithPreheaderValues(). Doing this as part of IndVars is valuable, because it may make GEPs in the loop have constant offsets and allow the following SROA run to succeed (as demonstrated in the PhaseOrdering test). Differential Revision: https://reviews.llvm.org/D129293	2022-07-13 10:27:04 +02:00
Nikita Popov	a5ee62a141	[IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits Currently we only call replaceLoopPHINodesWithPreheaderValues() if optimizeLoopExits() replaces the exit with an unconditional exit. However, it is very common that this already happens as part of eliminateIVComparison(), in which case we're leaving behind the dead header phi. Tweak the early bailout for already-constant exits to also call replaceLoopPHINodesWithPreheaderValues(). Differential Revision: https://reviews.llvm.org/D129214	2022-07-13 09:43:21 +02:00
Augie Fackler	9029bda041	[Attributor] Don't crash if getAnalysisResultForFunction() returns null LoopInfo I have no idea what's going on here. This code was moved around/introduced in change `cb26b01d57` and starts crashing with a NULL dereference once I apply https://reviews.llvm.org/D123090. I assume that I've unwittingly taught the attributor enough that it's able to do more clever things than in the past, and it's able to trip on this case. I make no claims about the correctness of this patch, but it passes tests and seems to fix all the crashes I've been seeing. Differential Revision: https://reviews.llvm.org/D129589	2022-07-12 16:44:06 -04:00
Yuanfang Chen	fcb7d76d65	[coroutine] add nomerge function attribute to `llvm.coro.save` It is illegal to merge two `llvm.coro.save` calls unless their `llvm.coro.suspend` users are also merged. Marks it "nomerge" for the moment. This reverts D129025. Alternative to D129025, which affects other token type users like WinEH. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D129530	2022-07-12 10:39:38 -07:00
Nick Desaulniers	2240d72f15	[X86] initial -mfunction-return=thunk-extern support Adds support for: * `-mfunction-return=<value>` command line flag, and * `__attribute__((function_return("<value>")))` function attribute Where the supported <value>s are: * keep (disable) * thunk-extern (enable) thunk-extern enables clang to change ret instructions into jmps to an external symbol named __x86_return_thunk, implemented as a new MachineFunctionPass named "x86-return-thunks", keyed off the new IR attribute fn_ret_thunk_extern. The symbol __x86_return_thunk is expected to be provided by the runtime the compiled code is linked against and is not defined by the compiler. Enabling this option alone doesn't provide mitigations without corresponding definitions of __x86_return_thunk! This new MachineFunctionPass is very similar to "x86-lvi-ret". The <value>s "thunk" and "thunk-inline" are currently unsupported. It's not clear yet that they are necessary: whether the thunk pattern they would emit is beneficial or used anywhere. Should the <value>s "thunk" and "thunk-inline" become necessary, x86-return-thunks could probably be merged into x86-retpoline-thunks which has pre-existing machinery for emitting thunks (which could be used to implement the <value> "thunk"). Has been found to build+boot with corresponding Linux kernel patches. This helps the Linux kernel mitigate RETBLEED. * CVE-2022-23816 * CVE-2022-28693 * CVE-2022-29901 See also: * "RETBLEED: Arbitrary Speculative Code Execution with Return Instructions." * AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion * TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0 2022-07-12 * Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 SystemZ may eventually want to support "thunk-extern" and "thunk"; both options are used by the Linux kernel's CONFIG_EXPOLINE. This functionality has been available in GCC since the 8.1 release, and was backported to the 7.3 release. Many thanks to folks that provided discreet review off list due to the embargoed nature of this hardware vulnerability. Many Bothans died to bring us this information. Link: https://www.youtube.com/watch?v=IF6HbCKQHK8 Link: https://github.com/llvm/llvm-project/issues/54404 Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1 Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60 Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html Reviewed By: aaron.ballman, craig.topper Differential Revision: https://reviews.llvm.org/D129572	2022-07-12 09:17:54 -07:00
David Sherwood	6b694d600a	[LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF When calculating the cost of Instruction::Br in getInstructionCost we query PredicatedBBsAfterVectorization to see if there is a scalar predicated block. However, this meant that the decisions being made for a given fixed-width VF were affecting the cost for a scalable VF. As a result we were returning InstructionCost::Invalid pointlessly for a scalable VF that should have a low cost. I encountered this for some loops when enabling tail-folding for scalable VFs. Test added here: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Differential Revision: https://reviews.llvm.org/D128272	2022-07-12 14:53:20 +01:00
Nikita Popov	3d475dfeb9	[Mem2Reg] Consistently preserve nonnull assume for uninit load When performing a !nonnull load from uninitialized memory, we should preserve the nonnull assume just like in all other cases. We already do this correctly in the generic mem2reg code, but don't handle this case when using the optimized single-block implementation. Make sure that the optimized implementation exhibits the same behavior as the generic implementation.	2022-07-12 12:53:08 +02:00
Kazu Hirata	ec9a0e36d9	[IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC) The last uses were removed on Apr 15, 2022 in commit `2e6ac54cf4`. Differential Revision: https://reviews.llvm.org/D129460	2022-07-11 20:15:24 -07:00
Florian Hahn	5d135041c5	[LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-11 16:01:07 -07:00
Justin Cady	3d438ceed1	[InstrProf] Mark __llvm_profile_runtime hidden to match libclang_rt.profile definition Mark the symbol hidden to match INSTR_PROF_PROFILE_RUNTIME_VAR in compiler-rt. Fixes second issue discussed at https://discourse.llvm.org/t/63090 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D128842	2022-07-11 11:29:20 -07:00
David Sherwood	03fee6712a	[LoopVectorize] Add option to use active lane mask for loop control flow Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop. This patch adds support for using the active lane mask for control flow by: 1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set. 2. Extracting the first bit of this mask and using this bit for the conditional branch. I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region. Differential Revision: https://reviews.llvm.org/D125301	2022-07-11 13:46:55 +01:00
David Sherwood	02d6950d84	[LoopVectorize][NFC] Add optional Name parameter to VPInstruction This patch is a simple piece of refactoring that now permits users to create VPInstructions and specify the name of the value being generated. This is useful for creating more readable/meaningful names in IR. Differential Revision: https://reviews.llvm.org/D128982	2022-07-11 09:23:24 +01:00
Florian Hahn	6a4bc452f8	[LV] Move VPWidenGEPRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-10 17:10:17 -07:00
Florian Hahn	13ae213469	[LV] Move VPWidenRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-09 18:46:57 -07:00
Paul Osmialowski	b17754bcaa	[SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer value Since the backend's codegen is capable of expanding powi into fmuls, it is no longer necessary to do so in the ::optimizePow() function of SimplifyLibCalls.cpp. It is sufficient to always turn pow(x, n) into powi(x, n) for the cases where n is a constant integer value. Dropping the current expansion code allowed relaxation of the folding conditions and now this can also happen at optimization levels below Ofast. The added CodeGen/AArch64/powi.ll test case ensures that powi is actually expanded into fmuls, confirming that this refactor did not cause any performance degradation. Following an idea proposed by David Sherwood <david.sherwood@arm.com>. Differential Revision: https://reviews.llvm.org/D128591	2022-07-09 12:00:22 -04:00
Florian Hahn	0c27b38849	[VPlan] Move VPWidenSelectRecipe::execute to VPlanRecipes.cpp (NFC). Depends on D127968. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127970	2022-07-08 09:35:23 -07:00
Nikita Popov	d287051404	[InstCombine] Avoid ConstantExpr::get() in vector binop fold (NFCI) Use the ConstantFoldBinaryOpOperands() API instead. This case would bail out on a non-folded result anyway.	2022-07-08 17:20:14 +02:00
Nikita Popov	29c6bf45c3	[InstCombine] Avoid ConstantExpr::get() call Avoid calling ConstantExpr::get() for associative/commutative binops, call ConstantFoldBinaryOpOperands() instead. We only want to perform the reassociation if the constants actually fold.	2022-07-08 17:13:06 +02:00
Nikita Popov	fc18a88231	[InstCombine] Avoid creating float binop ConstantExprs Replace ConstantExpr:getFAdd etc with call to ConstantFoldBinaryOpOperands(). I'm using the constant folding API rather than IRBuilder here to ensure that this does actually constant fold. These transforms don't use m_ImmConstant(), so this would not otherwise be guaranteed (and apparently, they can't use m_ImmConstant because they want to handle scalable vector splats). There is an opportunity here to further migrate these to the ConstantFoldFPInstOperands() API, which would respect the denormal mode. I've held off on doing so here, because some of this code explicitly checks for denormal results, and I don't want to touch it in a mostly NFC change.	2022-07-08 16:36:04 +02:00
Sanjay Patel	79bb915fb6	[InstCombine] enhance fold for subtract-from-constant -> xor A low-bit mask is not required: https://alive2.llvm.org/ce/z/yPShss This matches the SDAG implementation that was updated at: `8b75671314`	2022-07-08 10:02:19 -04:00
zhongyunde	716e1b856a	[IndVars] Eliminate redundant type cast between integer and float Recompute the range: match for fptosi of sitofp, and then query the range of the input to the sitofp according to the comment on D129140. Fixes https://github.com/llvm/llvm-project/issues/55505. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129191	2022-07-08 17:07:20 +08:00
ChenYang Li	6d036b83d1	[JumpThreading] Avoid threadThroughTwoBasicBlocks when PredPred BB ends with indirectbranch Since we can't change the destination of an indirectbr, we should skip the transform when the PredPredBB terminator is an indirectbr. Differential Revision: https://reviews.llvm.org/D129193	2022-07-08 09:29:17 +02:00
Nikita Popov	34a5c2bcf2	[BasicBlockUtils] Allow critical edge splitting with callbr terminators After D129205, we support SplitBlockPredecessors() for predecessors with callbr terminators. This means that it is now also safe to invoke critical edge splitting for an edge coming from a callbr terminator. Remove checks in various passes that were protecting against that. Differential Revision: https://reviews.llvm.org/D129256	2022-07-08 09:20:44 +02:00
Craig Topper	0266773464	[SLP] Add missing space to optimization remark. Reviewed By: vporpo Differential Revision: https://reviews.llvm.org/D129330	2022-07-07 23:29:11 -07:00
Johannes Doerfert	f6e0c05e3d	Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues" This reverts commit `f17639ea0c` as three AMDGPU tests haven't been updated. Will need to verify the changes are not regressions we should avoid.	2022-07-08 00:53:38 -05:00
Johannes Doerfert	f17639ea0c	[Attributor] Replace AAValueSimplify with AAPotentialValues For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons: - We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`. - We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it. - Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication. This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now. `AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still. We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress. Fixes: https://github.com/llvm/llvm-project/issues/54981 Note: A previous version was flawed and consequently reverted in `6555558a80`.	2022-07-08 00:38:27 -05:00
Johannes Doerfert	cb26b01d57	[Attributor] Make heap2stack record alloca placement We recently learned to place the alloca during the heap2stack transformation in the entry block but we did not account for other concurrent modifications. We need to record our decision rather than checking (then outdated) passes during the manifest stage. This will also allow us to use a custom (=optimistic) "loop info" in the future.	2022-07-07 16:49:22 -05:00
Johannes Doerfert	efe8c581ff	[Attributor][NFC] Improve heap2stack result readability and code style	2022-07-07 16:49:22 -05:00
Johannes Doerfert	c771eaf07e	[OpenMP] Ensure to not use SPMD mode in the absence of parallel regions	2022-07-07 16:49:22 -05:00
Leonard Chan	0f589826a3	[hwasan] Refactor frame record info into function This way it can be reused easily in D128387. Note this changes the IR slightly. Before, the steps for calculating and storing the frame record info were: 1. getPC 2. getSP 3. inttoptr 4. or SP, PC 5. store. Now the steps are: 1. getPC 2. getSP 3. or SP, PC 4. inttoptr 5. store. Differential Revision: https://reviews.llvm.org/D129315	2022-07-07 14:44:39 -07:00
Martin Sebor	516915beb5	[InstCombine] Fold memchr and strchr equality with first argument Enhance memchr and strchr handling to simplify calls to the functions used in equality expressions with the first argument to at most two integer comparisons: - memchr(A, C, N) == A to N && *A == C for either a dereferenceable A or a nonzero N, - strchr(S, C) == S to *S == C for any S and C, and - strchr(S, '\0') == 0 to true for any S Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128939	2022-07-07 15:14:23 -06:00
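The two pointer-equality folds can be checked on a toy model. The sketch below is a hypothetical Python model, not LLVM code. A byte string stands in for memory, an index stands in for a pointer, and `None` stands in for a null result. The folded form compares the byte at the pointer (the first element) against C.

```python
def memchr(mem, a, c, n):
    """Model of memchr: index of the first byte == c in mem[a:a+n], or None (null)."""
    idx = mem.find(bytes([c]), a, a + n)
    return None if idx == -1 else idx


def strchr(mem, s, c):
    """Model of strchr on a NUL-terminated string at index s; the terminator is searchable."""
    end = mem.index(b"\0", s) + 1  # one past the terminating NUL
    idx = mem.find(bytes([c]), s, end)
    return None if idx == -1 else idx


def folds_hold(mem):
    """Check memchr(A, C, N) == A <=> N != 0 && *A == C, and strchr(S, C) == S <=> *S == C."""
    for a in range(len(mem)):
        for c in set(mem) | {ord("z")}:  # bytes present in memory plus one likely-absent byte
            for n in range(len(mem) - a + 1):  # keep A[0..N) dereferenceable
                if (memchr(mem, a, c, n) == a) != (n != 0 and mem[a] == c):
                    return False
            if b"\0" in mem[a:] and (strchr(mem, a, c) == a) != (mem[a] == c):
                return False
    return True
```

For example, `folds_hold(b"abca\0ba\0")` exhaustively tries every start index, search byte, and length within that buffer.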
Zaara Syeda	58b9666dc1	[LSR] Fix bug - check if loop has preheader before calling isInductionPHI Fix bug exposed by https://reviews.llvm.org/D125990. rewriteLoopExitValues calls InductionDescriptor::isInductionPHI which requires the PHI node to have an incoming edge from the loop preheader. This adds checks before calling InductionDescriptor::isInductionPHI to verify that the loop has a preheader. Also did some refactoring. Differential Revision: https://reviews.llvm.org/D129297	2022-07-07 15:11:33 -04:00
Daniel Bertalan	ef7aed3e11	[InstCombine] Do not fold 'and (sext (ashr X, Shift)), C' if Shift < 0 The 'and (sext (ashr X, ShiftC)), C' --> 'lshr (sext X), ShiftC' transformation would access out of bounds bits in APInt::getLowBitsSet if the shift count was larger than X's bit width or if it was negative. Fixes #56424	2022-07-07 19:13:55 +02:00
Joseph Huber	41fba3c107	[Metadata] Add 'exclude' metadata to add the exclude flags on globals This patch adds a new metadata kind `exclude` which implies that the global variable should be given the necessary flags during code generation to not be included in the final executable. This is done using the ``SHF_EXCLUDE`` flag on ELF for example. This should make it easier to specify this flag on a variable without needing to explicitly check the section name in the target backend. Depends on D129053 D129052 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D129151	2022-07-07 12:20:40 -04:00
Joseph Huber	ed801ad5e5	[Clang] Use metadata to make identifying embedded objects easier Currently we use the `embedBufferInModule` function to store binary strings containing device offloading data inside the host object to create a fatbinary. In the case of LTO, we need to extract this object from the LLVM-IR. This patch adds a metadata node for the embedded objects containing the embedded pointers and the sections they were stored at. This should create a cleaner interface for identifying these values. In the future it may be worthwhile to also encode an `ID` in the metadata corresponding to the object's special section type if relevant. This would allow us to extract the data from an object file and LLVM-IR using the same ID. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D129033	2022-07-07 12:20:25 -04:00
Florian Hahn	bc19b7c3cc	[LV] Remove collectTriviallyDeadInstructions, already handled by VP DCE. Now that removeDeadRecipes can remove most dead recipes across a whole VPlan, there is no need to first collect some dead instructions. Instead removeDeadRecipes can simply clean them up. Depends on D127580. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128408	2022-07-07 08:40:27 -07:00
Sander de Smalen	519d7876cb	[VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector. This addresses https://github.com/llvm/llvm-project/issues/56377 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D129136	2022-07-07 08:37:04 +00:00
Nikita Popov	40a4078e14	[BasicBlockUtils] Allow splitting predecessors with callbr terminators SplitBlockPredecessors currently asserts if one of the predecessor terminators is a callbr. This limitation was originally necessary, because just like with indirectbr, it was not possible to replace successors of a callbr. However, this is no longer the case since D67252. As the requirement nowadays is that callbr must reference all blockaddrs directly in the call arguments, and these get automatically updated when setSuccessor() is called, we no longer need this limitation. The only thing we need to do here is use replaceSuccessorWith() instead of replaceUsesOfWith(), because only the former does the necessary blockaddr updating magic. I believe there's other similar limitations that can be removed, e.g. related to critical edge splitting. Differential Revision: https://reviews.llvm.org/D129205	2022-07-07 09:13:25 +02:00
Chuanqi Xu	66e15d4c01	[NFC] [Coroutines] Update the comments for lowering coro.save The original comment is not right. We don't store 0 all the time.	2022-07-07 14:57:41 +08:00
Florian Hahn	17d48c3169	[VPlan] Move remove dead recipes before merging regions. This can enable additional region merging, while not losing opportunities as region merging does not produce dead recipes. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128831	2022-07-06 20:38:38 -07:00
Chuanqi Xu	e3b4452e07	[Debug] [Coroutines] Get rid of DW_ATE_address Closing https://github.com/llvm/llvm-project/issues/55916 This patch tries to get rid of DW_ATE_address and enhance the test coverage. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D127625	2022-07-07 10:47:09 +08:00
Chuanqi Xu	7137ebc4ce	[Debug] [Coroutine] Adjust the scope and name for coroutine frame Previously, the scope of the debug type of __coro_frame was limited to the current function. It looked good at first sight, but it prevented us from printing the type in split functions and other functions. Also, the debug type is different for different coroutine functions, so it makes sense to rename the debug type to relate it to the function name. After this patch, we can access the coroutine frame type in a function by `function_name.coro_frame_ty`. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D127623	2022-07-07 10:35:32 +08:00
Vir Narula	89a99ec900	[GVN] Bug fix to reportMayClobberedLoad remark Fix an assertion failure when generating remarks for GVN. The intention of the assert is correct, but it ignores the edge case of the instructions being equivalent. Reduced input that causes the crash when remarks are turned on: ``` target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" target triple = "arm64-apple-macosx12.0.0" define ptr @ReplaceWithTidy(ptr %zz_hold) { cond.end480.us: %0 = load ptr, ptr null, align 8 store ptr %0, ptr %0, align 8 store ptr null, ptr %zz_hold, align 8 %1 = load ptr, ptr %0, align 8 store ptr %1, ptr null, align 8 ret ptr null } ``` Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D129235	2022-07-06 17:42:05 -07:00
Wolfgang Pieb	ff87ee4dee	[Metadata] Utilize the resizing capability of MDNodes in Moduleflag processing. This mostly affects PGO/LTO builds which use module flags describing the call graph. Fixes Issue #51893. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D125999	2022-07-06 10:18:33 -07:00
Nikola Tesic	b5b6d3a41b	[Debugify] Port verify-debuginfo-preserve to NewPM Debugify in OriginalDebugInfo mode, introduced with D82545, runs only with legacy PassManager. This patch enables this utility for the NewPM. Differential Revision: https://reviews.llvm.org/D115351	2022-07-06 17:07:20 +02:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Nikita Popov	20962c1240	[SimplifyCFG] Don't split predecessors of callbr terminator This addresses the assertion failure reported in https://reviews.llvm.org/D124159#3631240. I believe that this limitation in SplitBlockPredecessors is not actually necessary (because unlike with indirectbr, callbr is restricted in a way that does allow updating successors), but for now fix the assertion failure the same way we do everywhere else, by also skipping callbr.	2022-07-06 15:38:53 +02:00
Dimitrije Milosevic	9f492a9ae5	[MIPS] Fix the ASAN shadow offset hook for the N32 ABI Currently, LLVM doesn't have the correct shadow offset mapping for the n32 ABI. This patch introduces the correct shadow offset value for the n32 ABI - 1ULL << 29. Differential Revision: https://reviews.llvm.org/D127096	2022-07-06 12:44:28 +02:00
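For context, ASan maps each application address to a shadow byte by shifting and then adding a fixed offset. The sketch below assumes the usual additive mapping with the default shadow scale of 3, which means 8 application bytes per shadow byte. Only the offset value comes from the commit, and the function name is hypothetical.

```python
SHADOW_SCALE = 3             # assumption: ASan's default 8-to-1 shadow granularity
N32_SHADOW_OFFSET = 1 << 29  # the n32 offset named in the commit message


def shadow_address(addr):
    """Shadow byte address for an application address: (addr >> scale) + offset."""
    return (addr >> SHADOW_SCALE) + N32_SHADOW_OFFSET
```

With these assumptions, every address in the 8-byte granule `0x1000`..`0x1007` maps to the shadow byte at `0x20000200`.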
Nikita Popov	f96cb66d19	[ValueTracking] Accept Instruction in isSafeToSpeculativelyExecute() (NFC) As constant expressions can no longer trap, it only makes sense to call isSafeToSpeculativelyExecute on Instructions, so limit the API to accept only them, rather than general Operators or Values.	2022-07-06 11:12:49 +02:00
Chenbing Zheng	851447cb32	[InstCombine] remove useless insertelement extractelement (bitcast (insertelement (Vec, b)), a) -> extractelement (bitcast (Vec), a) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128890	2022-07-06 17:05:27 +08:00
Nikita Popov	1ed8b29302	[LoopVectorizationLegality] Drop unused variable (NFC)	2022-07-06 10:43:39 +02:00
Nikita Popov	8ee913d83b	[IR] Remove Constant::canTrap() (NFC) As integer div/rem constant expressions are no longer supported, constants can no longer trap and are always safe to speculate. Remove the Constant::canTrap() method and its usages.	2022-07-06 10:36:47 +02:00
Yuanfang Chen	b170d856a3	[SimplifyCFG] Skip hoisting common instructions that return token type By LangRef, hoisting token-returning instructions obscures the origin so it should be skipped. Found this issue while investigating a CoroSplit pass crash. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129025	2022-07-05 11:21:57 -07:00
Zaara Syeda	dbf6ab5ef9	[LSR] Fix bug for optimizing unused IVs to final values This is a fix for a crash reported for https://reviews.llvm.org/D118808. The fix is to only consider PHINodes which are induction phis. Fixes #55529 Differential Revision: https://reviews.llvm.org/D125990	2022-07-05 12:30:58 -04:00
David Green	5493f8fc59	[VectorCombine] Improve shuffle select shuffle-of-shuffles This is an extension to the code added in D123911 which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like: %x = shuffle %i1, %i2 %y = shuffle %i1, %i2 %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as in: %x = shuffle %i1, %i2 %y = shuffle %x The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help get lower-cost input shuffles, pushing complex shuffles further down the tree. This is a recommit with some additional checks for supported forms and out-of-bounds mask elements, with some extra tests. Differential Revision: https://reviews.llvm.org/D128732	2022-07-05 17:16:18 +01:00
Nikita Popov	a4772cbaf0	Revert "[SimplifyCFG] Thread branches on same condition in more cases (PR54980)" This reverts commit `4e545bdb35`. The newly added test is the third infinite combine loop caused by this change. In this case, it's a combination of the branch to common dest and jump threading folds that keeps peeling off loop iterations. The core problem here is that we ideally would not thread over loop backedges, both because it is potentially non-profitable (it may break canonical loop structure) and because it may result in these kinds of loops. Unfortunately, due to the lack of a dominator tree in SimplifyCFG, there is no good way to prevent this. While we have LoopHeaders, this is an optional structure and we don't do a good job of keeping it up to date. It would be fine for a profitability check, but is not suitable for a correctness check. So for now I'm just giving up here, as I don't see a good way to robustly prevent infinite combine loops. Fixes https://github.com/llvm/llvm-project/issues/56203.	2022-07-05 16:57:46 +02:00
Nikita Popov	935570b2ad	[ConstExpr] Don't create div/rem expressions This removes creation of udiv/sdiv/urem/srem constant expressions, in preparation for their removal. I've added a ConstantExpr::isDesirableBinOp() predicate to determine whether an expression should be created for a certain operator. With this patch, div/rem expressions can still be created through explicit IR/bitcode, forbidding them entirely will be the next step. Differential Revision: https://reviews.llvm.org/D128820	2022-07-05 15:54:53 +02:00
Nikita Popov	dc969061c6	[SimplifyCFG] Thread all predecessors with same value at once If there are multiple predecessors that have the same condition value (and thus same "real destination"), these were previously handled by copying the threaded block for each predecessor. Instead, we can reuse one block for all of them. This makes the behavior of SimplifyCFG's jump threading match that of the actual JumpThreading pass. This also avoids the infinite combine loop reported in: https://reviews.llvm.org/D124159#3624387	2022-07-05 14:33:53 +02:00
Florian Hahn	ebb78a95ce	[LV] Remove stray dbgs() call after `774fc63490`.	2022-07-05 12:58:18 +01:00
Chenbing Zheng	b43dd2f6c4	[InstCombine] improve fold for icmp_eq_and to icmp_ult In D95959, the improved analysis for "C >> X" broke the fold ((%x & C) == 0) --> %x u< (-C) iff (-C) is a power of two: it simplifies C in a way that no longer satisfies the fold's condition. This patch tries to restore C before the fold. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D128790	2022-07-05 17:18:23 +08:00
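The fold relies on one fact. If -C is a power of two 2^k, then C is a mask of every bit from k upward, so `x & C` is zero exactly when `x < 2^k`. The exhaustive 8-bit check below is a sketch. The bit width is an arbitrary choice, and the helper names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1


def is_power_of_two(v):
    return v != 0 and (v & (v - 1)) == 0


def counterexamples():
    """(C, x) pairs where ((x & C) == 0) differs from (x u< -C), checked only when -C is a power of two."""
    bad = []
    for c in range(1 << BITS):
        neg_c = (~c + 1) & MASK  # two's-complement negation, i.e. -C at this bit width
        if not is_power_of_two(neg_c):
            continue
        for x in range(1 << BITS):  # values are non-negative, so < acts as an unsigned compare
            if ((x & c) == 0) != (x < neg_c):
                bad.append((c, x))
    return bad
```

Computing `-C` as `~C + 1` also mirrors the equivalence used in the next commit below, where `C.isNegatedPowerOf2()` replaces `(~C + 1).isPowerOf2()`.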
Chenbing Zheng	b66220f25a	[InstCombine] [NFC] use C.isNegatedPowerOf2() instead of (~C + 1).isPowerOf2() Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D129103	2022-07-05 17:04:59 +08:00
Florian Hahn	774fc63490	[LV] Consider minimum vscale assumption for RT check cost. For scalable VFs, the minimum assumed vscale needs to be included in the cost-computation, otherwise a smaller VF may be used for RT check cost computation than was used for earlier cost computations. Fixes a RISCV test failing with UBSan due to both scalar and vector loops having the same cost.	2022-07-05 09:41:58 +01:00
Nikita Popov	b69c75d53f	Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles" This reverts commit `19a1e20b8a`. Clang crashes while linking bullet from llvm-test-suite in ReleaseLTO-g cmake configuration.	2022-07-05 09:31:20 +02:00
zhongyunde	b2b4c8721d	[InstCombine] Make use of low zero bits to determine exact int->fp cast As suggested in the review comment https://reviews.llvm.org/D127854#inline-1226805, we can also make use of the low zero bits; see https://alive2.llvm.org/ce/z/GYxTRu Reviewed By: spatel, nikic, xbolva00 Differential Revision: https://reviews.llvm.org/D128895	2022-07-05 09:15:12 +08:00
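As we read it, the idea is that an integer converts to floating point exactly when the bits between its highest and lowest set bit fit in the significand. Trailing zeros only shift the exponent. The hypothetical helpers below apply that count to concrete values, with `float` and its 24-bit significand as the assumed target type. The pass itself reasons about known leading and trailing zero bits of an IR value, so in practice it is a conservative version of this check.

```python
import struct

FLOAT32_SIGNIFICAND_BITS = 24  # 23 stored bits plus the implicit leading 1


def significant_bits(v):
    """Width of the span from the highest to the lowest set bit of a non-negative integer."""
    if v == 0:
        return 0
    trailing_zeros = (v & -v).bit_length() - 1
    return v.bit_length() - trailing_zeros


def is_exact_in_float32(v):
    """Round-trip v through IEEE-754 single precision and compare."""
    return struct.unpack("<f", struct.pack("<f", float(v)))[0] == v
```

For example, `0xFFFFFF00` has 32 value bits but only 24 significant bits, so it survives the round trip. `0x1000001` needs 25 significant bits and does not.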
Sanjay Patel	142aca7741	[InstCombine] fold sub of min/max of sub with common operand x - max(x - y, 0) --> min(x, y) x - min(x - y, 0) --> max(x, y) https://alive2.llvm.org/ce/z/2YkqFe issue #55470	2022-07-04 18:55:24 -04:00
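Over mathematical integers, both rewrites follow from a case split on whether x >= y. For example, with x >= y the first one reduces to x - (x - y) = y. The check below ignores overflow entirely. The IR fold's exact preconditions, such as the min/max flavor and any no-wrap flags, are the ones in the linked Alive2 proof and are not modeled here.

```python
def sub_of_max_folds(x, y):
    # x - max(x - y, 0) --> min(x, y)
    return x - max(x - y, 0) == min(x, y)


def sub_of_min_folds(x, y):
    # x - min(x - y, 0) --> max(x, y)
    return x - min(x - y, 0) == max(x, y)
```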
Sanjay Patel	4276d00b12	[InstCombine] add helper function for sub-of-min/max folds; NFC The test diffs are cosmetic -- but improvements -- because we let instcombine handle replacement. Instead of dropping the old value name, it propagates to the new instruction.	2022-07-04 17:43:18 -04:00
Florian Hahn	2a82c15f63	[LV] Consider runtime checks profitable if scalar cost is zero. This fixes an UBSan failure after `644a965c1e`. When using user-provided VFs/ICs (via the force-vector-width / force-vector-interleave options) the scalar cost is zero, which would cause divide-by-zero. When forcing vectorization using the options, the cost of the runtime checks should not block vectorization.	2022-07-04 21:37:16 +01:00
Florian Hahn	9eb6572786	[LV] Add back CantReorderMemOps remark. Add back remark unintentionally dropped by `644a965c1e`. I will add a LV test separately, so we do not have to rely on a Clang test to catch this.	2022-07-04 17:23:47 +01:00
Nikita Popov	abbd684c02	[InstCombine] Avoid ConstantExpr::get() in phi binop fold Use ConstantFoldBinaryOpOperands() instead, in preparation for not all binops having a supported constant expression.	2022-07-04 16:46:27 +02:00
Peter Waller	c146af3f46	[LoopVectorize][NFC] Reinstate TTICapture workaround for gcc-6 Fixes #56374.	2022-07-04 14:14:15 +00:00
Florian Hahn	644a965c1e	[LV] Vectorize cases with larger number of RT checks, execute only if profitable. This patch replaces the tight hard cut-off for the number of runtime checks with a more accurate cost-driven approach. The new approach allows vectorization with a larger number of runtime checks in general, but only executes the vector loop (and runtime checks) if considered profitable at runtime. Profitable here means that the cost-model indicates that the runtime check cost + vector loop cost < scalar loop cost. To do that, LV computes the minimum trip count for which runtime check cost + vector-loop-cost < scalar loop cost. Note that there is still a hard cut-off to avoid excessive compile-time/code-size increases, but it is much larger than the original limit. The performance impact on standard test-suites like SPEC2006/SPEC2006/MultiSource is mostly neutral, but the new approach can give substantial gains in cases where we failed to vectorize before due to the over-aggressive cut-offs. On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%) and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system. 
``` CFP2006/447.dealII/447.dealII -1.9% CINT2017rate/525.x264_r/525.x264_r -2.2% ASC_Sequoia/IRSmk/IRSmk -9.2% Rodinia/srad/srad -36.1% ``` `size` regressions on AArch64 with -O3 are ``` MultiSource/Applications/hbd/hbd 90256.00 106768.00 18.3% MultiSourc...ks/ASCI_Purple/SMG2000/smg2000 240676.00 257268.00 6.9% MultiSourc...enchmarks/mafft/pairlocalalign 472603.00 489131.00 3.5% External/S...2017rate/525.x264_r/525.x264_r 613831.00 630343.00 2.7% External/S...NT2006/464.h264ref/464.h264ref 818920.00 835448.00 2.0% External/S...te/538.imagick_r/538.imagick_r 1994730.00 2027754.00 1.7% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 1236471.00 1253015.00 1.3% MultiSource/Applications/oggenc/oggenc 2108147.00 2124675.00 0.8% External/S.../CFP2006/447.dealII/447.dealII 4742999.00 4759559.00 0.3% External/S...rate/510.parest_r/510.parest_r 14206377.00 14239433.00 0.2% ``` Reviewed By: lebedev.ri, ebrevnov, dmgreen Differential Revision: https://reviews.llvm.org/D109368	2022-07-04 15:11:39 +01:00
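The runtime profitability condition above can be illustrated with a deliberately simplified cost model. In this model the vector path pays the runtime-check cost once, pays one vector-body cost per VF iterations, and pays scalar cost for any leftover iterations. The function name, parameters, and cost accounting are illustrative assumptions, not LoopVectorize's actual formula.

```python
def min_profitable_trip_count(rt_check_cost, scalar_iter_cost, vector_iter_cost, vf, max_tc=1 << 20):
    """Smallest trip count TC for which, under this simplified model,
    RT-check cost + vector-loop cost < scalar-loop cost; None if never."""
    if vector_iter_cost >= vf * scalar_iter_cost:
        return None  # a vector iteration never saves anything, so the checks never pay off
    for tc in range(1, max_tc + 1):
        vector_iters, remainder = divmod(tc, vf)
        vector_total = (rt_check_cost
                        + vector_iters * vector_iter_cost
                        + remainder * scalar_iter_cost)  # leftover iterations run in a scalar epilogue
        if vector_total < tc * scalar_iter_cost:
            return tc
    return None
```

With a check cost of 20, a scalar iteration cost of 4, a vector iteration cost of 6, and VF = 4, each full vector iteration saves 10. The checks are only amortized after three vector iterations, so the threshold is a trip count of 12.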
David Green	2de05afc19	[SLP] Peek into loads when hitting the RecursionMaxDepth This patch slightly extends the limit on the RecursionMaxDepth inside the SLP vectorizer. It does it only when it hits a load (or zext/sext of a load), which allows it to peek through in the places where it will be the most valuable, without ballooning out the O(..) by any 2^n factors. Differential Revision: https://reviews.llvm.org/D122148	2022-07-04 14:22:50 +01:00
Nikita Popov	93cbdaef04	[Reassociate] Avoid ConstantExpr::get() Use ConstantFoldBinaryOpOperands() instead, to handle the case where not all binary ops have a constant expression variant. This is a bit awkward because we only want to pop the element from Ops once we're sure that it has folded.	2022-07-04 15:17:22 +02:00
Nikita Popov	32a76fc292	[SCEVExpander] Avoid ConstantExpr::get() (NFCI) Use ConstantFoldBinaryOpOperands() instead. This will be important when not all binops have constant expression variants.	2022-07-04 14:59:00 +02:00
David Green	19a1e20b8a	[VectorCombine] Improve shuffle select shuffle-of-shuffles This is an extension to the code added in D123911 which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like: %x = shuffle %i1, %i2 %y = shuffle %i1, %i2 %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as in: %x = shuffle %i1, %i2 %y = shuffle %x The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help get lower-cost input shuffles, pushing complex shuffles further down the tree. Differential Revision: https://reviews.llvm.org/D128732	2022-07-04 13:38:43 +01:00
Nikita Popov	9604601c93	[SimplifyCFG] Remove redundant checks for hoisting (NFCI) These conditions are later checked in the HoistTerminator code path. Checking them here is somewhat confusing, because this code only checks the first instruction in the block, which is not necessarily the terminator.	2022-07-04 10:53:54 +02:00
Florian Hahn	b4694229aa	[LV] Simplify setDebugLocFromInst by using early exit (NFC). Suggested as separate improvement in D128657.	2022-07-04 09:25:26 +01:00
Sanjay Patel	f9f40aa10d	[InstCombine] fold negated low-bit-mask to cmp+select (-(X & 1)) & Y --> (X & 1) == 0 ? 0 : Y https://alive2.llvm.org/ce/z/rhpH3i This is noted as a missing IR canonicalization in issue #55618. We already managed to fix codegen to the expected form.	2022-07-03 12:25:26 -04:00
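The fold works because -(X & 1) is either 0 or all-ones. The `and` therefore either clears Y or passes it through unchanged, which is exactly a select on the low bit. Below is an exhaustive check at 8 bits. The bit width and function names are arbitrary choices for the sketch.

```python
BITS = 8
MASK = (1 << BITS) - 1


def original(x, y):
    # (-(X & 1)) & Y, computed modulo 2**BITS
    return (-(x & 1) & MASK) & y


def folded(x, y):
    # (X & 1) == 0 ? 0 : Y
    return 0 if (x & 1) == 0 else y
```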
Nuno Lopes	53dc0f1078	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-03 14:34:03 +01:00
Nuno Lopes	022bd92c78	[LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]	2022-07-03 12:32:19 +01:00
Florian Hahn	b0da3c6fa4	[VPlan] Move setDebugLocFromInst to VPTransformState (NFC). The moved helpers are only used for codegen. It will allow moving the remaining ::execute implementations out of LoopVectorize.cpp. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128657	2022-07-02 15:18:17 +01:00
Johannes Doerfert	07766f4070	[Attributor] Move heap2stack allocas to the entry block if possible If we are certainly not in a loop we can directly emit the heap2stack allocas in the function entry block. This will help to get rid of them (SROA) and avoid stacksave/restore intrinsics when the function is inlined.	2022-07-01 21:34:12 -05:00
Nuno Lopes	7c4f45f87a	Revert [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] This reverts commits `47e6f98f84` and `3e701bcd2a`	2022-07-01 23:53:41 +01:00
Nuno Lopes	47e6f98f84	[LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]	2022-07-01 23:31:31 +01:00
Sanjay Patel	9c8a39c67b	[InstCombine] restrict select of bit-tests to constant shift amounts This transform is responsible for a long-standing miscompile as discussed in issue #47012 (was bugzilla #47668). There was a proposal to correct it in D88432, but that was abandoned and there hasn't been any recent activity to fix it AFAICT. The original patch D45108 started with a constant-shift-only restriction and only expanded during review, so I don't think there's much risk of perf regression on the motivating code.	2022-07-01 16:24:34 -04:00
Martin Sebor	0d68ff87d2	[InstCombine] Transform strrchr to memrchr for constant strings Add an emitter for the memrchr common extension and simplify the strrchr call handler to use it. This enables transforming calls with the empty string to the test C ? S : 0. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128954	2022-07-01 11:10:00 -06:00
Nikita Popov	65d59b4265	[LoopDeletion] Fix deletion with unusual predecessor terminator (PR56266) LoopSimplify only requires that the loop predecessor has a single successor and is safe to hoist into -- it doesn't necessarily have to be an unconditional BranchInst. Adjust LoopDeletion to assert conditions closer to what it actually needs for correctness, namely a single successor and a side-effect-free terminator (as the terminator is getting dropped). Fixes https://github.com/llvm/llvm-project/issues/56266.	2022-07-01 16:13:35 +02:00
Florian Hahn	0dddf04cab	[LV] Don't optimize exit cond during epilogue vectorization. At the moment, the same VPlan can be used for code generation of both the main vector and epilogue vector loop. This can lead to wrong results if the plan is optimized based on the VF of the main vector loop and then re-used for the epilogue loop. One example where this is problematic is if the scalar loops need to execute at least one iteration, e.g. due to interleave groups. To prevent mis-compiles in the short-term, disable optimizing exit conditions for VPlans when using epilogue vectorization. The proper fix is to avoid re-using the same plan for both loops, which will require support for cloning plans first. Fixes #56319.	2022-07-01 13:48:38 +01:00
Nikita Popov	fabe915705	[SimplifyLibCalls] Use inbounds GEP When converting strchr(p, '\0') to p + strlen(p) we know that strlen() must return an offset that is inbounds of the allocated object (otherwise it would be UB), so we can use an inbounds GEP. An equivalent argument can be made for the other cases.	2022-07-01 14:31:44 +02:00
Sanjay Patel	ab372cdd6f	[InstCombine] add code comment for icmp transform; NFC This was accidentally left out of `cc88445a91`	2022-07-01 08:21:55 -04:00
Florian Hahn	583abd0e36	[VPlan] Move addMetadata to VPTransformState (NFC). The moved helpers are only used for codegen. It will allow moving the remaining ::execute implementations out of LoopVectorize.cpp. Depends on D127966. Depends on D127965. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127968	2022-07-01 12:03:25 +01:00
Nikita Popov	9b994593cc	[SCCP] Only handle unknown lattice values in resolvedUndefsIn() This is a minor refinement of resolvedUndefsIn(), mostly for clarity. If the value of an instruction is undef, then that's already a legal final result -- we can safely rauw such an instruction with undef. We only need to mark unknown values as overdefined, as that's the result we get for an instruction that has not been processed because it has an undef operand. Differential Revision: https://reviews.llvm.org/D128251	2022-07-01 09:14:37 +02:00
Chen Zheng	39fe49aa57	[Inline] don't add noalias metadata for unknown objects. The unidentified objects recognized in `getUnderlyingObjects` may still alias to the noalias parameter because `getUnderlyingObjects` may not check deep enough to get the underlying object because of `MaxLookup`. The real underlying object for the unidentified object may still be the noalias parameter. Originally Patched By: tingwang Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D127202	2022-07-01 02:16:55 -04:00
Alexey Bataev	4be3fc35aa	[SLP][NFC]Clean up operands of the removed insertelements, NFC. Replace all operands of the insertelement instructions replaced by shuffles with poison, to avoid false-positive reports about an incorrect function.	2022-06-30 17:51:43 -07:00
Nuno Lopes	373571dbb4	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-06-30 23:01:43 +01:00
William Huang	a9119143a2	[InstCombine] Changing constant-indexed GEP of GEP to i8* for merging When merging GEP of GEP with constant indices, if the second GEP's offset is not divisible by the first GEP's element size, convert both types to i8* and merge. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D125934	2022-06-30 21:26:11 +00:00
Nuno Lopes	0586d1cac2	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-06-30 21:47:31 +01:00
Craig Topper	e633f8cd14	[InstCombine] Fix a Wparentheses warning in an assert. NFC	2022-06-30 13:03:32 -07:00
Sanjay Patel	cc88445a91	[InstCombine] canonicalize 'icmp (trunc X), C' to 'icmp (X & Mask), C' I looked at canonicalizing in the other direction, but that causes many potential regressions and infinite loops because we already (possibly wrongly) canonicalize "trunc X to i1" into an and+icmp. This has a data layout restriction to avoid creating illegal mask instructions, but we could remove that if we can show that the backend can undo this when needed. The motivating example from issue #56119 is modeled by the PhaseOrdering test.	2022-06-30 15:51:39 -04:00
Martin Sebor	3a743a5892	[InstCombine] Fix memrchr logic error that prevents folding Correct a logic bug in the memrchr enhancement added in D123629 that makes it ineffective in a subset of cases. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128856	2022-06-30 11:35:26 -06:00
Nikita Popov	f34dcf2763	[IRBuilder] Migrate all binops to folding API Migrate all binops to use FoldXYZ rather than CreateXYZ APIs, which are compatible with InstSimplifyFolder and fallible constant folding. Rather than continuing to add one method for every single operator, add a generic FoldBinOp (plus variants for nowrap, exact and fmf operators), which we would need anyway for CreateBinaryOp. This change is not NFC because IRBuilder with InstSimplifyFolder may perform more folding. However, this patch changes SCEVExpander to not use the folder in InsertBinOp to minimize practical impact and keep this change as close to NFC as possible.	2022-06-30 16:41:17 +02:00
Nikita Popov	588e229bf9	[VNCoercion] Separate constant/non-constant mem intrinsic implementations (NFCI) This means we no longer need to have the same API between IRBuilder and IRBuilderFolder. The constant case is substantially simpler, so implementing it separately isn't an undue burden.	2022-06-30 15:26:06 +02:00
Nikita Popov	014c4bdb9d	[VNCoercion] Use ConstantFoldLoadFromConst API (NFCI) Nowadays we have a generic constant folding API to load a type from an offset. It should be able to do anything that VNCoercion can do. This avoids the weird templating between IRBuilder and ConstantFolder in one function, which will stop working as the IRBuilderFolder moves from CreateXYZ to FoldXYZ APIs. Unfortunately, this doesn't eliminate this pattern from VNCoercion entirely yet.	2022-06-30 14:52:27 +02:00
Florian Hahn	68884dde70	[LV] Move LoopVersioning creation to LVP::execute. At the moment LoopVersioning is only created for inner-loop vectorization. This patch moves it to LVP::execute, which means it will also be added for epilogue vectorization. As a consequence, the proper noalias metadata is now also added to epilogue vector loops. LVer will be moved to VPTransformState as follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127966	2022-06-30 12:14:32 +01:00
Sanjay Patel	7c4b90a98d	[InstCombine] fix overzealous assert in icmp-shr fold The assert was added with `0399473de8` and is correct for that pattern, but it is off-by-1 with the enhancement in `d4f39d8333`. The transforms are still correct with the new pre-condition: https://alive2.llvm.org/ce/z/6_6ghm https://alive2.llvm.org/ce/z/_GTBUt And as shown in the new test, the transform is expected with 'ult' - in that case, the icmp reduces to test if the shift amount is 0.	2022-06-30 06:28:48 -04:00
Nikita Popov	1579fc62fe	[Evaluator] Add missing LLVM_DEBUG() Missed these in `41f0b6a781`, resulting in unconditional debug output.	2022-06-30 11:54:47 +02:00
Chen Zheng	b05801de35	[InlineFunction] Only check pointer arguments for a call Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128529	2022-06-30 05:39:47 -04:00
Nikita Popov	41f0b6a781	[Evaluator] Use ConstantFoldInstOperands() For instructions that don't need any special handling, use ConstantFoldInstOperands(), rather than re-implementing individual cases. This is probably not NFC because it can handle cases the previous code missed (e.g. vector operations).	2022-06-30 11:10:17 +02:00
Nikita Popov	a6d4b4138f	[ConstantFold] Supports compares in ConstantFoldInstOperands() Support compares in ConstantFoldInstOperands(), instead of forcing the use of ConstantFoldCompareInstOperands(). Also handle insertvalue (extractvalue was already handled). This removes a footgun, where many uses of ConstantFoldInstOperands() need a separate check for compares beforehand. It's particularly insidious if called on a constant expression, because it doesn't fail in that case, but will just not do DL-dependent folding.	2022-06-30 11:05:24 +02:00
Florian Hahn	24b5f8e0d0	[VPlan] Make sure optimizeInductions removes wide ind from scalar plan. In some cases, there may be widened users of inductions even though the plan includes the scalar VF. In those cases, make sure we still replace the VPWidenIntOrFpInductionRecipe with scalar steps, as otherwise we may try to execute a VPWidenIntOrFpInductionRecipe with a scalar VF. Alternatively the patch could also split the range if needed. This fixes a crash exposed by D123720. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128755	2022-06-30 09:11:48 +01:00
Nikita Popov	10c531cd5b	[SCCP] Simplify CFG in SCCP as well Currently, we only remove dead blocks and non-feasible edges in IPSCCP, but not in SCCP. I'm not aware of any strong reason for that difference, so this patch updates SCCP to perform the CFG cleanup as well. Compile-time impact seems to be pretty minimal, in the 0.05% geomean range on CTMark. For the test case from https://reviews.llvm.org/D126962#3611579 the result after -sccp now looks like this: define void @test(i1 %c) { entry: br i1 %c, label %unreachable, label %next next: unreachable unreachable: call void @bar() unreachable } -jump-threading does nothing on this, but -simplifycfg will produce the optimal result. Differential Revision: https://reviews.llvm.org/D128796	2022-06-30 09:25:03 +02:00
Chuanqi Xu	0b5ead6590	[WebAssembly] Don't set musttail for coroutines when tail-call is not enabled C++20 coroutines couldn't be compiled to WebAssembly because an optimization named symmetric transfer requires support for musttail calls, which WebAssembly doesn't support yet. This patch tries to fix the problem by adding a supportsTailCalls method to TargetTransformImpl so that symmetric transfer is skipped when the tail-call feature is not supported. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D128794	2022-06-30 11:15:40 +08:00
zhongyunde	404479b4b0	[InstCombine] Use known bits to determine exact int->fp cast Reviewed By: spatel, nikic Differential Revision: https://reviews.llvm.org/D127854	2022-06-30 09:45:11 +08:00
Florian Hahn	6d5f814357	[LoopUnrollRuntime] Invalidate SCEV for exit phi in ConnectProlog. ConnectProlog adds new incoming values to exit phi nodes, which can change the SCEV for the phi after `20d798bd47`. The fix is analogous to `cfc741bc0e`. Fixes #56286.	2022-06-29 20:28:43 +01:00
Florian Hahn	9a35f19e3e	[UnrollRuntime] Invalidate SCEVs for modified phis in ConnectEpilog. ConnectEpilog adds new incoming values to exit phi nodes, which can change the SCEV for the phi after `20d798bd47`. The fix is analogous to `cfc741bc0e`. Fixes #56282.	2022-06-29 18:26:00 +01:00
Sanjay Patel	d4f39d8333	[InstCombine] add fold for (ShiftC >> X) >u C This is the 'ugt' sibling to: `0399473de8` Decrement the input compare constant (and implicitly decrement the new compare constant): https://alive2.llvm.org/ce/z/iELmct	2022-06-29 12:30:01 -04:00

31127 Commits