llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	3abaa3760d	[LSR] Preserve LCSSA in expander when rewriting loop exit values. The expanded values when rewriting exit values need to preserve LCSSA. Ask SCEVExpander to preserve LCSSA to ensure that. Fixes #58007.	2022-09-27 09:58:48 +01:00
Nikita Popov	97dfa53626	[FunctionAttrs] Infer precise FMRB This updates checkFunctionMemoryAccess() to infer a precise FunctionModRefBehavior, rather than an approximation split into read/write and argmemonly. Afterwards, we still map this back to imprecise function attributes. This still allows us to infer some cases that we previously did not handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly. In practice, this means we get better memory attributes in the presence of intrinsics like @llvm.assume. Differential Revision: https://reviews.llvm.org/D134527	2022-09-27 10:14:35 +02:00
Florian Hahn	275bee32ad	[LoopUnroll] Forget block and loop dispositions during unrolling. After unrolling a loop, the block and loop dispositions need to be cleared. As we don't know which SCEVs in the loop/blocks may be impacted, completely clear the cache. This should also fix some cases where deleted loops remained in the LoopDispositions cache. This fixes a verification failure surfaced by D134531. I am planning on reviewing/updating the existing uses of forgetLoopDispositions to check if they should be replaced by forgetBlockAndLoopDispositions. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D134612	2022-09-27 08:49:04 +01:00
Sebastian Peryt	46fc75ab28	[NFC][2/n] Remove PrunePH pass Second patch in the series to remove legacy PM and associated -enable-new-pm=0 flag targets pass that has not been ported to new PM - PruneEH. Discussion about this can be found in D44415. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D134686	2022-09-26 18:38:04 -07:00
Sanjay Patel	def6cbd2bd	[InstCombine] add assert/test for zext to i1 This is a test to verify that we do not crash with the problem noted in issue #57986. The root problem should be fixed with a prior change to InstSimplify.	2022-09-26 16:01:25 -04:00
Matt Arsenault	473e83b95a	GuardWidening: Pass through AssumptionCache (NFC)	2022-09-26 14:53:00 -04:00
Matt Arsenault	9bf1aea224	LoopPeel: Pass through AssumptionCache (NFC)	2022-09-26 14:52:59 -04:00
Matt Arsenault	53fa00b3ae	LoopUnroll: Pass through AssumptionCache (NFC) Using these queries with a context instruction and without a cache seems to be about 2x slower than with it so this theoretically improves compile time.	2022-09-26 14:52:59 -04:00
Ruiling Song	a5676a3a7e	StructurizeCFG: Set Undef for non-predecessors in setPhiValues() During structurization process, we may place non-predecessor blocks between the predecessors of a block in the structurized CFG. Take the typical while-break case as an example: ``` /---A(v=...) \| / \ ^ B C \| \ /\| \---L \| \ / E (r = phi (v:C)...) ``` After structurization, the CFG would be look like: ``` /---A \| \|\ \| \| C \| \|/ \| F1 ^ \|\ \| \| B \| \|/ \| F2 \| \|\ \| \| L \ \|/ \--F3 \| E ``` We can see that block B is placed between the predecessors(C/L) of E. During phi reconstruction, to achieve the same sematics as before, we are reconstructing the PHIs as: F1: v1 = phi (v:C), (undef:A) F3: r = phi (v1:F2), ... But this is also saying that `v1` would be live through B, which is not quite necessary. The idea in the change is to say the incoming value from B is Undef for the PHI in E. With this change, the reconstructed PHI would be: F1: v1 = phi (v:C), (undef:A) F2: v2 = phi (v1:F1), (undef:B) F3: r = phi (v2:F2), ... Reviewed by: sameerds Differential Revision: https://reviews.llvm.org/D132450	2022-09-26 09:54:47 +08:00
Ruiling Song	40e9284f3c	StructurizeCFG: prefer reduced number of live values The instruction simplification will try to simplify the affected phis. In some cases, this might extend the liveness of values. For example: BB0: \| \ \| BB1 \| / BB2:phi (BB0, v), (BB1, undef) The phi in BB2 will be simplified to v as v dominates BB2, but this is increasing the number of active values in BB1. By setting CanUseUndef to false, we will not simplify the phi in this way, this would help register pressure. This is mandatory for the later change to help reducing VGPR pressure for AMDGPU. Reviewed by: foad, sameerds Differential Revision: https://reviews.llvm.org/D132449	2022-09-26 09:54:47 +08:00
Douglas Yung	91e0423595	Revert "[SROA] Create additional vector type candidates based on store and load slices" This reverts commit `de3445e0ef`. This is causing GHI #57796 and #57821.	2022-09-23 12:24:07 -07:00
Douglas Yung	0a7f4e03a9	Revert "[SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable" This reverts commit `3f08d248c4`. The commit this change is fixing is being reverted due to GHI #57796 and #37821, so revert this commit as well.	2022-09-23 12:24:07 -07:00
Teresa Johnson	b1926f308f	Restore "[MemProf] Memprof profile matching and annotation" This reverts commit `794b7ea960`, and thus restores commit `a212d8da94`, and follow on fixes `0cd6763fa9`, `e9ff53d42f`, and `37c6a25e9a`. Use a hash function (BLAKE3) instead of hash_combine/hash_code which are not guaranteed to be stable across executions. Additionally, it adds a "REQUIRES: x86_64-linux" to the tests that have raw profile inputs to avoid failures on big endian bots. Reviewers: snehasish, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D128142	2022-09-23 11:38:47 -07:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since `151c144` which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre `151c144` any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Dmitri Gribenko	954d3cd2c6	Revert "[AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load." This reverts commit `3c70c8c1df`. After this commit, during the 3-stage bootstrap the second-stage Clang crashes.	2022-09-23 19:21:09 +02:00
Philip Reames	954c1ed009	[SLP] Adjust debug output for store vectorization failure When store vectorization is infeasible, it's helpful to have a debug logging indication of why. A case I've hit a couple times now is accidentally using -march instead of -mtriple and getting the default TTI results. This causes max-vf to become 1, and thus hits the added logging line.	2022-09-23 09:58:15 -07:00
Philip Reames	42ef572049	[SLP] Fix cost model w.r.t. operand properties We allow the target to report different costs depending on properties of the operands; given this, we have to make sure we pass the right set of operands and account for the fact that different scalar instructions can have operands with different properties. As a motivating example, consider a set of multiplies which each multiply by a constant (but not all the same constant). Most of the constants are power of two (but not all). If the target doesn't have support for non-uniform constant immediates, this will likely require constant materialization and a non-uniform multiply. However, depending on the balance of target costs for constant scalar multiplies vs a single vector multiply, this might or might not be a profitable vectorization. This ends up basically being a rewrite of the existing code. Normally, I'd scope the change more narrowly, but I kept noticing things which seemed highly suspicious, and none of the existing code appears to have any test coverage at all. I think this is a case where simply throwing out the existing code and starting from scratch is reasonable. This is a follow on to Alexey's D126885, but also handles the arithmetic instruction case since the existing code appears to have the same problem. Differential Revision: https://reviews.llvm.org/D132566	2022-09-23 08:40:23 -07:00
Florian Hahn	d72eb9c985	[LoopDeletion] Invalidate SCEV after moving instruction. LoopDeletion may hoist instructions out of a loop using makeLoopInvariant without invalidating the SCEV for the moved instruction. Moving the instruction to a different block may change its cached block disposition, so invalidate the cached info. Fixes #57837.	2022-09-23 15:14:11 +01:00
Simon Pilgrim	a6e9141505	[TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436) The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both. This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling. Fixes #50778 Differential Revision: https://reviews.llvm.org/D111968	2022-09-23 14:03:18 +01:00
Nikita Popov	fe196380cc	[FunctionAttrs] Use MemoryLocation::getOrNone() when infering memory attrs MemoryLocation::getOrNone() already has the necessary logic to handle different instruction types. Use it, rather than repeating a subset of the logic. This adds support for previously unhandled instructions like atomicrmw.	2022-09-23 13:56:55 +02:00
Florian Hahn	623c4a7a55	[LoopVersioning] Invalidate SCEV for phi if new values are added. After `20d798bd47`, SCEV looks through PHIs with a single incoming value. This means adding a new incoming value may change the SCEV for a phi. Add missing invalidation when an existing PHI is reused during LoopVersioning. New incoming values will be added later from the versioned loop. Similar issues have been fixed by also adding missing invalidation. Fixes #57825. Note that the test case unfortunately requires running loop-vectorize followed by loop-load-elimination, which does the actual versioning. I don't think it is possible to reproduce the failure without that combination.	2022-09-23 11:53:29 +01:00
bipmis	3c70c8c1df	[AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load. The patch simplifies some of the patterns as below 1. (ZExt(L1) << shift1) \| (ZExt(L2) << shift2) -> ZExt(L3) << shift1 2. (ZExt(L1) << shift1) \| ZExt(L2) -> ZExt(L3) The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc. Differential Revision: https://reviews.llvm.org/D127392	2022-09-23 10:19:50 +01:00
Teresa Johnson	794b7ea960	Revert "[MemProf] Memprof profile matching and annotation" This reverts commit `a212d8da94`, and follow on fixes `0cd6763fa9`, `e9ff53d42f`, and `37c6a25e9a`. After re-reading the documentation for hash_combine, I don't think this is the appropriate hash function to use for computing the hash to use as a stack id in the metadata, since it is not guaranteed to produce stable values across executions. I have not hit this problem, but plan to switch to using an MD5 hash. I am hitting an issue with one of the bots (https://lab.llvm.org/buildbot/#/builders/171/builds/20732) where the values produced are only the lower 32 bits of the expected hash values, however, which I assume is related to the implementation of hash_combine and hash_code. I believe I fixed all of the other bot failures with the follow on fixes, which I'll merge into the new version before reapplying.	2022-09-22 16:08:03 -07:00
Teresa Johnson	e9ff53d42f	[MemProf] Fix buildbot error due to unused variable from bad merge Fix an unused variable warning introduced by `a212d8da94` due to a bad merge with a recent change. E.g. in https://lab.llvm.org/buildbot/#/builders/77/builds/22095	2022-09-22 13:24:33 -07:00
Teresa Johnson	a212d8da94	[MemProf] Memprof profile matching and annotation Profile matching and IR annotation for memprof profiles. See also related RFCs: RFC: Sanitizer-based Heap Profiler [1] RFC: A binary serialization format for MemProf [2] RFC: IR metadata format for MemProf [3]* * Note that the IR metadata format has changed from the RFC during implementation, as described in the preceeding patch adding the basic metadata and verification support. The matching is performed during the normal PGO annotation phase, to ensure that the inlines applied in the IR at that point are a subset of the inlines in the profiled binary and thus reflected in the profile's call stacks. This is important because the call frames are associated with functions in the profile based on the inlining in the symbolized call stacks, and this simplifies locating the subset of profile data relevant for matching onto each function's IR. The PGOInstrumentationUse pass is enhanced to perform matching for whatever combination of memprof and regular PGO profile data exists in the profile. Using the utilities introduced in D128854: The memprof profile data for each context is converted to "cold" or "notcold" based on parameterized thresholds for size, access count, and lifetime. The memprof allocation contexts are trimmed to the minimal amount of context required to uniquely identify whether the context is cold or not cold. For allocations where all profiled contexts have the same allocation type, no memprof metadata is attached and instead the allocation call is directly annotated with an attribute specifying the alloction type. This is the same attributed that will be applied to allocation calls once cloned for different contexts, and later used during LibCall simplification to emit allocation hints [4]. Depends on D128141 and D128854. [1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html [2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html [3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165 [4] `ab87cf382d` Differential Revision: https://reviews.llvm.org/D128142	2022-09-22 12:48:31 -07:00
Leonard Chan	21b03bf970	[llvm] Handle dso_local_equivalent in FunctionComparator This addresses https://github.com/llvm/llvm-project/issues/51066. Prior to this, dso_local_equivalent would lead to an llvm_unreachable in a switch in the FunctionComparator. This adds a conservative case in that switch that just compares the underlying functions. Differential Revision: https://reviews.llvm.org/D134300	2022-09-22 18:42:31 +00:00
Philip Reames	32dc1151e2	[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.) This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.) Differential Revision: https://reviews.llvm.org/D133580	2022-09-22 08:53:46 -07:00
Nikita Popov	8df376db72	[InstCombine] Remove buggy zext of icmp eq with pow2 fold (PR57899) For the case where the constant is a power of two rather than zero, the fold is incorrect, because it fails to check that the bit set in the LHS matches the bit in the RHS. Rather than fixing this, remove the power of two handling entirely, as a different fold will already canonicalize such comparisons to use a zero constant. Fixes https://github.com/llvm/llvm-project/issues/57899.	2022-09-22 16:37:10 +02:00
Nikita Popov	c2e76f914c	[InstCombine] Use simplifyWithOpReplaced() for non-bool selects Perform the simplifyWithOpReplaced() fold even for non-bool selects. This subsumes a number of recently added folds for zext/sext of the condition. We still need to manually handle variations with both sext/zext and not, because simplifyWithOpReplaced() only performs one level of replacements.	2022-09-22 15:46:00 +02:00
Nikita Popov	41dde5d858	[InstSimplify] Support vectors in simplifyWithOpReplaced() We can handle vectors inside simplifyWithOpReplaced(), as long as cross-lane operations are excluded. The equality can hold (or not hold) for each vector lane independently, so we shouldn't use the replacement value from other lanes. I believe the only operations relevant here are shufflevector (where all previous bugs were seen) and calls (which might use shuffle-like intrinsics and would require more careful classification). Differential Revision: https://reviews.llvm.org/D134348	2022-09-22 10:45:42 +02:00
Congzhe Cao	22c91df52c	[LoopInterchange][PR57148] Ensure the correct form of IR after transformation This is a bugfix patch that resolves the following two bugs in loop interchange: 1. PR57148 which is an assertion error due to of loss of LCSSA form after interchange, as referred to test1() in pr57148.ll. 2. Use before def for the outermost loop induction variables after interchange, as referred to test2() in pr57148.ll. The fix in this patch is that: 1. In cases where the LCSSA form is not maintained after interchange, we update the IR to the LCSSA form again. 2. We split the phi nodes in the inner loop header into a separate basic block to avoid the situation where use of the outer indvar appears before its def after interchange. Previously we already did this for innermost loops, now we do it for non-innermost loops (e.g., middle loops) as well. Reviewed By: bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D132055	2022-09-22 00:20:53 -04:00
Congzhe Cao	6782d71680	[LoopPassManager] Ensure to construct loop nests with the outermost loop This patch is to resolve the bug reported and discussed in https://reviews.llvm.org/D124926#3718761 and https://reviews.llvm.org/D124926#3719876. The problem is that loop interchange is a loopnest pass under the new pass manager, but the loop nest may not be constructed correctly by the loop pass manager after running loop interchange and before running the next pass, which might cause problems when it continues running the next pass. The reason that the loop nest is constructed incorrectly is that the outermost loop might have changed after interchange, and what was the original outermost loop is not the current outermost loop anymore. Constructing the loop nest based on the original outermost loop would generate an invalid loop nest. The fix in this patch is that, in the loop pass manager before running each loopnest pass, we re-cosntruct the loop nest based on the current outermost loop, if LPMUpdater notifies the loop pass manager that the previous loop nest has been invalidated by passes like loop interchange. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D132199	2022-09-21 23:59:26 -04:00
Vitaly Buka	ba39a6e14a	[msan] Instrument vtest instrinsics Instrumentation just ORs shadow of inputs. I assume some result shadow bits can be reset if we go into specifics of particular checks, but as-is it is still an improvement against existing default strict instruction handler, when every set bit of input shadow is reported as an error. Reviewed By: kda Differential Revision: https://reviews.llvm.org/D134123	2022-09-21 16:57:44 -07:00
Vitaly Buka	6fd959d625	[msan] Handle x86_avx_cmp_pd_256 and x86_avx_cmp_ps_256 Removed FIXME which looks irrelevant. The error message happens only without -mattr=+avx. E.g. GOOD: opt llvm/test/Instrumentation/MemorySanitizer/avx-intrinsics-x86.ll -passes=msan -o - \| llc -O3 -o /dev/null -mattr=+avx BAD: opt llvm/test/Instrumentation/MemorySanitizer/avx-intrinsics-x86.ll -passes=msan -o - \| llc -O3 -o /dev/null So nothing to fix here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134119	2022-09-21 15:17:02 -07:00
Sanjay Patel	ee0bf64722	[InstCombine] try to fold mul by neg-power-of-2 to shl `(A * -2**C) + B --> B - (A << C)` https://alive2.llvm.org/ce/z/A6BWkf This inverts what Negator was doing before: D134310 / `0f32a5dea0` Analysis and codegen are generally better without multiply, so we should favor this form even if we trade add for sub (because those are generally equivalent cost operations).	2022-09-21 15:09:39 -04:00
Sanjay Patel	64d309131a	[InstCombine] try multi-use demanded bits fold for 'sub' This is similar to D133788 / `73919a87e9`, but for sub the transform is valid only for low zeros in operand 1. https://alive2.llvm.org/ce/z/EmRsXC	2022-09-21 14:13:05 -04:00
Konstantina	80d3ed6fb1	[NFC][NewGVN] Remove OpIsSafeForPHIOfOpsHelper() Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D130949	2022-09-21 09:25:59 -07:00
Alexey Bataev	e664dea182	[SLP]Fix write-after-bounds. Mask might be larger than the NumElts-OffsetBeg, need to use actual indices to avoid acces out of bounds.	2022-09-21 08:00:15 -07:00
Matt Arsenault	0d1f040749	LICM: Pass through AssumptionCache	2022-09-21 09:16:17 -04:00
Sanjay Patel	0f32a5dea0	[InstCombine] don't canonicalize shl+sub to mul+add This stops Negator from transforming: `C1 - shl X, C2 --> mul X, (1<<C2) + C1` ...in the general case. There does not seem to be any analysis benefit to using mul in IR, and there's definitely downside in codegen (particularly when the multiply has to be expanded). If `C1` is 0, then there's a stronger argument that the single mul is a better canonicalization than negate-of-shl, but we may want to remove that too. This was noted as a potential conflict for D133667. Differential Revision: https://reviews.llvm.org/D134310	2022-09-21 08:39:07 -04:00
Bjorn Pettersson	3f08d248c4	[SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable Commit `de3445e0ef` (https://reviews.llvm.org/D132096) made changes to isVectorPromotionViable basically doing // Create Vector with size of V, and each element of type Ty ... uint64_t ElementSize = DL.getTypeStoreSizeInBits(Ty).getFixedSize(); uint64_t VectorSize = DL.getTypeSizeInBits(V).getFixedSize(); ... VectorType VTy = VectorType::get(Ty, VectorSize / ElementSize, false); Not quite sure why it uses the TypeStoreSize for the ElementSize, but the new vector would only match in size with the old vector in situations when the TypeStoreSize equals the TypeSize for Ty. Therefore this patch adds a typeSizeEqualsStoreSize check as yet another condition for allowing the the new type as a promotion candidate. Without this fix the new @test15 test would fail with an assert like this: opt: ../lib/Transforms/Scalar/SROA.cpp:1966: auto isVectorPromotionViable(llvm::sroa::Partition &, const llvm::DataLayout &) ::(anonymous class)::operator()(llvm::VectorType , llvm::VectorType *) const: Assertion `DL.getTypeSizeInBits(RHSTy).getFixedSize() == DL.getTypeSizeInBits(LHSTy).getFixedSize() && "Cannot have vector types of different sizes!"' failed. ... #8 isVectorPromotionViable(...)::$_10::operator()... #9 llvm::SROAPass::rewritePartition(...) #10 llvm::SROAPass::splitAlloca(...) #11 llvm::SROAPass::runOnAlloca(...) #12 llvm::SROAPass::runImpl(...) #13 llvm::SROAPass::run(...) Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D134032	2022-09-21 09:45:05 +02:00
Michael Berg	897a79f970	[DSE] Add value type info checks for masked store candidates in Dead Store Elimination. The type information of the store values can diverge when checking for valid mask store candidates to eliminate via DSE. This patch checks for equivalence wrt to size and element count. Reviewed By: fhahn, rui.zhang Differential Revision: https://reviews.llvm.org/D132700	2022-09-20 15:54:25 -07:00
Markus Böck	b751da43b2	[InstCombine] Handle integer extension in `select` patterns using the condition as value These patterns were previously only implemented for i1 type but can be extended for any integer type by also handling zext and sext operands. Differential Revision: https://reviews.llvm.org/D134142	2022-09-20 22:25:13 +02:00
Zain Jaffal	68cc35d52c	[InstCombine] Matrix multiplication negation optimisation If one of the operands in a matrix multiplication is negated we can optimise the equation by moving the negation to the smallest element of the operands or the result. Reviewed By: spatel, fhahn Differential Revision: https://reviews.llvm.org/D133300	2022-09-20 19:50:39 +01:00
Gulfem Savrun Yeniceri	f039a9fa32	[InstrProfiling] Emit runtime hook only once This patch fixes the issue about calling emitRuntimeHook() twice when we need to unconditionally emit runtime hook as discussed in https://reviews.llvm.org/rGd6aed77f0d19. Differential Revision: https://reviews.llvm.org/D134254	2022-09-20 17:00:46 +00:00
Kazu Hirata	00874c48ea	[IPO] Reorder parameters of InlineFunction (NFC) With the recent addition of new parameter MergeAttributes (D134117), callers need to specify several default parameters before getting to specify the new parameter. This patch reorders the parameters so that callers do not have to specify as many default parameters. Differential Revision: https://reviews.llvm.org/D134125	2022-09-20 09:09:38 -07:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Florian Hahn	dcbc8a0daa	[LV] Remove unused widenCallInstruction declaration (NFC). The definition and uses have been removed a while ago. Clean up the unused declaration.	2022-09-20 15:20:28 +01:00
Djordje Todorovic	f0f8b46863	Recommit "[AggressiveInstCombine] Lower Table Based CTTZ The bug reported on the [0] has been fixed. The issue was we have not checked if the global variables that represent cttz tables was constant. There is a new negative test added in negative-lower-table-based-cttz.ll that represents this. [0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b	2022-09-20 13:12:47 +02:00
Dmitri Gribenko	5d7ff0d87c	Fix an unused warning in release build	2022-09-20 11:29:39 +02:00
eopXD	3b2011fd4f	[LSR] Fold terminating condition to other IV when possible When the IV is only used by the terminating condition (say IV-A) and the loop has a predictable back-edge count and we have another IV (say IV-B) that is an affine add recursion, we will be able to calculate the terminating value of IV-B in the loop pre-header. This patch adds attempts to replace IV-B as the new terminating condition and remove IV-A. It is safe to do so since IV-A is only used as the terminating condition. This transformation is suitable to be appended after LSR as it may optimize the loop into the situation mentioned above. The transformation can reduce number of IV-s in the loop by one. A cli option `lsr-term-fold` is added and default disabled. Reviewed By: mcberg2021, craig.topper Differential Revision: https://reviews.llvm.org/D132443	2022-09-20 01:38:47 -07:00
Vitaly Buka	4fa8df20ff	[msan] Handle shadow of masked instruction Origin handling is not implemented yet. Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133682	2022-09-19 17:57:43 -07:00
Matt Arsenault	2adae8e1b7	VectorCombine: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	c867401407	MemCpyOpt: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	555af0274c	SLPVectorizer: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	b609741958	LoopVectorize: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	84a2e48ce6	GVN: Pass through AssumptionCache to queries	2022-09-19 19:25:22 -04:00
Matt Arsenault	ce44357216	Analysis: Add AssumptionCache to isSafeToSpeculativelyExecute Does not update any of the uses.	2022-09-19 19:25:22 -04:00
Matt Arsenault	fd37ab6cf6	InstCombine: Pass AssumptionCache through isDereferenceablePointer	2022-09-19 19:10:51 -04:00
Matt Arsenault	0d8ffcc532	Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer This does not try to pass it through from the end users.	2022-09-19 18:57:33 -04:00
Alexey Bataev	ce39bdbd65	[SLP][NFC]Reorder gather nodes with reused scalars, NFC. The compiler does not reorder the gather nodes with reused scalars, just does it for opernads of the user nodes. This currently does not affect the compiler but breaks internal logic of the SLP graph. In future, it is supposed to actually use all nodes instead of just list of operands and this will affect the vectorization result. Also, did some early check to avoid complex logic in cost estimation analysis, should improve compiler time a bit.	2022-09-19 14:00:17 -07:00
Vitaly Buka	6f3276d57e	[msan] Check mask and pointers shadow Msan has default handler for unknown instructions which previously applied to these as well. However depending on mask, not all pointers or passthru part will be used. This allows other passes to insert undef into sum arguments. As result, default strict instruction handler can produce false reports. Reviewed By: kda, kstoimenov Differential Revision: https://reviews.llvm.org/D133678	2022-09-19 13:09:56 -07:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before `151c144`. This effectively reverts `151c144`, but the long-term fix is to properly support widened inductions during epilogue vectorization Fixes #57712.	2022-09-19 18:14:35 +01:00
Craig Topper	90a004b4a1	[LV] Remove FIXME about NoImplicitFloat. NFC My understanding is that NoImplicitFloat, despite it's name, is supposed to disable all vectors not just float vectors. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134084	2022-09-19 10:01:02 -07:00
Nikita Popov	dd61726d5b	Revert "[SimplifyCFG] accumulate bonus insts cost" This reverts commit `e5581df60a`. This causes major compile-time regressions, about 2-3% end-to-end on CTMark.	2022-09-19 14:46:43 +02:00
Max Kazantsev	92e9bddc49	[LoopRotate] Drop loop dispositions when rotating loops. PR56260 This is required because if there is a pure loop-invariant instruction, Loop Rotation may decide to not clone it and just hoist it instead. If SCEV has previously cached that it was loop-variant (not being smart enough to prove invariance), we may end up with inconsistent cache state (which may later trigger false-negative assertion failures checking that something was invariant). This is a conservative fix that unconditionally drops the dispositions. We could only drop it if the hoisting has actually happened, but it should take some time understanding whether it's safe with all other things this function does. Differential Revision: https://reviews.llvm.org/D134167 Reviewed By: fhahn	2022-09-19 18:01:02 +07:00
Max Kazantsev	21a9abc1ce	[LoopFuse] Drop loop dispositions before reassigning blocks to other loop This bug was found by recent improvement in SCEV verifier. The code in LoopFuse directly reassigns blocks to be a part of a different loop, which should automatically invalidate all related cached loop dispositions. Differential Revision: https://reviews.llvm.org/D134173 Reviewed By: nikic	2022-09-19 17:43:06 +07:00
Max Kazantsev	818b1ab84e	[SCEV][NFC] Remove unused parameter from forgetLoopDispositions Let's be honest about it, we don't drop loop dispositions for particular loops. Remove the parameter that misleadingly makes it apparent that we do.	2022-09-19 14:06:42 +07:00
Yaxun (Sam) Liu	e5581df60a	[SimplifyCFG] accumulate bonus insts cost SimplifyCFG folds bool foo() { if (cond1) return false; if (cond2) return false; return true; } as bool foo() { if (cond1 \| cond2) return false return true; } 'cond2' is called 'bonus insts' in branch folding since they introduce overhead since the original CFG could do early exit but the folded CFG always executes them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into its predecessor BB which shares the destination. If it is below bonus-inst-threshold, SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed. When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts in the current BB to be considered for folding. This causes issue for unrolled loops which share destinations, e.g. bool foo(int a) { for (int i = 0; i < 32; i++) if (a[i] > 0) return false; return true; } After unrolling, it becomes bool foo(int a) { if(a[0]>0) return false if(a[1]>0) return false; //... if(a[31]>0) return false; return true; } SimplifyCFG will merge each BB with its predecessor BB, and ends up with 32 'bonus insts' which are always executed, which is much slower than the original CFG. The root cause is that SimplifyCFG does not consider the accumulated cost of 'bonus insts' which are folded from different BB's. This patch fixes that by introducing a ValueMap to track costs of 'bonus insts' coming from different BB's into the same BB, and cuts off if the accumulated cost exceeds a threshold. Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault Differential Revision: https://reviews.llvm.org/D132408	2022-09-18 20:21:14 -04:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Kazu Hirata	5e5a6c5b07	Use std::conditional_t (NFC)	2022-09-18 10:25:06 -07:00
Marc Auberer	f52dd920d4	[InstCombine] Fix bug when folding x + (x \| -x) to x & (x - 1) Addresses concern: https://reviews.llvm.org/rG09cdddea0c4d284c2c22f5dfade40a60850c5ea7 There was a copy/paste mistake in the code. Updated code and test ref. Differential Revision: https://reviews.llvm.org/D134135	2022-09-18 13:16:12 -04:00
Sanjay Patel	1d1d1e6f22	[InstCombine] fold full-shift of sdiv to icmp+extend This is a disguised sign-bit test with offset: (X / +DivC) >> (Width - 1) --> ext (X <= -DivC) (X / -DivC) >> (Width - 1) --> ext (X >= +DivC) https://alive2.llvm.org/ce/z/cO8JO4 We don't match/test poison in the sdiv constant because that would be immediate undefined behavior.	2022-09-18 13:13:14 -04:00
Kazu Hirata	d3b95ecc98	[ModuleInliner] Remove InlineOrder::front (NFC) InlineOrder::front is a remnant from the era when we had a nested "while" loops in the module inliner, with the inner one grouping the call sites with the same caller. Now that we have a simple "while" loop draining the priority queue, we can just use InlineOrder::pop. Differential Revision: https://reviews.llvm.org/D134121	2022-09-18 08:49:44 -07:00
Benjamin Kramer	b987fe4972	Silence unused variable warning in release builds. NFC	2022-09-18 09:15:32 +02:00
Kazu Hirata	284f0397e2	[Transforms] Merge function attributes within InlineFunction (NFC) In the past, we've had a bug resulting in a compiler crash after forgetting to merge function attributes (D105729). This patch teaches InlineFunction to merge function attributes. This way, we minimize the "time" when the IR is valid, but the function attributes are not. Differential Revision: https://reviews.llvm.org/D134117	2022-09-17 23:10:23 -07:00
Kazu Hirata	6e4fbd2f51	[ModuleInliner] Set Changed earlier (NFC) It makes more sense to set Changed to true immediately after a successful inlining.	2022-09-17 14:16:32 -07:00
Kazu Hirata	31b91356bc	[ModuleInliner] Don't include SetVector.h (NFC) We don't use SetVector in the module inliner.	2022-09-17 12:17:52 -07:00
Kazu Hirata	5faf4bf195	[ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC) UseInlinePriority specifies the priority function. This patch simplifies the code by moving UseInlinePriority closer to the actual consumer -- the switch statement inside getInlineOrder. Differential Revision: https://reviews.llvm.org/D134100	2022-09-17 11:41:28 -07:00
Florian Hahn	7914e53e31	[ConstraintElimination] Fix crash when combining results. `f213128b29` didn't account for the possibility that the result of decompose may be empty. Fix that by explicitly checking. Use a newly introduced helper to also reduce some duplication. Thanks @bjope for finding the issue!	2022-09-17 14:47:38 +01:00
Kazu Hirata	6e30a9cc08	[Inliner] Retire DefaultInlineOrder (NFC) DefaultInlineOrder was largely an exercise in generalizing the traversal order of call sites within the inliner. Now that the module inliner is starting to form its shape, there is no point in sharing DefaultInlineOrder between the module inliner and the CGSCC inliner. DefaultInlineOrder and all the other inline orders are mutually exclusive in the following sense: - The use of DefaultInlineOrder doesn't make sense in the module inliner because there is no priority inherent in the order in which call sites are added to the list of call sites -- SmallVector. - The use of any other inline order doesn't make sense in the CGSCC inliner because little prioritization can be done within one CGSCC. This patch essentially reverts the addition of DefaultInlineOrder so that the loop structure of Inliner.cpp looks like the state just before we started working on the module inliner (circa June 2021). At the same time, ww remove the choice of DefaultInlineOrder from UseInlinePriority. Differential Revision: https://reviews.llvm.org/D134080	2022-09-16 15:36:40 -07:00
Alexey Bataev	5d13b12674	[SLP]Improve isUndefVector function by adding insertelement analysis. Added the mask and the analysis of the buildvector sequence in the isUndefVector function, improves codegen and cost estimation. Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27362.00 27360.00 -0.0% Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 805299.00 806035.00 0.1% 526.blender_r - some extra code is vectorized. 508.namd_r - some extra code is optimized out. Differential Revision: https://reviews.llvm.org/D133891	2022-09-16 14:36:38 -07:00
Teresa Johnson	c2cf93c1a9	[WPD/LTT] Lower type test feeding assumes via phi correctly This fixes https://github.com/llvm/llvm-project/issues/57616. Type test lowering in ThinLTO modules relies on having type id summaries set up for the referenced types, which provide the type test resolution. If there is no summary, the type tests are lowered to false. At the very least, a default type id summary gives the type tests a resolution of Unknown, which is handled correctly (ignored by the first invocation of LTT, and lowered to true by the second). WPD sets up the type id summaries (with a default type test resolution) as it is processing the type tests, but only does this for the patterns handled by WPD, which is a type test directly feeding an assume. In the case of type tests feeding an assume via a phi, the type id summary was not being set up, leading to the type tests being lowered to false incorrectly. Fix this by adding the default type id summary entries for all type ids used on globals during index-only WPD. This is not an issue for hybrid (split-lto-unit) LTO, as in that case the type test resolution is determined and set up during LTT, since the type definitions are in the regular LTO split module, and exported via the summary to the ThinLTO split module. Differential Revision: https://reviews.llvm.org/D134012	2022-09-16 13:50:01 -07:00
Kazu Hirata	9111920af8	[ModuleInliner] clang-format ModuleInliner.cpp (NFC)	2022-09-16 09:41:42 -07:00
Kazu Hirata	4475470529	[ModuleInliner] Remove a stale comment (NFC) These comments refer to the nested loop in the module inliner where the inner loop grouped call sites from the same caller. We don't group call sites anymore, so the comment has become stale.	2022-09-16 09:37:43 -07:00
Kazu Hirata	42a90e6017	[ModuleInliner] Remove a redundaunt variable (NFC) In the CGSCC inliner, DidInline was used as an indicator to update the call graph. In the module inliner, DidInline is always true at the end of the "while" loop, so can just drop it.	2022-09-16 09:32:02 -07:00
Kazu Hirata	513717ddd0	[ModuleInliner] Remove a write-only variable (NFC) InlinedCallees is a remnant from the CGSCC inliner. We don't use it in the module inliner.	2022-09-16 09:15:53 -07:00
Kazu Hirata	77501bfab8	[IPO] Simplify the module inliner loop (NFC) In the bottom-up inliner, we have a two-level nested "while" loop, with the inner one grouping call sites with the same caller. We need to do so to keep CGSCC up to date. Now, with the module inliner, we don't have any per-caller work. We don't update CGSCC. Plus, the caller will likely keep changing as we pop call sites in some priority order. This patch simply removes the inner "while" loop while indenting its body. Further cleanup is possible, but that's left for follow-up patches. Differential Revision: https://reviews.llvm.org/D133969	2022-09-16 08:56:18 -07:00
Sanjay Patel	6174da2299	[InstCombine] reduce code duplication in foldICmpMulConstant(); NFC	2022-09-16 10:39:54 -04:00
Vitaly Buka	f0c2ffa8f8	[msan] Add msan-insert-check DEBUG_COUNTER	2022-09-15 21:52:58 -07:00
Gulfem Savrun Yeniceri	d6aed77f0d	[InstrProfiling] No runtime hook for unused funcs This is a reland of https://reviews.llvm.org/D122336. Original patch caused a problem in collecting coverage in Fuchsia because it was returning early without putting unused function names into __llvm_prf_names section. This patch fixes that issue. The original commit message is as the following: CoverageMappingModuleGen generates a coverage mapping record even for unused functions with internal linkage, e.g. static int foo() { return 100; } Clang frontend eliminates such functions, but InstrProfiling pass still emits runtime hook since there is a coverage record. Fuchsia uses runtime counter relocation, and pulling in profile runtime for unused functions causes a linker error: undefined hidden symbol: __llvm_profile_counter_bias. Since https://reviews.llvm.org/D98061, we do not hook profile runtime for the binaries that none of its translation units have been instrumented in Fuchsia. This patch extends that for the instrumented binaries that consist of only unused functions. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D122336	2022-09-16 02:05:09 +00:00
Navid Emamdoost	3e52c0926c	Add -fsanitizer-coverage=control-flow Reviewed By: kcc, vitalybuka, MaskRay Differential Revision: https://reviews.llvm.org/D133157	2022-09-15 15:56:04 -07:00
Sanjay Patel	aafaa2f4fc	[SCCP] convert ashr to lshr for non-negative shift value This is similar to the existing signed instruction folds. We get the obvious minimal patterns in other passes, but this avoids potential missed folds when the multi-block tests are converted to selects.	2022-09-15 13:54:52 -04:00
Craig Topper	ace05124f5	[IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation. There are two ctlz intrinsics here with the zero_is_poison flag set. There are also two comparisons that check if either of the inputs the ctlzs are zero. We need to use a logical or to block the poison from the ctlz if either of the inputs is zero. Reviewed By: arsenm, aqjune Differential Revision: https://reviews.llvm.org/D130680	2022-09-15 09:38:02 -07:00
Sanjay Patel	02a27b3890	[InstCombine] fold X*X == 0 --> X == 0 This is safe when the mul does not overflow: https://alive2.llvm.org/ce/z/LedVVP This could be extended to handle non-zero compare constants and non-squared multiplies.	2022-09-15 12:02:50 -04:00
Evgeniy Brevnov	03a102e3b2	[JumpThreading][NFC] Reuse existing DT instead of recomputation (newPM) This is the same change as `503d5771b6` with the same intent but for new pass manager.	2022-09-15 12:27:57 +07:00
Dhruva Chakrabarti	839ac62c50	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `7539e9cf81`.	2022-09-15 03:08:46 +00:00
Vitaly Buka	f221720e82	[nfc][msan] getShadowOriginPtr on <N x ptr> Some vector instructions can benefit from of Addr as <N x ptr>. Differential Revision: https://reviews.llvm.org/D133681	2022-09-14 19:18:52 -07:00
Vitaly Buka	f404169f24	[NFC][msan] Rename variables to match definition	2022-09-14 19:16:27 -07:00
Vitaly Buka	2209be15a5	[NFC][msan] Convert some code to early returns Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133673	2022-09-14 19:16:11 -07:00
Vitaly Buka	bcf3d666b4	[NFC][msan] Simplify llvm.masked.load origin code Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133652	2022-09-14 19:14:29 -07:00
Vitaly Buka	d421223e25	[msan] Resolve FIXME from D133880 We don't need to change tests we convertToBool unconditionally only before OR.	2022-09-14 18:55:57 -07:00
Giorgis Georgakoudis	7539e9cf81	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6, ABataev Differential Revision: https://reviews.llvm.org/D102107	2022-09-15 00:54:05 +00:00
Vitaly Buka	bf204881b6	[msan] Change logic of ClInstrumentationWithCallThreshold According to logs, ClInstrumentationWithCallThreshold is workaround for slow backend with large number of basic blocks. However, I can't reproduce that one, but I see significant slowdown after ClCheckConstantShadow. Without ClInstrumentationWithCallThreshold compiler is able to eliminate many of the branches. So maybe we should drop ClInstrumentationWithCallThreshold completly. For now I just change the logic to ignore constant shadow so it will not trigger callback fallback too early. Reviewed By: kstoimenov Differential Revision: https://reviews.llvm.org/D133880	2022-09-14 14:58:12 -07:00
Florian Hahn	7f3ff9d3c0	[ConstraintElimination] Track if variables are positive in constraint. Keep track if variables are known positive during constraint decomposition, aggregate the information when building the constraint object and encode the extra information as constraints to be used during reasoning.	2022-09-14 18:43:54 +01:00
Alexey Bataev	d647312e3f	[SLP][NFC]Extract getLastInstructionInBundle function for better dependence checking, NFC. Part of D110978	2022-09-14 08:43:15 -07:00
Zain Jaffal	8253f7e286	[InstCombine] Optimize multiplication where both operands are negated Handle the case where both operands are negated in matrix multiplication Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133695	2022-09-14 16:29:39 +01:00
Nikita Popov	b1cd393f9e	[AA] Tracking per-location ModRef info in FunctionModRefBehavior (NFCI) Currently, FunctionModRefBehavior tracks whether the function reads or writes memory (ModRefInfo) and which locations it can access (argmem, inaccessiblemem and other). This patch changes it to track ModRef information per-location instead. To give two examples of why this is useful: * D117095 highlights a weakness of ModRef modelling in the presence of operand bundles. For a memcpy call with deopt operand bundle, we want to say that it can read any memory, but only write argument memory. This would allow them to be treated like any other calls. However, we currently can't express this and have to say that it can read or write any memory. * D127383 would ideally be modelled as a separate threadid location, where threadid Refs outside pre-split coroutines can be ignored (like other accesses to constant memory). The current representation does not allow modelling this precisely. The patch as implemented is intended to be NFC, but there are some obvious opportunities for improvements and simplification. To fully capitalize on this we would also want to change the way we represent memory attributes on functions, but that's a larger change, and I think it makes sense to separate out the FunctionModRefBehavior refactoring. Differential Revision: https://reviews.llvm.org/D130896	2022-09-14 16:34:41 +02:00
Florian Hahn	efd3ec47d9	[ConstraintElimination] Clear new indices directly in getConstraint(NFC) Instead of checking if any of the new indices has a non-zero coefficient before using the constraint, do this directly when constructing the constraint.	2022-09-14 15:31:25 +01:00
Sanjay Patel	73919a87e9	[InstCombine] try multi-use demanded bits folds for 'add' This patch enables a multi-use demanded bits fold (motivated by issue #57576): https://alive2.llvm.org/ce/z/DsZakh This mimics transforms that we already do on the single-use path. Originally, this patch did not include the last part to form a constant, but that can be removed independently to reduce risk. It's not clear what the effect of either change will be when viewed end-to-end. This is expected to be neutral or a slight win for compile-time. See the "add-demand2" series for experimental timing results: https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright Differential Revision: https://reviews.llvm.org/D133788	2022-09-14 09:30:59 -04:00
Alexey Bataev	796af0c027	[SLP] Move getInsertIndex function, NFC. Part of D110978.	2022-09-14 06:22:52 -07:00
Florian Hahn	f213128b29	[ConstraintElimination] Further de-compose operands of add operations. This simply extends the existing logic to look through adds and combine the components as done in other places already.	2022-09-14 12:00:32 +01:00
Kazu Hirata	d3649c2be4	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5879:5: error: expression result unused [-Werror,-Wunused-value]	2022-09-13 09:30:06 -07:00
Arthur Eubanks	5a33d1f0b9	[SimplifyCFG] Don't hoist allocas D129370 started hoisting allocas across stacksave/stackrestore boundaries which is wrong. Reviewed By: chill, rnk Differential Revision: https://reviews.llvm.org/D133730	2022-09-13 09:23:39 -07:00
Valery N Dmitriev	18dde772d6	[SLP] Unify main/alternate selection for CmpInst instructions Make main/alternate operation selection logic for CmpInst consistent across SLP vectorizer. Differential Revision: https://reviews.llvm.org/D133430	2022-09-13 09:20:25 -07:00
Florian Hahn	ac80b0e84f	[LV] Mark Instr as const in scalarizeInstruction. (NFC). This is to reduce the diff in follow-up changes.	2022-09-13 09:10:02 +01:00
Max Kazantsev	86d5586d78	[SCEVExpander] Recompute poison-generating flags on hoisting. PR57187 Instruction being hoisted could have nuw/nsw flags inferred from the old context, and we cannot simply move it to the new location keeping them because we are going to introduce new uses to them that didn't exist before. Example in https://github.com/llvm/llvm-project/issues/57187 shows how this can produce branch by poison from initially well-defined program. This patch forcefully recomputes poison-generating flag in the new context. Differential Revision: https://reviews.llvm.org/D132022 Reviewed By: fhahn, nikic	2022-09-13 12:56:35 +07:00
Kazu Hirata	9606608474	[llvm] Use x.empty() instead of llvm::empty(x) (NFC) I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677	2022-09-12 13:34:35 -07:00
Sanjay Patel	53eede597e	[InstCombine] look through 'not' of ctlz/cttz op with 0-is-undef https://alive2.llvm.org/ce/z/MNsC1S This pattern was flagged at: https://discourse.llvm.org/t/instcombines-select-optimizations-dont-trigger-reliably/64927	2022-09-12 15:06:21 -04:00
Benjamin Kramer	2675c41671	[DFSan] Don't crash with the legacy pass manager TargetLibraryInfo isn't optional, so we have to provide it even with the lageacy stuff. Ideally we wouldn't need it anymore but there are still users out there that are stuck on the legacy PM. Differential Revision: https://reviews.llvm.org/D133685	2022-09-12 19:11:55 +02:00
A-Wadhwani	de3445e0ef	[SROA] Create additional vector type candidates based on store and load slices This patch adds additional vector types to be considered when doing promotion in SROA, based on the types of the store and load slices. This provides more promotion opportunities, by potentially using an optimal "intermediate" vector type. For example, the following code would currently not be promoted to a vector, since `__m128i` is a `<2 x i64>` vector. ``` __m128i packfoo0(int a, int b, int c, int d) { int r[4] = {a, b, c, d}; __m128i rm; std::memcpy(&rm, r, sizeof(rm)); return rm; } ``` ``` packfoo0(int, int, int, int): mov dword ptr [rsp - 24], edi mov dword ptr [rsp - 20], esi mov dword ptr [rsp - 16], edx mov dword ptr [rsp - 12], ecx movaps xmm0, xmmword ptr [rsp - 24] ret ``` By also considering the types of the elements, we could find that the `<4 x i32>` type would be valid for promotion, hence removing the memory accesses for this function. In other words, we can explore other new vector types, with the same size but different element types based on the load and store instructions from the Slices, which can provide us more promotion opportunities. Additionally, the step for removing duplicate elements from the `CandidateTys` vector was not using an equality comparator, which has been fixed. Differential Revision: https://reviews.llvm.org/D132096	2022-09-12 09:55:37 -07:00
Sanjay Patel	4ca25c66d4	[Reassociate] prevent partial undef negation replacement As shown in the examples in issue #57683, we allow matching vectors with poison (undef) in this transform (and possibly more), but we can't then use the partially defined value as a replacement value in other expressions blindly. This seems to be avoided in simpler examples of reassociation, and other passes should be able to clean up the redundant op seen in these tests.	2022-09-12 12:28:34 -04:00
Florian Hahn	3fd1cc2574	[SLP] Add Preheader to CSE blocks after hoisting CSE-able instrs. Adding the pre-header to CSEBlocks ensures instructions are CSE'd even after hoisting. This was original discovered by @atrick a while ago. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D133649	2022-09-12 15:53:31 +01:00
Alexey Bataev	dfe1e9dd79	[SLP]Improve reordering of clustered reused scalars. If the reused scalars are clustered, i.e. each part of the reused mask contains all elements of the original scalars exactly once, we can reorder those clusters to improve the whole ordering of of the clustered vectors. Differential Revision: https://reviews.llvm.org/D133524	2022-09-12 06:52:25 -07:00
Max Kazantsev	0e465c0c2f	[IRCE] Bail in case of pointer types. PR40539 We should not unconditionally expect that SCEVable types are all integers because SCEV can also be computed for pointers. Bail in this case.	2022-09-12 16:01:25 +07:00
Djordje Todorovic	b080d0bae8	Revert ""Recommit "[AggressiveInstCombine] Lower Table Based CTTZ""" This reverts commit `df868edee5`, as it introduces a bug found by Alive2 (more on the rGdf868edee561).	2022-09-12 08:23:07 +02:00
Johannes Doerfert	c922cac868	Revert "[Attributor] AAPointerInfo should allow "harmless" uses" Revert "[Attributor] Teach AAPointerInfo to look into aggregates" This reverts commit `844f6c5d03` and `4ed0a88cd8` as they broke the buildbots that run openmp/libomptarget/test/offloading/bug49021.cpp.	2022-09-11 21:37:54 -07:00
Johannes Doerfert	844f6c5d03	[Attributor] AAPointerInfo should allow "harmless" uses If a call base use will not capture a pointer we can approximate the effects. This is important especially for readnone/only uses.	2022-09-11 20:16:11 -07:00
Johannes Doerfert	4ed0a88cd8	[Attributor] Teach AAPointerInfo to look into aggregates If we have a constant aggregate, e.g., as an initializer, we usually failed to extract the proper value/type from it. This patch provides the size and offset information necessary to extract the right part of the constant.	2022-09-11 20:16:11 -07:00
Johannes Doerfert	b046ebdc01	[Attributor][FIX] Conservatively handle ptr2int, don't crash If a pointer-2-int cast is found we give up on AAPointerInfo for now. This caused a crash before. Reported by John Tramm (@jtramm).	2022-09-11 20:16:11 -07:00
Johannes Doerfert	21711039e3	[OpenMP] Allow the Attributor to look at functions we also internalized This is important as we have accesses to globals in those which we need to categorize.	2022-09-11 20:16:11 -07:00
Junduo Dong	6975ab7126	[Clang] Reimplement time tracing of NewPassManager by PassInstrumentation framework The previous implementation of time tracing in NewPassManager is direct but messive. The key codes are like the demo below: ``` /// Runs the function pass across every function in the module. PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM, LazyCallGraph &CG, CGSCCUpdateResult &UR) { /// ... PreservedAnalyses PassPA; { TimeTraceScope TimeScope(Pass.name()); PassPA = Pass.run(F, FAM); } /// ... } ``` It can be bothered to judge where should we add the tracing codes by hands. With the PassInstrumentation framework, we can easily add `Before/After` callback functions to add time tracing codes. Differential Revision: https://reviews.llvm.org/D131960	2022-09-11 05:42:55 -07:00
Florian Hahn	69d9bb2aad	[VPlan] Check recipe uses instead of type of underlying instr (NFC). Suggested by @Ayal post-commit, to reduce the dependence on the underlying instruction in favor of information available directly for the recipe.	2022-09-11 12:24:44 +01:00
Marc Auberer	09cdddea0c	[InstCombine] Fold x + (x \| -x) to x & (x - 1) Fixes #57531 This transformation may be particularly useful on x86-64, because x & (x - 1) can be performed by a single blsr instruction. Differential Revision: https://reviews.llvm.org/D133362	2022-09-11 06:14:24 -04:00
Alexey Bader	2bb5535b58	[StripDeadDebugInfo] Drop dead CUs In situations when a submodule is extracted from big module (i.e. using CloneModule) a lot of debug info is copied via metadata nodes. Despite of the fact that part of that info is not linked to any instruction in extracted IR file, StripDeadDebugInfo pass doesn't drop them. Strengthen criteria for debug info that should be kept in a module: - Only those compile units are left that referenced by a subprogram debug info node that is attached to a function definition in the module or to an instruction in the module that belongs to an inlined function. Signed-off-by: Mikhail Lychkov <mikhail.lychkov@intel.com> Differential Revision: https://reviews.llvm.org/D122163	2022-09-11 01:31:03 -07:00
Vitaly Buka	b51d1f1fbd	[msan] Don't deppend on argumens evaluation order	2022-09-10 15:28:32 -07:00
Vitaly Buka	71c5e7b26a	[msan] Do not deppend on arguments evaluation order Clang and GCC do this differently making IR inconsistent. https://lab.llvm.org/buildbot#builders/6/builds/13120	2022-09-10 13:50:32 -07:00
Vitaly Buka	1819d5999c	[NFC][msan] Remove unused return type	2022-09-10 12:20:54 -07:00
Vitaly Buka	6fc31712f1	[msan] Relax handling of llvm.masked.expandload and llvm.masked.gather This is work around for new false positives. Real implementation will follow.	2022-09-10 12:19:16 -07:00
Manuel Brito	b51c6130ef	Use PoisonValue instead of UndefValue when RAUWing unreachable code [NFC] Replacing the following instances of UndefValue with PoisonValue, where the UndefValue is used as an arbitrary value: - llvm/lib/CodeGen/WinEHPrepare.cpp `demotePHIsOnFunclets`: RAUW arbitrary value for lingering uses of removed PHI nodes - llvm/lib/Transforms/Utils/BasicBlockUtils.cpp `FoldSingleEntryPHINodes`: Removes a self-referential single entry phi node. - llvm/lib/Transforms/Utils/CallGraphUpdater.cpp `finalize`: Remove all references to removed functions. - llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp `cleanup`: the result is not used then the inserted instructions are removed. - llvm/tools/bugpoint/CrashDebugger.cpp `TestInts`: the program is cloned and instructions are removed to narrow down source of crash. Differential Revision: https://reviews.llvm.org/D133640	2022-09-10 14:28:01 +01:00
Florian Hahn	da734473fa	[LV] Remove now dead variable after `2a78890b7b` (NFC).	2022-09-09 20:25:55 +01:00
Florian Hahn	2a78890b7b	[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC). Use VPExpandSCEVRecipe to expand the step of pointer inductions. This cleanup addresses a corresponding FIXME. It should be NFC, as steps for pointer induction must be constants, which makes expansion trivial.	2022-09-09 19:20:13 +01:00
Sanjay Patel	6113e6738d	[InstCombine] move/adjust comments about demanded bits; NFC The code has been moved/copied around, but the comments were not updated to match.	2022-09-09 11:48:20 -04:00
Philip Reames	a33d98e20a	[LV] Pull out common expression [nfc]	2022-09-09 07:31:46 -07:00
Philip Reames	edb26268ce	[VPlan] Only generate single instr for stores uniform across all parts. Extend the approach taken by D133019 to store instructions. Differential Revision: https://reviews.llvm.org/D133497	2022-09-09 07:15:12 -07:00
Nikita Popov	a9f312c7f4	[AST] Use BatchAA in aliasesUnknownInst() (NFCI)	2022-09-09 15:54:48 +02:00
Sebastian Neubauer	c7750c522e	Add helper func to get first non-alloca position The LLVM performance tips suggest that allocas should be placed at the beginning of the entry block. So far, llvm doesn’t provide any helper to find that position. Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*) that get an insert position after the (static) allocas at the start of a function and use it in ShadowStackGCLowering. Differential Revision: https://reviews.llvm.org/D132554	2022-09-09 15:39:53 +02:00
Nikita Popov	4ab77d1677	[LICM] Allow promotion with non-load/store users If there are non-load/store users of the promoted pointer, we currently abort promotion. However, having such users isn't really relevant to the transform. We already separately check that a) there are no instructions that modref the promoted pointer and b) that a pointer capture disables store promotion. In the affected @test_captured_in_loop test case we have a readnone capture of the promoted pointer, which means that load promotion can be performed (while store promotion cannot). Differential Revision: https://reviews.llvm.org/D133485	2022-09-09 13:09:59 +02:00
Djordje Todorovic	df868edee5	"Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"" This reverts commit `053841c562`. We faced a use-after-free after pushing the D113291, since the foldSqrt() has a call to eraseFromParent(). The function should be at the end of the main loop that folds the patterns. This patch fixes that.	2022-09-09 10:29:39 +02:00
Vitaly Buka	1cf5c7fe8c	[msan] Disambiguate warnings debug location If multiple warnings created on the same instruction (debug location) it can be difficult to figure out which input value is the cause. This patches chains origins just before the warning using last origins update debug information. To avoid inflating the binary unnecessarily, do this only when uncertainty is high enough, 3 warnings by default. On average it adds 0.4% to the .text size. Reviewed By: kda, fmayer Differential Revision: https://reviews.llvm.org/D133232	2022-09-08 14:17:07 -07:00
Vitaly Buka	0f2f1c2be1	[sanitizers] Invalidate GlobalsAA GlobalsAA is considered stateless as usually transformations do not introduce new global accesses, and removed global access is not a problem for GlobalsAA users. Sanitizers introduce new global accesses: - Msan and Dfsan tracks origins and parameters with TLS, and to store stack origins. - Sancov uses global counters. HWAsan store tag state in TLS. - Asan modifies globals, but I am not sure if invalidation is required. I see no evidence that TSan needs invalidation. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D133394	2022-09-08 14:00:43 -07:00
Sanjay Patel	444f08c832	[InstCombine] fold icmp of truncated left shift, part 2 (trunc (1 << Y) to iN) == 2C --> Y == C (trunc (1 << Y) to iN) != 2C --> Y != C https://alive2.llvm.org/ce/z/xnFPo5 Follow-up to `d9e1f9d759`. This was a suggested enhancement mentioned in issue #51889.	2022-09-08 12:44:02 -04:00
Philip Reames	4c4c0d2c06	[LV] Use safe-divisor lowering for fixed vectors if profitable This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well. Differential Revision: https://reviews.llvm.org/D132591	2022-09-08 09:15:54 -07:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Djordje Todorovic	7aec9ddcfd	Revert "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"" This reverts commit `f879939157`.	2022-09-08 17:01:16 +02:00
Sanjay Patel	d9e1f9d759	[InstCombine] Fold icmp of truncated left shift (trunc (1 << Y) to iN) == 0 --> Y u>= N (trunc (1 << Y) to iN) != 0 --> Y u< N These can be generalized in several ways as noted by the TODO items, but this handles the pattern in the motivating bug report. Fixes #51889 Differential Revision: https://reviews.llvm.org/D115480	2022-09-08 10:48:14 -04:00
Djordje Todorovic	f879939157	Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"	2022-09-08 16:36:46 +02:00
Florian Hahn	422cf99161	[VPlan] Only generate single instr for loads uniform across all parts. VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a scalar instruction is generated per-part. This is a potential alternative D132892. For now the current patch only catches cases where the address is trivially invariant (defined outside VPlan), while D132892 catches any address that is considered invariant by SCEV AFAICT. It should be possible to hoist fully invariant recipes feeding loads out of the vector loop region as well, but in practice LICM should do that already. This version of the patch artificially limits this to loads to make it easier to compare, but this restriction should be easily liftable. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133019	2022-09-08 14:27:58 +01:00
Chenbing Zheng	01cea7ac10	[InstCombine] extractvalue (any_mul_with_overflow X, 2^n), 0 -> X << n Alive2: https://alive2.llvm.org/ce/z/JLmabt (umul) https://alive2.llvm.org/ce/z/J_ruXR (smul) https://alive2.llvm.org/ce/z/o9SVSz (vector) Reviewed By: spatel, RKSimon Differential Revision: https://reviews.llvm.org/D133188	2022-09-08 11:12:55 +08:00
Sami Tolvanen	52967a5306	[InstCombine] Fix a crash in -kcfi debug block Don't attempt to print out DebugLoc as we may not have one.	2022-09-07 22:59:12 +00:00
Marco Elver	97c2220565	[SanitizerBinaryMetadata] Introduce SanitizerBinaryMetadata instrumentation pass Introduces the SanitizerBinaryMetadata instrumentation pass which uses the new MD_pcsections metadata kinds to instrument certain types of instructions and functions required for breakpoint-based sanitizers. The first intended user of the binary metadata emitted will be a variant of GWP-TSan [1]. GWP-TSan will require information about atomic accesses; to unambiguously determine if an access is atomic or not, we also require "covered" information which code has been compiled with SanitizerBinaryMetadata instrumentation enabled. [1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D130887	2022-09-07 21:25:40 +02:00
Sanjay Patel	85b289377b	[SCCP] convert signed div/rem to unsigned for non-negative operands, 2nd try The original commit ( `fe1f3cfc26` ) was reverted because it could crash / assert when trying to fold a value that was replaced by a constant. In that case, there might not be an entry for the constant in the solver yet. This version adds a check for that possibility along with tests to exercise that pattern (they used to crash). Original commit message: This extends the transform added with D81756 to handle div/rem opcodes. For example: https://alive2.llvm.org/ce/z/cX6za6 This replicates part of what CVP already does, but the motivating example from issue #57472 demonstrates a phase ordering problem - we convert branches to select before CVP runs and miss the transform. Differential Revision: https://reviews.llvm.org/D133198	2022-09-07 11:56:29 -04:00
Sanjay Patel	7c57180900	[InstCombine] fold add+negate through select into sub This transform came up as a potential DAGCombine in D133282, so I wanted to see how it escaped in IR too. We do general folds in InstCombiner::SimplifySelectsFeedingBinaryOp() by checking if either arm of a select simplifies when the trailing binop is threaded into the select. So as long as one side simplifies, it's a good fold to combine a negate and add into 1 subtract. This is an example with a zero arm in the select: https://alive2.llvm.org/ce/z/Hgu_Tj And this models the tests with a cancelling 'not' op: https://alive2.llvm.org/ce/z/BuzVV_ Differential Revision: https://reviews.llvm.org/D133369	2022-09-07 08:23:35 -04:00
Aaron Kogon	ae05b9dc30	Sink/hoist memory instructions between loop fusion candidates Currently, instructions in the preheader of the second of two fusion candidates are sunk and hoisted whenever possible, to try to allow the loops to fuse. Memory instructions are skipped, and are never sunk or hoisted. This change adds memory instructions for sinking/hoisting consideration. This change uses DependenceAnalysis to check if a mem inst in the preheader of FC1 depends on an instruction in FC0's header, across which it will be hoisted, or FC1's header, across which it will be sunk. We reject cases where the dependency is a data hazard. Differential Revision: https://reviews.llvm.org/D131606	2022-09-07 07:42:00 -04:00
Nikita Popov	f42d92611d	[Reassociate] Avoid ConstantExpr::getFNeg() (NFCI) Use ConstantFoldUnaryOpOperand() instead. Also make the code below robust against non-instruction users, just in case it doesn't fold.	2022-09-07 10:48:08 +02:00
Vitaly Buka	4c18670776	[NFC][sancov] Rename ModuleSanitizerCoveragePass	2022-09-06 20:55:39 -07:00
Vitaly Buka	5e38b2a456	[NFC][msan] Rename ModuleMemorySanitizerPass	2022-09-06 20:30:35 -07:00
Ruobing Han	fb45f3c948	[SimpleLoopUnswitch] Skip non-trivial unswitching of cold functions In the current main branch, all cold loops will not be applied non-trivial unswitch. As reported in D129599, skipping these cold loops will incur regression in SPEC benchmark. Thus, instead of skipping cold loops, now only skipping loops in cold functions. Reviewed By: alexgatea, aeubanks Differential Revision: https://reviews.llvm.org/D133275	2022-09-06 19:13:31 -04:00
Vitaly Buka	93600eb50c	[NFC][asan] Rename ModuleAddressSanitizerPass	2022-09-06 15:02:11 -07:00
Vitaly Buka	e7bac3b9fa	[msan] Convert Msan to ModulePass MemorySanitizerPass function pass violatied requirement 4 of function pass to do not insert globals. Msan nees to insert globals for origin tracking, and paramereters tracking. https://llvm.org/docs/WritingAnLLVMPass.html#the-functionpass-class Reviewed By: kstoimenov, fmayer Differential Revision: https://reviews.llvm.org/D133336	2022-09-06 15:01:04 -07:00
Vitaly Buka	b4257d3bf5	[tsan] Replace mem intrinsics with calls to interceptors After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm intrinsics with calls to glibc functions. However this approach is fragile, as slight changes in pipeline can return llvm intrinsics back. In particular InstCombine can do that. Msan/Asan already declare own version of these memory functions for the similar purpose. KCSAN, or anything that uses something else than compiler-rt, needs to implement this callbacks. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D133268	2022-09-06 13:09:31 -07:00
Florian Hahn	27e7db54eb	Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands" This reverts commit `fe1f3cfc26`. It looks like this commit breaks building llvm-test-suite. To reproduce, run `opt -passes=ipsccp` on the IR below. @g = internal global i32 256, align 4 define void @test() { entry: %0 = load i32, ptr @g, align 4 %div = sdiv i32 %0, undef ret void }	2022-09-06 18:21:51 +01:00
Florian Hahn	2fb68c0628	[ConstraintElimination] Replace pair with named struct (NFC). This slightly improves the readability and allows further extensions in follow-ups.	2022-09-06 18:04:04 +01:00
Vitaly Buka	c51a12d598	Revert "[tsan] Replace mem intrinsics with calls to interceptors" Breaks http://45.33.8.238/macm1/43944/step_4.txt https://lab.llvm.org/buildbot/#/builders/70/builds/26926 This reverts commit `77654a65a3`.	2022-09-06 09:47:33 -07:00
Sanjay Patel	ae117e1c1b	[InstCombine] remove dead code for add (select cond, (sub), 0); NFC This pattern is handled more generally in SimplifySelectsFeedingBinaryOp(). Tests to confirm that added to the add.ll test file in the previous commit.	2022-09-06 12:19:50 -04:00
Doru Bercea	0b1160fdeb	Fix OpenMP Opt for target without a parallel region. Remove ctx redeclaration. Format code. Remove parallel check. Modify tests. Clean-up code. Fix another test. Move code to helper functions. Format file. Minor fixes.	2022-09-06 16:04:53 +00:00
Vitaly Buka	77654a65a3	[tsan] Replace mem intrinsics with calls to interceptors After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm intrinsics with calls to glibc functions. However this approach is fragile, as slight changes in pipeline can return llvm intrinsics back. In particular InstCombine can do that. Msan/Asan already declare own version of these memory functions for the similar purpose. KCSAN, or anything that uses something else than compiler-rt, needs to implement this callbacks. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D133268	2022-09-06 08:25:32 -07:00
Sanjay Patel	fe1f3cfc26	[SCCP] convert signed div/rem to unsigned for non-negative operands This extends the transform added with D81756 to handle div/rem opcodes. For example: https://alive2.llvm.org/ce/z/cX6za6 This replicates part of what CVP already does, but the motivating example from issue #57472 demonstrates a phase ordering problem - we convert branches to select before CVP runs and miss the transform. Differential Revision: https://reviews.llvm.org/D133198	2022-09-06 08:58:15 -04:00
Sanjay Patel	dd6eb4d67f	[InstCombine] reduce code duplication; NFC	2022-09-06 08:19:30 -04:00
Arthur Eubanks	7e3aa8f01a	Revert "[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests" This reverts commit `57fd866551`. Causes crashes, see comments in D132581.	2022-09-05 15:42:48 -07:00
Momchil Velikov	078899cd64	[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions SimplifyCFG does some common code hoisting, which is limited to hoisting a sequence of identical instruction in identical order and stops at the first non-identical instruction. This patch allows hoisting instruction pairs over same-length sequences of non-matching instructions. The linear asymptotic complexity of the algorithm stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit` serving to limit compilation time and/or the size of the hoisted live ranges. The patch improves SPECv6/525.x264_r by about 10%. Reviewed By: nikic, dmgreen Differential Revision: https://reviews.llvm.org/D129370	2022-09-05 15:13:46 +01:00
Tian Zhou	8fa432be4f	[InstCombine] reduce test-for-overflow of shifted value Fixes #57338. The added code makes the following transformations: For unsigned predicates / eq / ne: icmp pred (x << 1), x --> icmp getSignedPredicate(pred) x, 0 icmp pred x, (x << 1) --> icmp getSignedPredicate(pred) 0, x Some examples: https://alive2.llvm.org/ce/z/ckn4cj https://alive2.llvm.org/ce/z/h-4bAQ Differential Revision: https://reviews.llvm.org/D132888	2022-09-05 09:51:51 -04:00
Florian Hahn	408ebe5e3a	[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC). Depends on D132585. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132586	2022-09-05 10:48:29 +01:00
Nikita Popov	388b684354	[LICM] Separate check for writability and thread-safety (NFCI) This used a single check to make sure that the object is both writable and thread-local. Separate them out to make the deficiencies in the current code more obvious.	2022-09-05 09:43:17 +02:00
Florian Hahn	ba3d29f871	[LCSSA] Update unreachable uses with poison. Users of LCSSA may not expect non-phi uses when checking the uses outside a loop, which may cause crashes. This is due to the fact that we do not update uses in unreachable blocks. To ensure all reachable uses outside the loop are phis, update uses in unreachable blocks to use poison in dead code. Fixes #57508.	2022-09-04 22:26:18 +01:00
Kazu Hirata	7d8c2d17eb	[llvm] Use range-based for loops (NFC) Identified with modernize-loop-convert.	2022-09-03 23:27:25 -07:00
Fangrui Song	9fc679b87c	[SanitizerCoverage] Simplify pc-table and improve test. NFC	2022-09-03 14:29:21 -07:00
Kazu Hirata	9eca5ed790	[llvm] Use std::enable_if_t (NFC)	2022-09-03 11:17:44 -07:00
Kazu Hirata	fedc59734a	[llvm] Use range-based for loops (NFC)	2022-09-03 11:17:40 -07:00
Sanjay Patel	22e1f66f26	[SCCP] add helper function for replacing signed operations; NFC Preliminary refactoring for planned enhancement in D133198.	2022-09-03 10:30:10 -04:00
Sanjay Patel	5c759edc57	[InstCombine] reduce another or-xor bitwise logic pattern ~(A & ?) \| (A ^ B) --> ~((A & ?) & B) https://alive2.llvm.org/ce/z/mxex6V This is similar to `9d218b61cc` where we peeked through another logic op to find a common operand.	2022-09-03 09:32:08 -04:00
Richard Smith	053841c562	Revert "[AggressiveInstCombine] Lower Table Based CTTZ" This reverts commit `fec01ee3f5`. According to asan, this patch introduces a heap use after free.	2022-09-02 16:19:09 -07:00
Francis Visoiu Mistrih	c5b10f348e	[Matrix] Use print instead of dump for matrix-print-after-transpose-opt We should be able to use this option even if LLVM_ENABLE_DUMP is not on. (should fix the bots too)	2022-09-02 16:12:21 -07:00
Francis Visoiu Mistrih	81bdb4068d	[Matrix] Simplify matmuls with scalars If one of the operands is a transposed splat, the transpose can be removed. This is useful to simplify when transposes are distributed to operands of a matmul: * k^T -> k * (A * k)^t -> A^t * k Differential Revision: https://reviews.llvm.org/D130177	2022-09-02 15:50:25 -07:00
Sameer Sahasrabuddhe	46b293cb3f	[Attributor] Simplify offset calculation for a constant GEP Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D132931	2022-09-02 23:53:51 +05:30
Arthur Eubanks	57fd866551	[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests The current code is basically just emulating what the analysis manager does. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132581	2022-09-02 10:55:53 -07:00
Djordje Todorovic	fec01ee3f5	[AggressiveInstCombine] Lower Table Based CTTZ This patch introduces recognition of table-based ctz implementation during the AggressiveInstCombine. This fixes the [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=46434 Differential Revision: https://reviews.llvm.org/D113291	2022-09-02 17:26:55 +02:00
Jolanta Jensen	958abe864a	[LoopLoadElim] Add stores with matching sizes as load-store candidates We are not building up a proper list of load-store candidates because we are throwing away stores where the type don't match the load. This patch adds stores with matching store sizes as candidates. Author of the original patch: David Sherwood. Differential Revision: https://reviews.llvm.org/D130233	2022-09-02 13:11:25 +01:00
Muhammad Omair Javaid	18de7c6a3b	Revert "[InstCombine] Treat passing undef to noundef params as UB" This reverts commit `c911befaec`. It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand the underlying reason. Reverting for now make buildbot green. https://reviews.llvm.org/D133036	2022-09-02 16:09:50 +05:00
Mikael Holmen	51d4c7ceea	[GlobalOpt] Fix debug variance problem in hasOnlyColdCalls hasOnlyColdCalls skipped over calls to intrinsics, but it did so after checking the linkage of the called function. This meant that the presence of a call to a debug intrinsic could affect the outcome of the optimization. In my original reproducer (for an out of tree target) it was particularly interesting, because the actual IR after GlobalOpt was not different with debug instrinsics present, so -print-after-all printouts didn't show anything there. However, without debuginfo, GlobalOpt went further and ran BlockFrequencyAnalysis and (more importanly) LoopAnalysis, and later on in the pipeline, instcombine behaved in different ways when LoopInfo was present. So a call to a dbg.declare prevented running LoopAnalysis in GlobalOpt, which later prevented InstCombine from doing an optimization. The dbg-intrinsic-loopanalysis.ll testcase tries to expose this. Then I also noted that adding a dbg.declare actually made the existing testcase colccc_coldsites.ll generate different code, so I modified that to now test it behaves the same way with and without the dbg.declare. Reviewed By: nikic, fhahn Differential Revision: https://reviews.llvm.org/D133193	2022-09-02 12:29:44 +02:00
Sergey Kachkov	be37caca00	[JumpThreading] Process range comparisions with non-local cmp instructions Use getPredicateOnEdge method if value is a non-local compare-with-a-constant instruction, that can give more precise results than getConstantOnEdge. Differential Revision: https://reviews.llvm.org/D131956	2022-09-02 12:22:45 +02:00
Nikita Popov	c453e5b901	Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI" This reverts commit `cd8f3e7581`. As pointed out by Eli on the review, this is missing an alignment check. The value might be written at an offset.	2022-09-02 09:28:48 +02:00
Nikita Popov	639d912282	[LICM] Allow load-only scalar promotion in the presence of unwinding Currently, we bail out of scalar promotion if the loop may unwind and the memory may be visible on unwind. This is because we can't insert stores of the promoted value on unwind edges. However, nowadays scalar promotion also has support for only promoting loads, while leaving stores in place. This kind of promotion is safe even in the presence of unwinding. Differential Revision: https://reviews.llvm.org/D133111	2022-09-02 09:27:13 +02:00
luxufan	cd8f3e7581	[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI For noop store of the form of LoadI and StoreI, An invariant should be kept is that the memory state of the related MemoryLoc before LoadI is the same as before StoreI. For this example: ``` define void @pr49927(i32* %q, i32* %p) { %v = load i32, i32* %p, align 4 store i32 %v, i32* %q, align 4 store i32 %v, i32* %p, align 4 ret void } ``` Here the definition of the store's destination is different with the definition of the load's destination, which it seems that the invariant mentioned above is broken. But the definition of the store's destination would write a value that is LoadI, actually, the invariant is still kept. So we can safely ignore it. Differential Revision: https://reviews.llvm.org/D132657	2022-09-02 06:37:41 +00:00
Vitaly Buka	ad3a77df2d	[msan] Fix debug info with getNextNode When we want to add instrumentation after an instruction, instrumentation still should keep debug info of the instruction. Reviewed By: kda, kstoimenov Differential Revision: https://reviews.llvm.org/D133091	2022-09-01 20:13:56 -07:00
Chenbing Zheng	d30cf77cb1	[InstCombine] complete fold extractvalue (any_mul_with_overflow X, -1) When we do extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp), which left partly failed to match vector constant with poison element. This patch try to fix it. Alive2: https://alive2.llvm.org/ce/z/2rGp_3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132996	2022-09-02 10:58:42 +08:00
Vitaly Buka	ad2b356f85	[msan] Use no-origin functions when possible Saves 1.8% of .text size on CTMark Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133077	2022-09-01 19:18:38 -07:00
Arthur Eubanks	c911befaec	[InstCombine] Treat passing undef to noundef params as UB Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D133036	2022-09-01 15:16:45 -07:00
Rong Xu	0caa4a9559	[PGO] Support PGO annotation of CallBrInst We currently instrument CallBrInst but do not annotate it with the branch weight. This patch enables PGO annotation of CallBrInst. Differential Revision: https://reviews.llvm.org/D133040	2022-09-01 14:13:50 -07:00
Vitaly Buka	ef0f866718	[msan] Combine shadow check of the same instruction Reduces .text size by 1% on our large binary. On CTMark (-O2 -fsanitize=memory -fsanitize-memory-use-after-dtor -fsanitize-memory-param-retval) Size -0.4% Time -0.8% Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133071	2022-09-01 13:55:59 -07:00
Vitaly Buka	9110673062	[nfc][msan] Group checks per instruction It's a preparation of to combine shadow checks of the same instruction Reviewed By: kda, kstoimenov Differential Revision: https://reviews.llvm.org/D133065	2022-09-01 13:10:16 -07:00
Jordan Rupprecht	3031a250de	[MSan] Fix determinism issue when using msan-track-origins. When instrumenting `alloca`s, we use a `SmallSet` (i.e. `SmallPtrSet`). When there are fewer elements than the `SmallSet` size, it behaves like a vector, offering stable iteration order. Once we have too many `alloca`s to instrument, the iteration order becomes unstable. This manifests as non-deterministic builds because of the global constant we create while instrumenting the alloca. The test added is a simple IR file, but was discovered while building `libcxx/src/filesystem/operations.cpp` from libc++. A reduced C++ example from that: ``` // clang++ -fsanitize=memory -fsanitize-memory-track-origins \ // -fno-discard-value-names -S -emit-llvm \ // -c op.cpp -o op.ll struct Foo { ~Foo(); }; bool func1(Foo); void func2(Foo); void func3(int) { int f_st, t_st; Foo f, t; func1(f) \|\| func1(f) \|\| func1(t) \|\| func1(f) && func1(t); func2(f); } ``` Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133034	2022-09-01 09:15:57 -07:00
Nuno Lopes	858fe8664e	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 17:04:26 +01:00
Nikita Popov	f5c178b6a4	[LICM] Remove unnecessary condition (NFC)	2022-09-01 15:42:35 +02:00
Nikita Popov	315aef667e	[LICM] Fix thread safety checks for promotion of byval args This code was relying on a very subtle contract: The expectation was that for non-allocas, the unwind safety check would already perform a capture check, so we don't need to perform it later. This held true when this unwind safety was only handled for allocas and noalias calls, but became incorrect when byval support was added. To avoid this kind of issue, just remove the dependency between the unwind and thread-safety checks entirely. At worst, this means we perform a redundant capture check. If this should turn out to be problematic for compile-time, we can cache that query in a more explicit way.	2022-09-01 15:33:46 +02:00
Sanjay Patel	c3d1504d63	[InstCombine] fix crash on type mismatch with fcmp fold The existing predicate doesn't work for a single-element vector, so make sure we are not crossing scalar/vector types. Test (was crashing) based on the post-commit example for: `4827771234`	2022-09-01 08:57:55 -04:00
Sanjay Patel	addbdac5d5	[InstCombine] fold power-of-2 ctlz/cttz with inverted result When X is a power-of-two or zero and zero input is poison: ctlz(i32 X) ^ 31 --> cttz(X) cttz(i32 X) ^ 31 --> ctlz(X) https://alive2.llvm.org/ce/z/Cs7sFE	2022-09-01 08:57:55 -04:00
Nikita Popov	3f8b1d0f15	[LICM] Add some debug output to scalar promotion (NFC)	2022-09-01 14:46:30 +02:00
Alexey Bataev	982d9ef1c1	[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison. Need either follow the original order of the operands for bool logical ops, or emit freeze instruction to avoid poison propagation. Differential Revision: https://reviews.llvm.org/D126877	2022-09-01 05:34:45 -07:00
Yuanbo Li	ebd0249fcf	[DebugInfo] Missing debug location after replacement in processSRem function This patch fixes an issue in which CorrelatedValuePropagation::processSRem would create new instructions to represent the SRem instruction, but would not correctly copy any existing debug location metadata to the new instruction. Differential Revision: https://reviews.llvm.org/D132218	2022-09-01 13:18:17 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Nuno Lopes	fa154a9170	Revert "Expand Div/Rem: consider the case where the dividend is zero" This reverts commit `4aed09868b`.	2022-09-01 12:11:22 +01:00
Nuno Lopes	4aed09868b	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 12:00:03 +01:00
Pavel Samolysov	527b9a9d90	[DeadArgElim] Use structure bindings in foreach loops. NFC Differential Revision: https://reviews.llvm.org/D133026	2022-09-01 13:48:46 +03:00
Nikita Popov	43e7d9af1d	[InstCombine] Fold extractvalue of phi Just as we do for most other operations, we should push extractvalue instructions through phis, if this does not increase unfolded instruction count.	2022-09-01 10:51:54 +02:00
Arthur Eubanks	04f3c20989	[NFC][LICM] Stop passing around unused BFI Uses of this were removed in `1a25d0bfbb`.	2022-08-31 19:15:34 -07:00
Vitaly Buka	53d1ae88f8	[nfc][msan] Prepare the code for check sorting	2022-08-31 15:36:49 -07:00
Nikita Popov	ab6876a40d	reland: [Local] Allow creating callbr with duplicate successors Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs. This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in `8201e3ef5c`. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D129997 Originally landed as commit `08860f525a` ("[Local] Allow creating callbr with duplicate successors") Reverted in commit `1cf6b93df1` ("Revert "[Local] Allow creating callbr with duplicate successors"")	2022-08-31 13:23:00 -07:00
Alexey Bataev	588115c117	[SLP][NFC]Add a check for SelectInst to match description, NFC.	2022-08-31 13:04:21 -07:00
Alexey Bataev	d8d9ee10bb	[SLP][NFC]Fix comment and make function following naming standard, NFC.	2022-08-31 12:37:55 -07:00
Philip Reames	8524622bdc	[SLP] Simplify getOperandInfo implementation and be consistent This is NOT nfc. Specifically, the following behavior changes: * Pointers are now allowed. Both uniform, and constants. * FP uniform non-constants can now be recognized. * FP undefs are no longer considered constant. This matches int behavior which we had tests for. FP behavior was untested. Its not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.	2022-08-31 12:24:05 -07:00
Nikita Popov	ad66bc42b0	[InstCombine] Use getInsertionPointAfterDef() in freeze fold This simplifies the code and fixes handling of catchswitch, in which case we have no insertion point for the freeze. Originally part of D129660.	2022-08-31 11:32:57 +02:00
Nikita Popov	8f3fd26b74	[Reassociate] Use getInsertionPointerAfterDef() This simplifies the code and fixes handling for the callbr case, where the instruction needs to be inserted in the normal destination, rather than after the terminator. Originally part of D129660.	2022-08-31 11:10:24 +02:00
Nikita Popov	972840aa3b	[IR] Add Instruction::getInsertionPointAfterDef() Transforms occasionally want to insert an instruction directly after the definition point of a value. This involves quite a few different edge cases, e.g. for phi nodes the next insertion point is not the next instruction, and for invokes and callbrs its not even in the same block. Additionally, the insertion point may not exist at all if catchswitch is involved. This adds a general Instruction::getInsertionPointAfterDef() API to implement the necessary logic. For now it is used in two places where this should be mostly NFC. I will follow up with additional uses where this fixes specific bugs in the existing implementations. Differential Revision: https://reviews.llvm.org/D129660	2022-08-31 10:50:10 +02:00
Fangrui Song	13f0795425	[SLPVectorizer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off build	2022-08-30 23:01:22 -07:00
Chenbing Zheng	35a3048c25	[InstCombine] add support for multi-use Y of (X op Y) op Z --> (Y op Z) op X For (X op Y) op Z --> (Y op Z) op X we can still do transform when Y is multi-use. In D131356 limit it to one-use, this patch remove this limit. This is still not a complete solution, I add a todo test to show it. In this case, X and Y are both multi use, we can't differentiate how to convert based on this. But at least we don't make the code worse，and it can solve half the scenarios.	2022-08-31 10:55:05 +08:00
Alexey Bataev	ec06df9459	[SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed. The pointer operands for the ScatterVectorize node may contain non-instruction values and they are not checked for "already being vectorized". Need to check that such pointers are already vectorized and gather them instead of trying to build vectorize node to avoid compiler crash. Differential Revision: https://reviews.llvm.org/D132949	2022-08-30 12:30:14 -07:00
Sanjay Patel	8a19842c0e	[InstCombine] delete redundant folds; NFC InstSimplify does this via isKnownNonEqual(), so it's already using knownbits on these patterns and trying other folds.	2022-08-30 14:21:29 -04:00
Alexey Bataev	afbf5466ba	[SLP]Improve operands kind analaysis for constants. Removed EnableFP parameter in getOperandInfo function since it is not needed, the operands kinds also controlled by the operation code, which allows to remove extra check for the type of the operands. Also, added analysis for uniform constant float values. This change currently does not trigger any changes in the code since TTI does not do analysis for constant floats, so it can be considered NFC. Tested with llvm-test-suite + SPEC2017, no changes. Differential Revision: https://reviews.llvm.org/D132886	2022-08-30 06:35:39 -07:00
zhongyunde	23a5de4294	[InstCombine] Distributive or+mul with const operand We aleady support the transform: `(X+C1)CI -> XCI+C1CI` Here the case is a little special as the form of `(X+C1)CI` is transformed into `(X\|C1)CI`, so we should also support the transform: `(X\|C1)CI -> XCI+C1CI` Fixes https://github.com/llvm/llvm-project/issues/57278 Reviewed By: bcl5980, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D132658	2022-08-30 20:36:52 +08:00
Florian Hahn	b5e208fcba	[DSE] Support looking through memory phis at end of function. Update isWriteAtEndOfFunction to look through MemoryPhis. The reason MemoryPhis were skipped so far was the known AliasAnalysis issue with it missing loop-carried dependences. This problem is already addressed in other parts of the code by skipping MemoryDefs that may be in difference loops. I think the same logic can be applied here. This can have a substantial impact on the number of stores removed in some cases. For MultiSource/SPEC2006/SPEC2017 with -O3: ``` Metric: dse.NumFastStores Program dse.NumFastStores base patch diff External/S...CINT2017rate/557.xz_r/557.xz_r 14.00 45.00 221.4% External/S...te/538.imagick_r/538.imagick_r 439.00 1267.00 188.6% MultiSourc...e/Applications/SIBsim4/SIBsim4 6.00 15.00 150.0% MultiSourc...Prolangs-C/simulator/simulator 3.00 7.00 133.3% MultiSource/Applications/siod/siod 3.00 7.00 133.3% MultiSourc...arks/FreeBench/distray/distray 6.00 9.00 50.0% MultiSourc...e/Applications/obsequi/Obsequi 22.00 30.00 36.4% MultiSource/Benchmarks/Ptrdist/bc/bc 23.00 28.00 21.7% External/S...NT2017rate/502.gcc_r/502.gcc_r 1258.00 1512.00 20.2% External/S...te/520.omnetpp_r/520.omnetpp_r 954.00 1143.00 19.8% External/S...rate/510.parest_r/510.parest_r 5961.00 7122.00 19.5% External/S...C/CINT2006/445.gobmk/445.gobmk 47.00 56.00 19.1% External/S...00.perlbench_r/500.perlbench_r 241.00 286.00 18.7% External/S...NT2006/471.omnetpp/471.omnetpp 36.00 42.00 16.7% External/S...06/400.perlbench/400.perlbench 183.00 210.00 14.8% MultiSource/Applications/SPASS/SPASS 72.00 81.00 12.5% External/S...17rate/541.leela_r/541.leela_r 72.00 80.00 11.1% External/SPEC/CINT2006/403.gcc/403.gcc 585.00 642.00 9.7% MultiSourc...e/Applications/sqlite3/sqlite3 120.00 131.00 9.2% MultiSourc...Applications/hexxagon/hexxagon 11.00 12.00 9.1% External/S.../CFP2006/453.povray/453.povray 566.00 615.00 8.7% External/S...rate/511.povray_r/511.povray_r 578.00 627.00 8.5% External/S...FP2006/482.sphinx3/482.sphinx3 12.00 13.00 8.3% MultiSource/Applications/oggenc/oggenc 130.00 140.00 7.7% MultiSourc...e/Applications/ClamAV/clamscan 250.00 268.00 7.2% MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 19.00 20.00 5.3% MultiSourc...ch/consumer-jpeg/consumer-jpeg 19.00 20.00 5.3% External/S...te/526.blender_r/526.blender_r 3747.00 3928.00 4.8% MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 104.00 108.00 3.8% MultiSourc...ch/consumer-lame/consumer-lame 54.00 56.00 3.7% MultiSource/Benchmarks/Bullet/bullet 1222.00 1264.00 3.4% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 973.00 1005.00 3.3% External/S.../CFP2006/447.dealII/447.dealII 2699.00 2780.00 3.0% External/S...06/483.xalancbmk/483.xalancbmk 788.00 810.00 2.8% External/S.../CFP2006/450.soplex/450.soplex 180.00 185.00 2.8% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 338.00 345.00 2.1% MultiSourc...Benchmarks/7zip/7zip-benchmark 685.00 699.00 2.0% External/S...FP2017rate/544.nab_r/544.nab_r 158.00 160.00 1.3% MultiSourc...sumer-typeset/consumer-typeset 772.00 781.00 1.2% External/S...2017rate/525.x264_r/525.x264_r 410.00 414.00 1.0% External/S...23.xalancbmk_r/523.xalancbmk_r 998.00 1002.00 0.4% ``` Compile-time is almost neutral: https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions NewPM-O3: +0.03% NewPM-ReleaseThinLTO: -0.01% NewPM-ReleaseLTO-g: +0.03% Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132365	2022-08-30 13:27:51 +01:00
OCHyams	84a71d5259	[DebugInfo] Fix line number attribution in mldst-motion Taking the example from the test included in this patch: $ cat test.cpp -n 1 void fun(int *a, int cond) { 2 if (cond) 3 a[1] = 1; 4 else 5 a[1] = 2; 6 } mldst-motion will merge and sink the stores in if.then and if.else into if.end. The resultant PHI, gep and store should be attributed line zero with the innermost common scope rather than picking a debug location from one of the original stores. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D132741	2022-08-30 10:03:53 +01:00
jacquesguan	df525c7705	[InstCombine] fold fake floating point vector extract to shift+trunc. This patch supports the FP part of D111082. Differential Revision: https://reviews.llvm.org/D125750	2022-08-30 10:12:16 +08:00
Rong Xu	d7ef0c3970	[llvm-profdata] Improve profile supplementation Current implementation promotes a non-cold function in the SampleFDO profile into a hot function in the FDO profile. This is too aggressive. This patch promotes a hot functions in the SampleFDO profile into a hot function, and a warm function in SampleFDO into a warm function in FDO. Differential Revision: https://reviews.llvm.org/D132601	2022-08-29 16:50:42 -07:00
Philip Reames	8936d86469	[LV] Add debug output for force scalar tracing [nfc] I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.	2022-08-29 15:17:51 -07:00
Valery N Dmitriev	329b972d41	[SLP] Try to match reductions before trying to vectorize a vector build sequence. This patch changes order of searching for reductions vs other vectorization possibilities. The idea is if we do not match a reduction it won't be harmful for further attempts to find vectorizable operations on a vector build sequences. But doing it in the opposite order we have good chance to ruin opportunity to match a reduction later. We also don't want to try vectorizing binary operations too early as 2-way vectorization may effectively prohibit wider ones leading to producing less effective code. Differential Revision: https://reviews.llvm.org/D132590	2022-08-29 13:32:14 -07:00
Philip Reames	033a97a8f3	[LV] Minor code restructure of isUniformAfterVectorization [nfc] Mostly just to make a future patch easier to review.	2022-08-29 12:48:27 -07:00
Philip Reames	c37b1a5f76	[RLEV] Pick a correct insert point when incoming instruction is itself a phi node This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue. Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG. Differential Revision: https://reviews.llvm.org/D132571	2022-08-29 11:44:33 -07:00
Alexey Bataev	beacf9bd9e	[SLP]Fix PR57322: vectorize constant float stores. Stores for constant floats must be vectorized, improve analysis in SLP vectorizer for stores. Differential Revision: https://reviews.llvm.org/D132750	2022-08-29 11:02:53 -07:00
Alexey Bataev	e6345bf644	[SLP]Improve lookup of the buildvector top insertelement instruction. When estimating the cost of the in-tree vectorized scalars in buildvector sequences, need to take into account the vectorized insertelement instruction. The top of the buildvector seuences is the topmost vectorized insertelement instruction, because it will have > than 1 use after the vectorization. For the affected test case improves througput from 21 to 16 (per llvm-mca). Differential Revision: https://reviews.llvm.org/D132740	2022-08-29 08:19:52 -07:00

... 3 4 5 6 7 ...

31780 Commits