llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	53fa00b3ae	LoopUnroll: Pass through AssumptionCache (NFC) Using these queries with a context instruction and without a cache seems to be about 2x slower than with it so this theoretically improves compile time.	2022-09-26 14:52:59 -04:00
Ruiling Song	a5676a3a7e	StructurizeCFG: Set Undef for non-predecessors in setPhiValues() During the structurization process, we may place non-predecessor blocks between the predecessors of a block in the structurized CFG. Take the typical while-break case as an example: ``` /---A(v=...) | / \ ^ B C | \ /| \---L | \ / E (r = phi (v:C)...) ``` After structurization, the CFG would look like: ``` /---A | |\ | | C | |/ | F1 ^ |\ | | B | |/ | F2 | |\ | | L \ |/ \--F3 | E ``` We can see that block B is placed between the predecessors (C/L) of E. During phi reconstruction, to achieve the same semantics as before, we are reconstructing the PHIs as: F1: v1 = phi (v:C), (undef:A) F3: r = phi (v1:F2), ... But this is also saying that `v1` would be live through B, which is not quite necessary. The idea in the change is to say the incoming value from B is Undef for the PHI in E. With this change, the reconstructed PHI would be: F1: v1 = phi (v:C), (undef:A) F2: v2 = phi (v1:F1), (undef:B) F3: r = phi (v2:F2), ... Reviewed by: sameerds Differential Revision: https://reviews.llvm.org/D132450	2022-09-26 09:54:47 +08:00
Ruiling Song	40e9284f3c	StructurizeCFG: prefer reduced number of live values The instruction simplification will try to simplify the affected phis. In some cases, this might extend the liveness of values. For example: BB0: | \ | BB1 | / BB2:phi (BB0, v), (BB1, undef) The phi in BB2 will be simplified to v as v dominates BB2, but this is increasing the number of active values in BB1. By setting CanUseUndef to false, we will not simplify the phi in this way, which helps register pressure. This is mandatory for the later change to help reduce VGPR pressure for AMDGPU. Reviewed by: foad, sameerds Differential Revision: https://reviews.llvm.org/D132449	2022-09-26 09:54:47 +08:00
Douglas Yung	91e0423595	Revert "[SROA] Create additional vector type candidates based on store and load slices" This reverts commit `de3445e0ef`. This is causing GHI #57796 and #57821.	2022-09-23 12:24:07 -07:00
Douglas Yung	0a7f4e03a9	Revert "[SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable" This reverts commit `3f08d248c4`. The commit this change is fixing is being reverted due to GHI #57796 and #57821, so revert this commit as well.	2022-09-23 12:24:07 -07:00
Teresa Johnson	b1926f308f	Restore "[MemProf] Memprof profile matching and annotation" This reverts commit `794b7ea960`, and thus restores commit `a212d8da94`, and follow on fixes `0cd6763fa9`, `e9ff53d42f`, and `37c6a25e9a`. Use a hash function (BLAKE3) instead of hash_combine/hash_code which are not guaranteed to be stable across executions. Additionally, it adds a "REQUIRES: x86_64-linux" to the tests that have raw profile inputs to avoid failures on big endian bots. Reviewers: snehasish, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D128142	2022-09-23 11:38:47 -07:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since `151c144` which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre `151c144` any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Dmitri Gribenko	954d3cd2c6	Revert "[AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load." This reverts commit `3c70c8c1df`. After this commit, during the 3-stage bootstrap the second-stage Clang crashes.	2022-09-23 19:21:09 +02:00
Philip Reames	954c1ed009	[SLP] Adjust debug output for store vectorization failure When store vectorization is infeasible, it's helpful to have a debug logging indication of why. A case I've hit a couple times now is accidentally using -march instead of -mtriple and getting the default TTI results. This causes max-vf to become 1, and thus hits the added logging line.	2022-09-23 09:58:15 -07:00
Philip Reames	42ef572049	[SLP] Fix cost model w.r.t. operand properties We allow the target to report different costs depending on properties of the operands; given this, we have to make sure we pass the right set of operands and account for the fact that different scalar instructions can have operands with different properties. As a motivating example, consider a set of multiplies which each multiply by a constant (but not all the same constant). Most of the constants are power of two (but not all). If the target doesn't have support for non-uniform constant immediates, this will likely require constant materialization and a non-uniform multiply. However, depending on the balance of target costs for constant scalar multiplies vs a single vector multiply, this might or might not be a profitable vectorization. This ends up basically being a rewrite of the existing code. Normally, I'd scope the change more narrowly, but I kept noticing things which seemed highly suspicious, and none of the existing code appears to have any test coverage at all. I think this is a case where simply throwing out the existing code and starting from scratch is reasonable. This is a follow on to Alexey's D126885, but also handles the arithmetic instruction case since the existing code appears to have the same problem. Differential Revision: https://reviews.llvm.org/D132566	2022-09-23 08:40:23 -07:00
Florian Hahn	d72eb9c985	[LoopDeletion] Invalidate SCEV after moving instruction. LoopDeletion may hoist instructions out of a loop using makeLoopInvariant without invalidating the SCEV for the moved instruction. Moving the instruction to a different block may change its cached block disposition, so invalidate the cached info. Fixes #57837.	2022-09-23 15:14:11 +01:00
Simon Pilgrim	a6e9141505	[TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436) The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both. This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling. Fixes #50778 Differential Revision: https://reviews.llvm.org/D111968	2022-09-23 14:03:18 +01:00
Nikita Popov	fe196380cc	[FunctionAttrs] Use MemoryLocation::getOrNone() when inferring memory attrs MemoryLocation::getOrNone() already has the necessary logic to handle different instruction types. Use it, rather than repeating a subset of the logic. This adds support for previously unhandled instructions like atomicrmw.	2022-09-23 13:56:55 +02:00
Florian Hahn	623c4a7a55	[LoopVersioning] Invalidate SCEV for phi if new values are added. After `20d798bd47`, SCEV looks through PHIs with a single incoming value. This means adding a new incoming value may change the SCEV for a phi. Add missing invalidation when an existing PHI is reused during LoopVersioning. New incoming values will be added later from the versioned loop. Similar issues have been fixed by also adding missing invalidation. Fixes #57825. Note that the test case unfortunately requires running loop-vectorize followed by loop-load-elimination, which does the actual versioning. I don't think it is possible to reproduce the failure without that combination.	2022-09-23 11:53:29 +01:00
bipmis	3c70c8c1df	[AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load. The patch simplifies some of the patterns as below: 1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1 2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3) The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case, for non-atomic/non-volatile loads, reduce the pattern to a combined load, which would improve the cost of inlining, unrolling, vectorization etc. Differential Revision: https://reviews.llvm.org/D127392	2022-09-23 10:19:50 +01:00
Teresa Johnson	794b7ea960	Revert "[MemProf] Memprof profile matching and annotation" This reverts commit `a212d8da94`, and follow on fixes `0cd6763fa9`, `e9ff53d42f`, and `37c6a25e9a`. After re-reading the documentation for hash_combine, I don't think this is the appropriate hash function to use for computing the hash to use as a stack id in the metadata, since it is not guaranteed to produce stable values across executions. I have not hit this problem, but plan to switch to using an MD5 hash. I am hitting an issue with one of the bots (https://lab.llvm.org/buildbot/#/builders/171/builds/20732) where the values produced are only the lower 32 bits of the expected hash values, however, which I assume is related to the implementation of hash_combine and hash_code. I believe I fixed all of the other bot failures with the follow on fixes, which I'll merge into the new version before reapplying.	2022-09-22 16:08:03 -07:00
Teresa Johnson	e9ff53d42f	[MemProf] Fix buildbot error due to unused variable from bad merge Fix an unused variable warning introduced by `a212d8da94` due to a bad merge with a recent change. E.g. in https://lab.llvm.org/buildbot/#/builders/77/builds/22095	2022-09-22 13:24:33 -07:00
Teresa Johnson	a212d8da94	[MemProf] Memprof profile matching and annotation Profile matching and IR annotation for memprof profiles. See also related RFCs: RFC: Sanitizer-based Heap Profiler [1] RFC: A binary serialization format for MemProf [2] RFC: IR metadata format for MemProf [3]* * Note that the IR metadata format has changed from the RFC during implementation, as described in the preceding patch adding the basic metadata and verification support. The matching is performed during the normal PGO annotation phase, to ensure that the inlines applied in the IR at that point are a subset of the inlines in the profiled binary and thus reflected in the profile's call stacks. This is important because the call frames are associated with functions in the profile based on the inlining in the symbolized call stacks, and this simplifies locating the subset of profile data relevant for matching onto each function's IR. The PGOInstrumentationUse pass is enhanced to perform matching for whatever combination of memprof and regular PGO profile data exists in the profile. Using the utilities introduced in D128854: The memprof profile data for each context is converted to "cold" or "notcold" based on parameterized thresholds for size, access count, and lifetime. The memprof allocation contexts are trimmed to the minimal amount of context required to uniquely identify whether the context is cold or not cold. For allocations where all profiled contexts have the same allocation type, no memprof metadata is attached and instead the allocation call is directly annotated with an attribute specifying the allocation type. This is the same attribute that will be applied to allocation calls once cloned for different contexts, and later used during LibCall simplification to emit allocation hints [4]. Depends on D128141 and D128854. 
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html [2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html [3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165 [4] `ab87cf382d` Differential Revision: https://reviews.llvm.org/D128142	2022-09-22 12:48:31 -07:00
Leonard Chan	21b03bf970	[llvm] Handle dso_local_equivalent in FunctionComparator This addresses https://github.com/llvm/llvm-project/issues/51066. Prior to this, dso_local_equivalent would lead to an llvm_unreachable in a switch in the FunctionComparator. This adds a conservative case in that switch that just compares the underlying functions. Differential Revision: https://reviews.llvm.org/D134300	2022-09-22 18:42:31 +00:00
Philip Reames	32dc1151e2	[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.) This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.) Differential Revision: https://reviews.llvm.org/D133580	2022-09-22 08:53:46 -07:00
Nikita Popov	8df376db72	[InstCombine] Remove buggy zext of icmp eq with pow2 fold (PR57899) For the case where the constant is a power of two rather than zero, the fold is incorrect, because it fails to check that the bit set in the LHS matches the bit in the RHS. Rather than fixing this, remove the power of two handling entirely, as a different fold will already canonicalize such comparisons to use a zero constant. Fixes https://github.com/llvm/llvm-project/issues/57899.	2022-09-22 16:37:10 +02:00
Nikita Popov	c2e76f914c	[InstCombine] Use simplifyWithOpReplaced() for non-bool selects Perform the simplifyWithOpReplaced() fold even for non-bool selects. This subsumes a number of recently added folds for zext/sext of the condition. We still need to manually handle variations with both sext/zext and not, because simplifyWithOpReplaced() only performs one level of replacements.	2022-09-22 15:46:00 +02:00
Nikita Popov	41dde5d858	[InstSimplify] Support vectors in simplifyWithOpReplaced() We can handle vectors inside simplifyWithOpReplaced(), as long as cross-lane operations are excluded. The equality can hold (or not hold) for each vector lane independently, so we shouldn't use the replacement value from other lanes. I believe the only operations relevant here are shufflevector (where all previous bugs were seen) and calls (which might use shuffle-like intrinsics and would require more careful classification). Differential Revision: https://reviews.llvm.org/D134348	2022-09-22 10:45:42 +02:00
Congzhe Cao	22c91df52c	[LoopInterchange][PR57148] Ensure the correct form of IR after transformation This is a bugfix patch that resolves the following two bugs in loop interchange: 1. PR57148 which is an assertion error due to loss of LCSSA form after interchange, as referred to test1() in pr57148.ll. 2. Use before def for the outermost loop induction variables after interchange, as referred to test2() in pr57148.ll. The fix in this patch is that: 1. In cases where the LCSSA form is not maintained after interchange, we update the IR to the LCSSA form again. 2. We split the phi nodes in the inner loop header into a separate basic block to avoid the situation where use of the outer indvar appears before its def after interchange. Previously we already did this for innermost loops, now we do it for non-innermost loops (e.g., middle loops) as well. Reviewed By: bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D132055	2022-09-22 00:20:53 -04:00
Congzhe Cao	6782d71680	[LoopPassManager] Ensure to construct loop nests with the outermost loop This patch is to resolve the bug reported and discussed in https://reviews.llvm.org/D124926#3718761 and https://reviews.llvm.org/D124926#3719876. The problem is that loop interchange is a loopnest pass under the new pass manager, but the loop nest may not be constructed correctly by the loop pass manager after running loop interchange and before running the next pass, which might cause problems when it continues running the next pass. The reason that the loop nest is constructed incorrectly is that the outermost loop might have changed after interchange, and what was the original outermost loop is not the current outermost loop anymore. Constructing the loop nest based on the original outermost loop would generate an invalid loop nest. The fix in this patch is that, in the loop pass manager before running each loopnest pass, we reconstruct the loop nest based on the current outermost loop, if LPMUpdater notifies the loop pass manager that the previous loop nest has been invalidated by passes like loop interchange. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D132199	2022-09-21 23:59:26 -04:00
Vitaly Buka	ba39a6e14a	[msan] Instrument vtest intrinsics Instrumentation just ORs shadow of inputs. I assume some result shadow bits can be reset if we go into specifics of particular checks, but as-is it is still an improvement against existing default strict instruction handler, when every set bit of input shadow is reported as an error. Reviewed By: kda Differential Revision: https://reviews.llvm.org/D134123	2022-09-21 16:57:44 -07:00
Vitaly Buka	6fd959d625	[msan] Handle x86_avx_cmp_pd_256 and x86_avx_cmp_ps_256 Removed FIXME which looks irrelevant. The error message happens only without -mattr=+avx. E.g. GOOD: opt llvm/test/Instrumentation/MemorySanitizer/avx-intrinsics-x86.ll -passes=msan -o - | llc -O3 -o /dev/null -mattr=+avx BAD: opt llvm/test/Instrumentation/MemorySanitizer/avx-intrinsics-x86.ll -passes=msan -o - | llc -O3 -o /dev/null So nothing to fix here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134119	2022-09-21 15:17:02 -07:00
Sanjay Patel	ee0bf64722	[InstCombine] try to fold mul by neg-power-of-2 to shl `(A * -2**C) + B --> B - (A << C)` https://alive2.llvm.org/ce/z/A6BWkf This inverts what Negator was doing before: D134310 / `0f32a5dea0` Analysis and codegen are generally better without multiply, so we should favor this form even if we trade add for sub (because those are generally equivalent cost operations).	2022-09-21 15:09:39 -04:00
Sanjay Patel	64d309131a	[InstCombine] try multi-use demanded bits fold for 'sub' This is similar to D133788 / `73919a87e9`, but for sub the transform is valid only for low zeros in operand 1. https://alive2.llvm.org/ce/z/EmRsXC	2022-09-21 14:13:05 -04:00
Konstantina	80d3ed6fb1	[NFC][NewGVN] Remove OpIsSafeForPHIOfOpsHelper() Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D130949	2022-09-21 09:25:59 -07:00
Alexey Bataev	e664dea182	[SLP]Fix write-after-bounds. Mask might be larger than the NumElts-OffsetBeg, need to use actual indices to avoid access out of bounds.	2022-09-21 08:00:15 -07:00
Matt Arsenault	0d1f040749	LICM: Pass through AssumptionCache	2022-09-21 09:16:17 -04:00
Sanjay Patel	0f32a5dea0	[InstCombine] don't canonicalize shl+sub to mul+add This stops Negator from transforming: `C1 - shl X, C2 --> mul X, (1<<C2) + C1` ...in the general case. There does not seem to be any analysis benefit to using mul in IR, and there's definitely downside in codegen (particularly when the multiply has to be expanded). If `C1` is 0, then there's a stronger argument that the single mul is a better canonicalization than negate-of-shl, but we may want to remove that too. This was noted as a potential conflict for D133667. Differential Revision: https://reviews.llvm.org/D134310	2022-09-21 08:39:07 -04:00
Bjorn Pettersson	3f08d248c4	[SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable Commit `de3445e0ef` (https://reviews.llvm.org/D132096) made changes to isVectorPromotionViable basically doing // Create Vector with size of V, and each element of type Ty ... uint64_t ElementSize = DL.getTypeStoreSizeInBits(Ty).getFixedSize(); uint64_t VectorSize = DL.getTypeSizeInBits(V).getFixedSize(); ... VectorType *VTy = VectorType::get(Ty, VectorSize / ElementSize, false); Not quite sure why it uses the TypeStoreSize for the ElementSize, but the new vector would only match in size with the old vector in situations when the TypeStoreSize equals the TypeSize for Ty. Therefore this patch adds a typeSizeEqualsStoreSize check as yet another condition for allowing the new type as a promotion candidate. Without this fix the new @test15 test would fail with an assert like this: opt: ../lib/Transforms/Scalar/SROA.cpp:1966: auto isVectorPromotionViable(llvm::sroa::Partition &, const llvm::DataLayout &) ::(anonymous class)::operator()(llvm::VectorType *, llvm::VectorType *) const: Assertion `DL.getTypeSizeInBits(RHSTy).getFixedSize() == DL.getTypeSizeInBits(LHSTy).getFixedSize() && "Cannot have vector types of different sizes!"' failed. ... #8 isVectorPromotionViable(...)::$_10::operator()... #9 llvm::SROAPass::rewritePartition(...) #10 llvm::SROAPass::splitAlloca(...) #11 llvm::SROAPass::runOnAlloca(...) #12 llvm::SROAPass::runImpl(...) #13 llvm::SROAPass::run(...) Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D134032	2022-09-21 09:45:05 +02:00
Michael Berg	897a79f970	[DSE] Add value type info checks for masked store candidates in Dead Store Elimination. The type information of the store values can diverge when checking for valid masked store candidates to eliminate via DSE. This patch checks for equivalence with respect to size and element count. Reviewed By: fhahn, rui.zhang Differential Revision: https://reviews.llvm.org/D132700	2022-09-20 15:54:25 -07:00
Markus Böck	b751da43b2	[InstCombine] Handle integer extension in `select` patterns using the condition as value These patterns were previously only implemented for i1 type but can be extended for any integer type by also handling zext and sext operands. Differential Revision: https://reviews.llvm.org/D134142	2022-09-20 22:25:13 +02:00
Zain Jaffal	68cc35d52c	[InstCombine] Matrix multiplication negation optimisation If one of the operands in a matrix multiplication is negated we can optimise the equation by moving the negation to the smallest element of the operands or the result. Reviewed By: spatel, fhahn Differential Revision: https://reviews.llvm.org/D133300	2022-09-20 19:50:39 +01:00
Gulfem Savrun Yeniceri	f039a9fa32	[InstrProfiling] Emit runtime hook only once This patch fixes the issue about calling emitRuntimeHook() twice when we need to unconditionally emit runtime hook as discussed in https://reviews.llvm.org/rGd6aed77f0d19. Differential Revision: https://reviews.llvm.org/D134254	2022-09-20 17:00:46 +00:00
Kazu Hirata	00874c48ea	[IPO] Reorder parameters of InlineFunction (NFC) With the recent addition of new parameter MergeAttributes (D134117), callers need to specify several default parameters before getting to specify the new parameter. This patch reorders the parameters so that callers do not have to specify as many default parameters. Differential Revision: https://reviews.llvm.org/D134125	2022-09-20 09:09:38 -07:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Florian Hahn	dcbc8a0daa	[LV] Remove unused widenCallInstruction declaration (NFC). The definition and uses have been removed a while ago. Clean up the unused declaration.	2022-09-20 15:20:28 +01:00
Djordje Todorovic	f0f8b46863	Recommit "[AggressiveInstCombine] Lower Table Based CTTZ" The bug reported in [0] has been fixed. The issue was that we did not check whether the global variables that represent cttz tables were constant. There is a new negative test added in negative-lower-table-based-cttz.ll that covers this. [0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b	2022-09-20 13:12:47 +02:00
Dmitri Gribenko	5d7ff0d87c	Fix an unused warning in release build	2022-09-20 11:29:39 +02:00
eopXD	3b2011fd4f	[LSR] Fold terminating condition to other IV when possible When the IV is only used by the terminating condition (say IV-A) and the loop has a predictable back-edge count and we have another IV (say IV-B) that is an affine add recursion, we will be able to calculate the terminating value of IV-B in the loop pre-header. This patch adds attempts to replace IV-B as the new terminating condition and remove IV-A. It is safe to do so since IV-A is only used as the terminating condition. This transformation is suitable to be appended after LSR as it may optimize the loop into the situation mentioned above. The transformation can reduce number of IV-s in the loop by one. A cli option `lsr-term-fold` is added and default disabled. Reviewed By: mcberg2021, craig.topper Differential Revision: https://reviews.llvm.org/D132443	2022-09-20 01:38:47 -07:00
Vitaly Buka	4fa8df20ff	[msan] Handle shadow of masked instruction Origin handling is not implemented yet. Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133682	2022-09-19 17:57:43 -07:00
Matt Arsenault	2adae8e1b7	VectorCombine: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	c867401407	MemCpyOpt: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	555af0274c	SLPVectorizer: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	b609741958	LoopVectorize: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	84a2e48ce6	GVN: Pass through AssumptionCache to queries	2022-09-19 19:25:22 -04:00
Matt Arsenault	ce44357216	Analysis: Add AssumptionCache to isSafeToSpeculativelyExecute Does not update any of the uses.	2022-09-19 19:25:22 -04:00
Matt Arsenault	fd37ab6cf6	InstCombine: Pass AssumptionCache through isDereferenceablePointer	2022-09-19 19:10:51 -04:00
Matt Arsenault	0d8ffcc532	Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer This does not try to pass it through from the end users.	2022-09-19 18:57:33 -04:00
Alexey Bataev	ce39bdbd65	[SLP][NFC]Reorder gather nodes with reused scalars, NFC. The compiler does not reorder the gather nodes with reused scalars, just does it for operands of the user nodes. This currently does not affect the compiler but breaks internal logic of the SLP graph. In future, it is supposed to actually use all nodes instead of just list of operands and this will affect the vectorization result. Also, did some early check to avoid complex logic in cost estimation analysis, should improve compile time a bit.	2022-09-19 14:00:17 -07:00
Vitaly Buka	6f3276d57e	[msan] Check mask and pointers shadow Msan has a default handler for unknown instructions which previously applied to these as well. However, depending on the mask, not all pointers or passthru parts will be used. This allows other passes to insert undef into some arguments. As a result, the default strict instruction handler can produce false reports. Reviewed By: kda, kstoimenov Differential Revision: https://reviews.llvm.org/D133678	2022-09-19 13:09:56 -07:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before `151c144`. This effectively reverts `151c144`, but the long-term fix is to properly support widened inductions during epilogue vectorization. Fixes #57712.	2022-09-19 18:14:35 +01:00
Craig Topper	90a004b4a1	[LV] Remove FIXME about NoImplicitFloat. NFC My understanding is that NoImplicitFloat, despite its name, is supposed to disable all vectors not just float vectors. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134084	2022-09-19 10:01:02 -07:00
Nikita Popov	dd61726d5b	Revert "[SimplifyCFG] accumulate bonus insts cost" This reverts commit `e5581df60a`. This causes major compile-time regressions, about 2-3% end-to-end on CTMark.	2022-09-19 14:46:43 +02:00
Max Kazantsev	92e9bddc49	[LoopRotate] Drop loop dispositions when rotating loops. PR56260 This is required because if there is a pure loop-invariant instruction, Loop Rotation may decide to not clone it and just hoist it instead. If SCEV has previously cached that it was loop-variant (not being smart enough to prove invariance), we may end up with inconsistent cache state (which may later trigger false-negative assertion failures checking that something was invariant). This is a conservative fix that unconditionally drops the dispositions. We could only drop it if the hoisting has actually happened, but it should take some time understanding whether it's safe with all other things this function does. Differential Revision: https://reviews.llvm.org/D134167 Reviewed By: fhahn	2022-09-19 18:01:02 +07:00
Max Kazantsev	21a9abc1ce	[LoopFuse] Drop loop dispositions before reassigning blocks to other loop This bug was found by recent improvement in SCEV verifier. The code in LoopFuse directly reassigns blocks to be a part of a different loop, which should automatically invalidate all related cached loop dispositions. Differential Revision: https://reviews.llvm.org/D134173 Reviewed By: nikic	2022-09-19 17:43:06 +07:00
Max Kazantsev	818b1ab84e	[SCEV][NFC] Remove unused parameter from forgetLoopDispositions Let's be honest about it, we don't drop loop dispositions for particular loops. Remove the parameter that misleadingly makes it apparent that we do.	2022-09-19 14:06:42 +07:00
Yaxun (Sam) Liu	e5581df60a	[SimplifyCFG] accumulate bonus insts cost SimplifyCFG folds bool foo() { if (cond1) return false; if (cond2) return false; return true; } as bool foo() { if (cond1 | cond2) return false; return true; } 'cond2' is called 'bonus insts' in branch folding because they introduce overhead: the original CFG could exit early, but the folded CFG always executes them. SimplifyCFG calculates the cost of the 'bonus insts' of folding a BB into its predecessor BB which shares the destination. If it is below bonus-inst-threshold, SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed. When SimplifyCFG calculates the cost of 'bonus insts', it only considers the 'bonus insts' in the current BB being considered for folding. This causes issues for unrolled loops which share destinations, e.g. bool foo(int *a) { for (int i = 0; i < 32; i++) if (a[i] > 0) return false; return true; } After unrolling, it becomes bool foo(int *a) { if(a[0]>0) return false; if(a[1]>0) return false; //... if(a[31]>0) return false; return true; } SimplifyCFG will merge each BB with its predecessor BB, and ends up with 32 'bonus insts' which are always executed, which is much slower than the original CFG. The root cause is that SimplifyCFG does not consider the accumulated cost of 'bonus insts' which are folded from different BB's. This patch fixes that by introducing a ValueMap to track costs of 'bonus insts' coming from different BB's into the same BB, and cuts off if the accumulated cost exceeds a threshold. Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault Differential Revision: https://reviews.llvm.org/D132408	2022-09-18 20:21:14 -04:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Kazu Hirata	5e5a6c5b07	Use std::conditional_t (NFC)	2022-09-18 10:25:06 -07:00
Marc Auberer	f52dd920d4	[InstCombine] Fix bug when folding x + (x | -x) to x & (x - 1) Addresses concern: https://reviews.llvm.org/rG09cdddea0c4d284c2c22f5dfade40a60850c5ea7 There was a copy/paste mistake in the code. Updated code and test ref.	Differential Revision: https://reviews.llvm.org/D134135	2022-09-18 13:16:12 -04:00
Sanjay Patel	1d1d1e6f22	[InstCombine] fold full-shift of sdiv to icmp+extend This is a disguised sign-bit test with offset: (X / +DivC) >> (Width - 1) --> ext (X <= -DivC) (X / -DivC) >> (Width - 1) --> ext (X >= +DivC) https://alive2.llvm.org/ce/z/cO8JO4 We don't match/test poison in the sdiv constant because that would be immediate undefined behavior.	2022-09-18 13:13:14 -04:00
Kazu Hirata	d3b95ecc98	[ModuleInliner] Remove InlineOrder::front (NFC) InlineOrder::front is a remnant from the era when we had a nested "while" loops in the module inliner, with the inner one grouping the call sites with the same caller. Now that we have a simple "while" loop draining the priority queue, we can just use InlineOrder::pop. Differential Revision: https://reviews.llvm.org/D134121	2022-09-18 08:49:44 -07:00
Benjamin Kramer	b987fe4972	Silence unused variable warning in release builds. NFC	2022-09-18 09:15:32 +02:00
Kazu Hirata	284f0397e2	[Transforms] Merge function attributes within InlineFunction (NFC) In the past, we've had a bug resulting in a compiler crash after forgetting to merge function attributes (D105729). This patch teaches InlineFunction to merge function attributes. This way, we minimize the "time" when the IR is valid, but the function attributes are not. Differential Revision: https://reviews.llvm.org/D134117	2022-09-17 23:10:23 -07:00
Kazu Hirata	6e4fbd2f51	[ModuleInliner] Set Changed earlier (NFC) It makes more sense to set Changed to true immediately after a successful inlining.	2022-09-17 14:16:32 -07:00
Kazu Hirata	31b91356bc	[ModuleInliner] Don't include SetVector.h (NFC) We don't use SetVector in the module inliner.	2022-09-17 12:17:52 -07:00
Kazu Hirata	5faf4bf195	[ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC) UseInlinePriority specifies the priority function. This patch simplifies the code by moving UseInlinePriority closer to the actual consumer -- the switch statement inside getInlineOrder. Differential Revision: https://reviews.llvm.org/D134100	2022-09-17 11:41:28 -07:00
Florian Hahn	7914e53e31	[ConstraintElimination] Fix crash when combining results. `f213128b29` didn't account for the possibility that the result of decompose may be empty. Fix that by explicitly checking. Use a newly introduced helper to also reduce some duplication. Thanks @bjope for finding the issue!	2022-09-17 14:47:38 +01:00
Kazu Hirata	6e30a9cc08	[Inliner] Retire DefaultInlineOrder (NFC) DefaultInlineOrder was largely an exercise in generalizing the traversal order of call sites within the inliner. Now that the module inliner is starting to form its shape, there is no point in sharing DefaultInlineOrder between the module inliner and the CGSCC inliner. DefaultInlineOrder and all the other inline orders are mutually exclusive in the following sense: - The use of DefaultInlineOrder doesn't make sense in the module inliner because there is no priority inherent in the order in which call sites are added to the list of call sites -- SmallVector. - The use of any other inline order doesn't make sense in the CGSCC inliner because little prioritization can be done within one CGSCC. This patch essentially reverts the addition of DefaultInlineOrder so that the loop structure of Inliner.cpp looks like the state just before we started working on the module inliner (circa June 2021). At the same time, we remove the choice of DefaultInlineOrder from UseInlinePriority. Differential Revision: https://reviews.llvm.org/D134080	2022-09-16 15:36:40 -07:00
Alexey Bataev	5d13b12674	[SLP]Improve isUndefVector function by adding insertelement analysis. Added the mask and the analysis of the buildvector sequence in the isUndefVector function, improves codegen and cost estimation. Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27362.00 27360.00 -0.0% Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 805299.00 806035.00 0.1% 526.blender_r - some extra code is vectorized. 508.namd_r - some extra code is optimized out. Differential Revision: https://reviews.llvm.org/D133891	2022-09-16 14:36:38 -07:00
Teresa Johnson	c2cf93c1a9	[WPD/LTT] Lower type test feeding assumes via phi correctly This fixes https://github.com/llvm/llvm-project/issues/57616. Type test lowering in ThinLTO modules relies on having type id summaries set up for the referenced types, which provide the type test resolution. If there is no summary, the type tests are lowered to false. At the very least, a default type id summary gives the type tests a resolution of Unknown, which is handled correctly (ignored by the first invocation of LTT, and lowered to true by the second). WPD sets up the type id summaries (with a default type test resolution) as it is processing the type tests, but only does this for the patterns handled by WPD, which is a type test directly feeding an assume. In the case of type tests feeding an assume via a phi, the type id summary was not being set up, leading to the type tests being lowered to false incorrectly. Fix this by adding the default type id summary entries for all type ids used on globals during index-only WPD. This is not an issue for hybrid (split-lto-unit) LTO, as in that case the type test resolution is determined and set up during LTT, since the type definitions are in the regular LTO split module, and exported via the summary to the ThinLTO split module. Differential Revision: https://reviews.llvm.org/D134012	2022-09-16 13:50:01 -07:00
Kazu Hirata	9111920af8	[ModuleInliner] clang-format ModuleInliner.cpp (NFC)	2022-09-16 09:41:42 -07:00
Kazu Hirata	4475470529	[ModuleInliner] Remove a stale comment (NFC) These comments refer to the nested loop in the module inliner where the inner loop grouped call sites from the same caller. We don't group call sites anymore, so the comment has become stale.	2022-09-16 09:37:43 -07:00
Kazu Hirata	42a90e6017	[ModuleInliner] Remove a redundant variable (NFC) In the CGSCC inliner, DidInline was used as an indicator to update the call graph. In the module inliner, DidInline is always true at the end of the "while" loop, so we can just drop it.	2022-09-16 09:32:02 -07:00
Kazu Hirata	513717ddd0	[ModuleInliner] Remove a write-only variable (NFC) InlinedCallees is a remnant from the CGSCC inliner. We don't use it in the module inliner.	2022-09-16 09:15:53 -07:00
Kazu Hirata	77501bfab8	[IPO] Simplify the module inliner loop (NFC) In the bottom-up inliner, we have a two-level nested "while" loop, with the inner one grouping call sites with the same caller. We need to do so to keep CGSCC up to date. Now, with the module inliner, we don't have any per-caller work. We don't update CGSCC. Plus, the caller will likely keep changing as we pop call sites in some priority order. This patch simply removes the inner "while" loop while indenting its body. Further cleanup is possible, but that's left for follow-up patches. Differential Revision: https://reviews.llvm.org/D133969	2022-09-16 08:56:18 -07:00
Sanjay Patel	6174da2299	[InstCombine] reduce code duplication in foldICmpMulConstant(); NFC	2022-09-16 10:39:54 -04:00
Vitaly Buka	f0c2ffa8f8	[msan] Add msan-insert-check DEBUG_COUNTER	2022-09-15 21:52:58 -07:00
Gulfem Savrun Yeniceri	d6aed77f0d	[InstrProfiling] No runtime hook for unused funcs This is a reland of https://reviews.llvm.org/D122336. Original patch caused a problem in collecting coverage in Fuchsia because it was returning early without putting unused function names into __llvm_prf_names section. This patch fixes that issue. The original commit message is as follows: CoverageMappingModuleGen generates a coverage mapping record even for unused functions with internal linkage, e.g. static int foo() { return 100; } Clang frontend eliminates such functions, but InstrProfiling pass still emits runtime hook since there is a coverage record. Fuchsia uses runtime counter relocation, and pulling in profile runtime for unused functions causes a linker error: undefined hidden symbol: __llvm_profile_counter_bias. Since https://reviews.llvm.org/D98061, we do not hook profile runtime for binaries in which none of the translation units have been instrumented in Fuchsia. This patch extends that to instrumented binaries that consist of only unused functions. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D122336	2022-09-16 02:05:09 +00:00
Navid Emamdoost	3e52c0926c	Add -fsanitize-coverage=control-flow Reviewed By: kcc, vitalybuka, MaskRay Differential Revision: https://reviews.llvm.org/D133157	2022-09-15 15:56:04 -07:00
Sanjay Patel	aafaa2f4fc	[SCCP] convert ashr to lshr for non-negative shift value This is similar to the existing signed instruction folds. We get the obvious minimal patterns in other passes, but this avoids potential missed folds when the multi-block tests are converted to selects.	2022-09-15 13:54:52 -04:00
Craig Topper	ace05124f5	[IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation. There are two ctlz intrinsics here with the zero_is_poison flag set. There are also two comparisons that check if either of the inputs to the ctlzs is zero. We need to use a logical or to block the poison from the ctlz if either of the inputs is zero. Reviewed By: arsenm, aqjune Differential Revision: https://reviews.llvm.org/D130680	2022-09-15 09:38:02 -07:00
Sanjay Patel	02a27b3890	[InstCombine] fold X*X == 0 --> X == 0 This is safe when the mul does not overflow: https://alive2.llvm.org/ce/z/LedVVP This could be extended to handle non-zero compare constants and non-squared multiplies.	2022-09-15 12:02:50 -04:00
Evgeniy Brevnov	03a102e3b2	[JumpThreading][NFC] Reuse existing DT instead of recomputation (newPM) This is the same change as `503d5771b6` with the same intent but for new pass manager.	2022-09-15 12:27:57 +07:00
Dhruva Chakrabarti	839ac62c50	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `7539e9cf81`.	2022-09-15 03:08:46 +00:00
Vitaly Buka	f221720e82	[nfc][msan] getShadowOriginPtr on <N x ptr> Some vector instructions can benefit from Addr as <N x ptr>. Differential Revision: https://reviews.llvm.org/D133681	2022-09-14 19:18:52 -07:00
Vitaly Buka	f404169f24	[NFC][msan] Rename variables to match definition	2022-09-14 19:16:27 -07:00
Vitaly Buka	2209be15a5	[NFC][msan] Convert some code to early returns Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133673	2022-09-14 19:16:11 -07:00
Vitaly Buka	bcf3d666b4	[NFC][msan] Simplify llvm.masked.load origin code Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133652	2022-09-14 19:14:29 -07:00
Vitaly Buka	d421223e25	[msan] Resolve FIXME from D133880 We don't need to change tests we convertToBool unconditionally only before OR.	2022-09-14 18:55:57 -07:00
Giorgis Georgakoudis	7539e9cf81	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6, ABataev Differential Revision: https://reviews.llvm.org/D102107	2022-09-15 00:54:05 +00:00
Vitaly Buka	bf204881b6	[msan] Change logic of ClInstrumentationWithCallThreshold According to logs, ClInstrumentationWithCallThreshold is a workaround for a slow backend with a large number of basic blocks. However, I can't reproduce that one, but I see a significant slowdown after ClCheckConstantShadow. Without ClInstrumentationWithCallThreshold the compiler is able to eliminate many of the branches. So maybe we should drop ClInstrumentationWithCallThreshold completely. For now I just change the logic to ignore constant shadow so it will not trigger callback fallback too early. Reviewed By: kstoimenov Differential Revision: https://reviews.llvm.org/D133880	2022-09-14 14:58:12 -07:00
Florian Hahn	7f3ff9d3c0	[ConstraintElimination] Track if variables are positive in constraint. Keep track if variables are known positive during constraint decomposition, aggregate the information when building the constraint object and encode the extra information as constraints to be used during reasoning.	2022-09-14 18:43:54 +01:00
Alexey Bataev	d647312e3f	[SLP][NFC]Extract getLastInstructionInBundle function for better dependence checking, NFC. Part of D110978	2022-09-14 08:43:15 -07:00
Zain Jaffal	8253f7e286	[InstCombine] Optimize multiplication where both operands are negated Handle the case where both operands are negated in matrix multiplication Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133695	2022-09-14 16:29:39 +01:00

31623 Commits