llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	98a3a340c3	[ConstantExpr] Don't create fneg expressions Don't create fneg expressions unless explicitly requested by IR or bitcode.	2022-09-07 11:27:25 +02:00
Xiang1 Zhang	16743c9534	[CodeGen] Limit building time in CodeGenPrepare for huge function Details: Currently CodeGenPrepare is very time consuming in handling big functions. Old Algorithm : It iterate each BB in function, and go on handle very instructions in BB. Due to some instruction optimizations may affect the BBs' dominate tree. The old logic will re-iterate and try optimize for each BB. Suppose we have a big function with 20000 BBs, If we handled the last BB with fine tuning the dominate tree. We need totally re-iterate and try optimize the 20000 BBs from the beginning. The Complex is near N! And we really encounter somes big tests (> 20000 BBs) that cost more than 30 mins in this pass. (Debug version compiler will cost 2 hours here) What this patch do for huge function ? It mainly changes the iteration way for optimization. 1 We do optimizeBlock for each BB (that is same with old way). And, in the meaning time, If BB is changed/updated in the optimization, it will be put into FreshBBs (try do optimizeBlock again). The new created BB at previous iteration will also put into FreshBBs. 2 For the BBs which not updated at previous iteration, we directly skip it. Strictly speaking, here may miss some opportunity, but the probability is very small. 3 For Instructions in single BB, we do optimizeInst for each instruction. If optimizeInst change the instruction dominator in this BB, rather than break and go back to optimize the first BB (the old way), we directly iterate instructions (to do optimizeInst) in this updated BB again (the new way). What this patch do for small/normal (not huge) function ? It is same with the Old Algorithm. (NFC) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129352	2022-09-07 10:05:40 +08:00
Ruobing Han	fb45f3c948	[SimpleLoopUnswitch] Skip non-trivial unswitching of cold functions In the current main branch, all cold loops will not be applied non-trivial unswitch. As reported in D129599, skipping these cold loops will incur regression in SPEC benchmark. Thus, instead of skipping cold loops, now only skipping loops in cold functions. Reviewed By: alexgatea, aeubanks Differential Revision: https://reviews.llvm.org/D133275	2022-09-06 19:13:31 -04:00
Joseph Huber	58645d3252	[OpenMP] Fix `omp_get_wtime` function being marked incorrectly as readonly OpenMP has a list of of optimistic attributes that can be attached to known runtime functions to aid some analysis. The `omp_get_wtime` function incorrectly used the `readonly` attribute. This is not correct at the `omp_get_wtime` function changes values depending on some external state. This is more correctly modeled with `inaccessiblememonly` meaning that the value does not depend on anything within the module, but can not be removes as it depends on external state. Fixes #57578 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D133360	2022-09-06 12:59:00 -05:00
Florian Hahn	27e7db54eb	Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands" This reverts commit `fe1f3cfc26`. It looks like this commit breaks building llvm-test-suite. To reproduce, run `opt -passes=ipsccp` on the IR below. @g = internal global i32 256, align 4 define void @test() { entry: %0 = load i32, ptr @g, align 4 %div = sdiv i32 %0, undef ret void }	2022-09-06 18:21:51 +01:00
Sanjay Patel	e028121ed0	[InstCombine] add/move tests for add with select operands that simplify; NFC	2022-09-06 12:19:50 -04:00
Sanjay Patel	d4a4004c0f	[InstCombine] add tests for add of select with 0 and negate arms; NFC	2022-09-06 12:19:49 -04:00
Doru Bercea	0b1160fdeb	Fix OpenMP Opt for target without a parallel region. Remove ctx redeclaration. Format code. Remove parallel check. Modify tests. Clean-up code. Fix another test. Move code to helper functions. Format file. Minor fixes.	2022-09-06 16:04:53 +00:00
Matthias Gehre	af3758d678	Fix remaining test failures for "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64"	2022-09-06 16:38:43 +01:00
Sanjay Patel	fe1f3cfc26	[SCCP] convert signed div/rem to unsigned for non-negative operands This extends the transform added with D81756 to handle div/rem opcodes. For example: https://alive2.llvm.org/ce/z/cX6za6 This replicates part of what CVP already does, but the motivating example from issue #57472 demonstrates a phase ordering problem - we convert branches to select before CVP runs and miss the transform. Differential Revision: https://reviews.llvm.org/D133198	2022-09-06 08:58:15 -04:00
Sanjay Patel	a8fcb51242	[InstSimplify] allow poison/undef in constant match for "C - X ==/!= X -> false/true" This fold was added with `5e9522c311`, but over-specified. We can assume that an undef element is an odd number: https://alive2.llvm.org/ce/z/djQmWU	2022-09-06 08:19:30 -04:00
Sanjay Patel	1184f8cca5	[InstCombine] add tests for icmp-of-trunc; NFC	2022-09-06 08:19:30 -04:00
LiaoChunyu	5e9522c311	[InstSimplify] Odd - X ==/!= X -> false/true Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132989	2022-09-05 23:51:45 +08:00
LiaoChunyu	14eea55445	[InstSimplify][NFC][test] Add tests for Odd - X ==/!= X -> false/true testcases will be updated by D132989 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D133306	2022-09-05 23:17:26 +08:00
Momchil Velikov	078899cd64	[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions SimplifyCFG does some common code hoisting, which is limited to hoisting a sequence of identical instruction in identical order and stops at the first non-identical instruction. This patch allows hoisting instruction pairs over same-length sequences of non-matching instructions. The linear asymptotic complexity of the algorithm stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit` serving to limit compilation time and/or the size of the hoisted live ranges. The patch improves SPECv6/525.x264_r by about 10%. Reviewed By: nikic, dmgreen Differential Revision: https://reviews.llvm.org/D129370	2022-09-05 15:13:46 +01:00
Tian Zhou	8fa432be4f	[InstCombine] reduce test-for-overflow of shifted value Fixes #57338. The added code makes the following transformations: For unsigned predicates / eq / ne: icmp pred (x << 1), x --> icmp getSignedPredicate(pred) x, 0 icmp pred x, (x << 1) --> icmp getSignedPredicate(pred) 0, x Some examples: https://alive2.llvm.org/ce/z/ckn4cj https://alive2.llvm.org/ce/z/h-4bAQ Differential Revision: https://reviews.llvm.org/D132888	2022-09-05 09:51:51 -04:00
Samuel Parker	e893345589	[NFC][TypePromotion] Add test	2022-09-05 09:01:23 +01:00
Chenbing Zheng	3e84a955a3	[InstCombine] Precommit tests for smul_with_overflow. nfc	2022-09-05 10:43:40 +08:00
Florian Hahn	ba3d29f871	[LCSSA] Update unreachable uses with poison. Users of LCSSA may not expect non-phi uses when checking the uses outside a loop, which may cause crashes. This is due to the fact that we do not update uses in unreachable blocks. To ensure all reachable uses outside the loop are phis, update uses in unreachable blocks to use poison in dead code. Fixes #57508.	2022-09-04 22:26:18 +01:00
Florian Hahn	a10d42dd45	[LV] Update test use opaque pointers, regenerate checks. Modernize the test to make it easier to extend in a follow-up patch.	2022-09-04 22:26:18 +01:00
Florian Hahn	e27e826d80	[LCSSA] Update test use opaque pointers, regenerate checks. Modernize the test to make it easier to extend in a follow-up patch.	2022-09-04 22:26:17 +01:00
Simon Pilgrim	626a84db47	[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers	2022-09-04 17:59:08 +01:00
Simon Pilgrim	8dc99180a6	[PhaseOrdering] Move X86 unsigned-multiply-overflow-check.ll test under X86	2022-09-04 17:54:32 +01:00
Simon Pilgrim	80d4b3a275	Revert rG06e73626cf0fc33b025a0f98f1eee4a302279982 "[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry" Some arm buildbots are complaining about a phase ordering test failure in unsigned-multiply-overflow-check.ll - I guess this test needs making x86 specific first	2022-09-04 17:51:11 +01:00
Simon Pilgrim	06e73626cf	[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers	2022-09-04 17:28:45 +01:00
Ruobing Han	07341e3b37	[test] pre-submission for the following SimpleLoopUnswitch update	2022-09-04 11:49:51 -04:00
Sanjay Patel	5c759edc57	[InstCombine] reduce another or-xor bitwise logic pattern ~(A & ?) \| (A ^ B) --> ~((A & ?) & B) https://alive2.llvm.org/ce/z/mxex6V This is similar to `9d218b61cc` where we peeked through another logic op to find a common operand.	2022-09-03 09:32:08 -04:00
Sanjay Patel	fbfac8e388	[InstCombine] add tests for or-xor-nand; NFC	2022-09-03 09:32:08 -04:00
Richard Smith	053841c562	Revert "[AggressiveInstCombine] Lower Table Based CTTZ" This reverts commit `fec01ee3f5`. According to asan, this patch introduces a heap use after free.	2022-09-02 16:19:09 -07:00
Francis Visoiu Mistrih	81bdb4068d	[Matrix] Simplify matmuls with scalars If one of the operands is a transposed splat, the transpose can be removed. This is useful to simplify when transposes are distributed to operands of a matmul: * k^T -> k * (A * k)^t -> A^t * k Differential Revision: https://reviews.llvm.org/D130177	2022-09-02 15:50:25 -07:00
Djordje Todorovic	fec01ee3f5	[AggressiveInstCombine] Lower Table Based CTTZ This patch introduces recognition of table-based ctz implementation during the AggressiveInstCombine. This fixes the [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=46434 Differential Revision: https://reviews.llvm.org/D113291	2022-09-02 17:26:55 +02:00
Tian Zhou	a5880b5f9c	[InstCombine] Baseline tests for reducing test-for-overflow of shifted value Baseline tests for this patch https://reviews.llvm.org/D132888 Differential Revision: https://reviews.llvm.org/D133182	2022-09-02 09:01:43 -04:00
Jolanta Jensen	958abe864a	[LoopLoadElim] Add stores with matching sizes as load-store candidates We are not building up a proper list of load-store candidates because we are throwing away stores where the type don't match the load. This patch adds stores with matching store sizes as candidates. Author of the original patch: David Sherwood. Differential Revision: https://reviews.llvm.org/D130233	2022-09-02 13:11:25 +01:00
Muhammad Omair Javaid	18de7c6a3b	Revert "[InstCombine] Treat passing undef to noundef params as UB" This reverts commit `c911befaec`. It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand the underlying reason. Reverting for now make buildbot green. https://reviews.llvm.org/D133036	2022-09-02 16:09:50 +05:00
Florian Hahn	91e67c0749	[GlobalOpt] Add test case for #56762 . Add test case where GlobalOpt fails to remove loads to global fields with struct types.	2022-09-02 11:33:07 +01:00
Mikael Holmen	51d4c7ceea	[GlobalOpt] Fix debug variance problem in hasOnlyColdCalls hasOnlyColdCalls skipped over calls to intrinsics, but it did so after checking the linkage of the called function. This meant that the presence of a call to a debug intrinsic could affect the outcome of the optimization. In my original reproducer (for an out of tree target) it was particularly interesting, because the actual IR after GlobalOpt was not different with debug instrinsics present, so -print-after-all printouts didn't show anything there. However, without debuginfo, GlobalOpt went further and ran BlockFrequencyAnalysis and (more importanly) LoopAnalysis, and later on in the pipeline, instcombine behaved in different ways when LoopInfo was present. So a call to a dbg.declare prevented running LoopAnalysis in GlobalOpt, which later prevented InstCombine from doing an optimization. The dbg-intrinsic-loopanalysis.ll testcase tries to expose this. Then I also noted that adding a dbg.declare actually made the existing testcase colccc_coldsites.ll generate different code, so I modified that to now test it behaves the same way with and without the dbg.declare. Reviewed By: nikic, fhahn Differential Revision: https://reviews.llvm.org/D133193	2022-09-02 12:29:44 +02:00
Sergey Kachkov	be37caca00	[JumpThreading] Process range comparisions with non-local cmp instructions Use getPredicateOnEdge method if value is a non-local compare-with-a-constant instruction, that can give more precise results than getConstantOnEdge. Differential Revision: https://reviews.llvm.org/D131956	2022-09-02 12:22:45 +02:00
Nikita Popov	10dfcf1f87	[LICM] Add test for missed load promotion opportunity (NFC)	2022-09-02 11:36:07 +02:00
Nikita Popov	c453e5b901	Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI" This reverts commit `cd8f3e7581`. As pointed out by Eli on the review, this is missing an alignment check. The value might be written at an offset.	2022-09-02 09:28:48 +02:00
Nikita Popov	639d912282	[LICM] Allow load-only scalar promotion in the presence of unwinding Currently, we bail out of scalar promotion if the loop may unwind and the memory may be visible on unwind. This is because we can't insert stores of the promoted value on unwind edges. However, nowadays scalar promotion also has support for only promoting loads, while leaving stores in place. This kind of promotion is safe even in the presence of unwinding. Differential Revision: https://reviews.llvm.org/D133111	2022-09-02 09:27:13 +02:00
luxufan	cd8f3e7581	[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI For noop store of the form of LoadI and StoreI, An invariant should be kept is that the memory state of the related MemoryLoc before LoadI is the same as before StoreI. For this example: ``` define void @pr49927(i32* %q, i32* %p) { %v = load i32, i32* %p, align 4 store i32 %v, i32* %q, align 4 store i32 %v, i32* %p, align 4 ret void } ``` Here the definition of the store's destination is different with the definition of the load's destination, which it seems that the invariant mentioned above is broken. But the definition of the store's destination would write a value that is LoadI, actually, the invariant is still kept. So we can safely ignore it. Differential Revision: https://reviews.llvm.org/D132657	2022-09-02 06:37:41 +00:00
Chenbing Zheng	bb0e6b7721	[InstCombine] Precommit tests for umul_with_overflow. nfc	2022-09-02 11:18:17 +08:00
Chenbing Zheng	d30cf77cb1	[InstCombine] complete fold extractvalue (any_mul_with_overflow X, -1) When we do extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp), which left partly failed to match vector constant with poison element. This patch try to fix it. Alive2: https://alive2.llvm.org/ce/z/2rGp_3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132996	2022-09-02 10:58:42 +08:00
Chenbing Zheng	4db9edfdb6	[NFC] fix typo	2022-09-02 10:04:52 +08:00
Arthur Eubanks	c911befaec	[InstCombine] Treat passing undef to noundef params as UB Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D133036	2022-09-01 15:16:45 -07:00
Rong Xu	0caa4a9559	[PGO] Support PGO annotation of CallBrInst We currently instrument CallBrInst but do not annotate it with the branch weight. This patch enables PGO annotation of CallBrInst. Differential Revision: https://reviews.llvm.org/D133040	2022-09-01 14:13:50 -07:00
Sanjay Patel	c29c6170fd	[SCCP][PhaseOrdering] add tests for sdiv/srem range transforms; NFC issue #57472	2022-09-01 16:23:00 -04:00
Arthur Eubanks	79a4fa366c	[test][InstCombine] Update precommitted test	2022-09-01 09:08:44 -07:00
Nuno Lopes	858fe8664e	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 17:04:26 +01:00
Arthur Eubanks	9599393eeb	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit `b10a341aa5`. This commit exposes the pre-existing https://github.com/llvm/llvm-project/issues/56503 in some edge cases. Will fix that and then reland this.	2022-09-01 08:52:19 -07:00
Nikita Popov	4f046bc8e0	[PHITranslateAddr] Require dominance when searching for translated address (PR57025) This is a fix for PR57025 and an alternative to D131776. The problem in the phi-translation-to-wrong-context.ll test case is that phi translation of %gep.j into if2 pick %gep.i as the result. While this instruction has the correct pointer address, it occurs in a context where %i != 0. As such, we get a NoAlias result for the store in if2, even though they do alias for %i == 0 (which is legal in the original context of the pointer). PHITranslateValue already has a MustDominate option, which can be used to restrict PHI translation results to values that dominate the translated-into block. However, this is more aggressive than what we need and would significantly regress GVN results. In particular, if we have a pointer value that does not require any translation, then it is fine to continue using that value in the predecessor, because the context is still correct for the original query. We only run into problems if PHITranslateSubExpr() picks a completely random instruction in a context that may have preconditions that do not hold. Fix this by always performing the dominance checks in PHITranslateSubExpr(), without enabling the more general MustDominate requirement. Fixes https://github.com/llvm/llvm-project/issues/57025. This also fixes the test case for https://github.com/llvm/llvm-project/issues/30999, but I'm not sure whether that's just the particular test case, or a general solution to the problem. Differential Revision: https://reviews.llvm.org/D132935	2022-09-01 16:26:42 +02:00
Nikita Popov	26347adf96	[LICM] Regenerate test checks (NFC)	2022-09-01 16:06:38 +02:00
Nikita Popov	315aef667e	[LICM] Fix thread safety checks for promotion of byval args This code was relying on a very subtle contract: The expectation was that for non-allocas, the unwind safety check would already perform a capture check, so we don't need to perform it later. This held true when this unwind safety was only handled for allocas and noalias calls, but became incorrect when byval support was added. To avoid this kind of issue, just remove the dependency between the unwind and thread-safety checks entirely. At worst, this means we perform a redundant capture check. If this should turn out to be problematic for compile-time, we can cache that query in a more explicit way.	2022-09-01 15:33:46 +02:00
Nikita Popov	20524a3c94	[LICM] Add another byval capture test (NFC) Variant with capture after the loop, in which case promotion is safe.	2022-09-01 15:18:10 +02:00
Nikita Popov	e1826326af	[LICM] Add test for byval scalar promotion miscompile (NFC)	2022-09-01 15:03:20 +02:00
Sanjay Patel	c3d1504d63	[InstCombine] fix crash on type mismatch with fcmp fold The existing predicate doesn't work for a single-element vector, so make sure we are not crossing scalar/vector types. Test (was crashing) based on the post-commit example for: `4827771234`	2022-09-01 08:57:55 -04:00
Sanjay Patel	addbdac5d5	[InstCombine] fold power-of-2 ctlz/cttz with inverted result When X is a power-of-two or zero and zero input is poison: ctlz(i32 X) ^ 31 --> cttz(X) cttz(i32 X) ^ 31 --> ctlz(X) https://alive2.llvm.org/ce/z/Cs7sFE	2022-09-01 08:57:55 -04:00
Bjorn Pettersson	3aab9d2bb7	[GVN] Pre-commit test case showing miscompile in github issue #57025 This commit adds a reproducer for https://github.com/llvm/llvm-project/issues/57025 showing a miscompile in GVN. Not sure how likely this kind of faults would be in a normal pipeline, considering that the input IR has some dead code in it. On the other hand, GVN itself sometimes creates dead basic blocks when splitting critical edges. Anyway, the fault was found when doing fuzzy testing using random pass pipelines. Differential Revision: https://reviews.llvm.org/D131775	2022-09-01 14:43:24 +02:00
Alexey Bataev	982d9ef1c1	[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison. Need either follow the original order of the operands for bool logical ops, or emit freeze instruction to avoid poison propagation. Differential Revision: https://reviews.llvm.org/D126877	2022-09-01 05:34:45 -07:00
Yuanbo Li	ebd0249fcf	[DebugInfo] Missing debug location after replacement in processSRem function This patch fixes an issue in which CorrelatedValuePropagation::processSRem would create new instructions to represent the SRem instruction, but would not correctly copy any existing debug location metadata to the new instruction. Differential Revision: https://reviews.llvm.org/D132218	2022-09-01 13:18:17 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Nuno Lopes	fa154a9170	Revert "Expand Div/Rem: consider the case where the dividend is zero" This reverts commit `4aed09868b`.	2022-09-01 12:11:22 +01:00
Nuno Lopes	4aed09868b	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 12:00:03 +01:00
Nikita Popov	43e7d9af1d	[InstCombine] Fold extractvalue of phi Just as we do for most other operations, we should push extractvalue instructions through phis, if this does not increase unfolded instruction count.	2022-09-01 10:51:54 +02:00
Nikita Popov	5b219dd9e9	[InstCombine] Add tests for extractvalue of phi (NFC)	2022-09-01 10:47:10 +02:00
Nikita Popov	ab6876a40d	reland: [Local] Allow creating callbr with duplicate successors Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs. This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in `8201e3ef5c`. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D129997 Originally landed as commit `08860f525a` ("[Local] Allow creating callbr with duplicate successors") Reverted in commit `1cf6b93df1` ("Revert "[Local] Allow creating callbr with duplicate successors"")	2022-08-31 13:23:00 -07:00
Arthur Eubanks	8c7b103055	[test][InstCombine] Precommit some undef/noundef tests	2022-08-31 11:19:31 -07:00
Florian Hahn	e68594ca20	[SLP] Add FMA test case with missing or partial fast-math flags. Add extra FMA tests with missing or partial fast-math flags.	2022-08-31 16:25:18 +01:00
Florian Hahn	faad567589	[LV] Add test case where SCEV is needed to remove vector backedge. Test case mentioned in the discussion for D115261.	2022-08-31 14:01:42 +01:00
Florian Hahn	1ed555a62b	[LV] Fix test cases where vector loop never executed. It looks like the vector loops in the modified test cases unintentionally never get executed. Update the exit condition to ensure it does to avoid them getting optimized away in upcoming changes.	2022-08-31 13:24:49 +01:00
David Green	225faddf8a	[ARM] Add a phase ordering test for MVE intrinsic remainder vectorization/unrolling. NFC	2022-08-31 12:08:38 +01:00
Nikita Popov	ad66bc42b0	[InstCombine] Use getInsertionPointAfterDef() in freeze fold This simplifies the code and fixes handling of catchswitch, in which case we have no insertion point for the freeze. Originally part of D129660.	2022-08-31 11:32:57 +02:00
Nikita Popov	8f3fd26b74	[Reassociate] Use getInsertionPointerAfterDef() This simplifies the code and fixes handling for the callbr case, where the instruction needs to be inserted in the normal destination, rather than after the terminator. Originally part of D129660.	2022-08-31 11:10:24 +02:00
Nikita Popov	b10e508c19	[GVN] Add another test for phi translation miscompile (NFC)	2022-08-31 09:14:53 +02:00
Chenbing Zheng	35a3048c25	[InstCombine] add support for multi-use Y of (X op Y) op Z --> (Y op Z) op X For (X op Y) op Z --> (Y op Z) op X we can still do transform when Y is multi-use. In D131356 limit it to one-use, this patch remove this limit. This is still not a complete solution, I add a todo test to show it. In this case, X and Y are both multi use, we can't differentiate how to convert based on this. But at least we don't make the code worse，and it can solve half the scenarios.	2022-08-31 10:55:05 +08:00
Kai Luo	ad2f7fd286	[AtomicExpand] Make floating point conversion happens before fence insertion IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations. This also fixes atomic load of floating point values which requires fence on PowerPC. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D127609	2022-08-31 09:54:58 +08:00
Alexey Bataev	ec06df9459	[SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed. The pointer operands for the ScatterVectorize node may contain non-instruction values and they are not checked for "already being vectorized". Need to check that such pointers are already vectorized and gather them instead of trying to build vectorize node to avoid compiler crash. Differential Revision: https://reviews.llvm.org/D132949	2022-08-30 12:30:14 -07:00
Sanjay Patel	16e96a7e60	[InstCombine] add tests for xor-of-ctlz/cttz; NFC	2022-08-30 14:21:29 -04:00
Sanjay Patel	67cbd25dcd	[InstCombine] add tests for signbit test using lshr; NFC	2022-08-30 14:21:29 -04:00
Hendrik Greving	3d5ea53906	[BasicBlockUtils] Amend test for loop metadata. Amends test Transforms/LoopSimplify/update_latch_md2.ll with auto-generated checks. Differential Revision: https://reviews.llvm.org/D125574	2022-08-30 09:29:52 -07:00
Jolanta Jensen	92c4172756	[NFC][LoopLoadElim] Extending type-mismatch testing Added IR for int-pointer type mismatch and int-vector type mismatch. Regenerated CHECK lines using the update_test_checks.py script. Differential Revision: https://reviews.llvm.org/D132239	2022-08-30 16:44:19 +01:00
zhongyunde	23a5de4294	[InstCombine] Distributive or+mul with const operand We aleady support the transform: `(X+C1)CI -> XCI+C1CI` Here the case is a little special as the form of `(X+C1)CI` is transformed into `(X\|C1)CI`, so we should also support the transform: `(X\|C1)CI -> XCI+C1CI` Fixes https://github.com/llvm/llvm-project/issues/57278 Reviewed By: bcl5980, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D132658	2022-08-30 20:36:52 +08:00
Florian Hahn	b5e208fcba	[DSE] Support looking through memory phis at end of function. Update isWriteAtEndOfFunction to look through MemoryPhis. The reason MemoryPhis were skipped so far was the known AliasAnalysis issue with it missing loop-carried dependences. This problem is already addressed in other parts of the code by skipping MemoryDefs that may be in difference loops. I think the same logic can be applied here. This can have a substantial impact on the number of stores removed in some cases. For MultiSource/SPEC2006/SPEC2017 with -O3: ``` Metric: dse.NumFastStores Program dse.NumFastStores base patch diff External/S...CINT2017rate/557.xz_r/557.xz_r 14.00 45.00 221.4% External/S...te/538.imagick_r/538.imagick_r 439.00 1267.00 188.6% MultiSourc...e/Applications/SIBsim4/SIBsim4 6.00 15.00 150.0% MultiSourc...Prolangs-C/simulator/simulator 3.00 7.00 133.3% MultiSource/Applications/siod/siod 3.00 7.00 133.3% MultiSourc...arks/FreeBench/distray/distray 6.00 9.00 50.0% MultiSourc...e/Applications/obsequi/Obsequi 22.00 30.00 36.4% MultiSource/Benchmarks/Ptrdist/bc/bc 23.00 28.00 21.7% External/S...NT2017rate/502.gcc_r/502.gcc_r 1258.00 1512.00 20.2% External/S...te/520.omnetpp_r/520.omnetpp_r 954.00 1143.00 19.8% External/S...rate/510.parest_r/510.parest_r 5961.00 7122.00 19.5% External/S...C/CINT2006/445.gobmk/445.gobmk 47.00 56.00 19.1% External/S...00.perlbench_r/500.perlbench_r 241.00 286.00 18.7% External/S...NT2006/471.omnetpp/471.omnetpp 36.00 42.00 16.7% External/S...06/400.perlbench/400.perlbench 183.00 210.00 14.8% MultiSource/Applications/SPASS/SPASS 72.00 81.00 12.5% External/S...17rate/541.leela_r/541.leela_r 72.00 80.00 11.1% External/SPEC/CINT2006/403.gcc/403.gcc 585.00 642.00 9.7% MultiSourc...e/Applications/sqlite3/sqlite3 120.00 131.00 9.2% MultiSourc...Applications/hexxagon/hexxagon 11.00 12.00 9.1% External/S.../CFP2006/453.povray/453.povray 566.00 615.00 8.7% External/S...rate/511.povray_r/511.povray_r 578.00 627.00 8.5% External/S...FP2006/482.sphinx3/482.sphinx3 12.00 13.00 8.3% MultiSource/Applications/oggenc/oggenc 130.00 140.00 7.7% MultiSourc...e/Applications/ClamAV/clamscan 250.00 268.00 7.2% MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 19.00 20.00 5.3% MultiSourc...ch/consumer-jpeg/consumer-jpeg 19.00 20.00 5.3% External/S...te/526.blender_r/526.blender_r 3747.00 3928.00 4.8% MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 104.00 108.00 3.8% MultiSourc...ch/consumer-lame/consumer-lame 54.00 56.00 3.7% MultiSource/Benchmarks/Bullet/bullet 1222.00 1264.00 3.4% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 973.00 1005.00 3.3% External/S.../CFP2006/447.dealII/447.dealII 2699.00 2780.00 3.0% External/S...06/483.xalancbmk/483.xalancbmk 788.00 810.00 2.8% External/S.../CFP2006/450.soplex/450.soplex 180.00 185.00 2.8% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 338.00 345.00 2.1% MultiSourc...Benchmarks/7zip/7zip-benchmark 685.00 699.00 2.0% External/S...FP2017rate/544.nab_r/544.nab_r 158.00 160.00 1.3% MultiSourc...sumer-typeset/consumer-typeset 772.00 781.00 1.2% External/S...2017rate/525.x264_r/525.x264_r 410.00 414.00 1.0% External/S...23.xalancbmk_r/523.xalancbmk_r 998.00 1002.00 0.4% ``` Compile-time is almost neutral: https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions NewPM-O3: +0.03% NewPM-ReleaseThinLTO: -0.01% NewPM-ReleaseLTO-g: +0.03% Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132365	2022-08-30 13:27:51 +01:00
Nikita Popov	07bfbce988	[GVN] Regenerate test checks (NFC)	2022-08-30 12:06:37 +02:00
jacquesguan	df525c7705	[InstCombine] fold fake floating point vector extract to shift+trunc. This patch supports the FP part of D111082. Differential Revision: https://reviews.llvm.org/D125750	2022-08-30 10:12:16 +08:00
jacquesguan	f98153eac0	[InstCombine] Precommit test for D125750. Differential Revision: https://reviews.llvm.org/D126054	2022-08-30 09:57:35 +08:00
Rong Xu	d7ef0c3970	[llvm-profdata] Improve profile supplementation Current implementation promotes a non-cold function in the SampleFDO profile into a hot function in the FDO profile. This is too aggressive. This patch promotes a hot functions in the SampleFDO profile into a hot function, and a warm function in SampleFDO into a warm function in FDO. Differential Revision: https://reviews.llvm.org/D132601	2022-08-29 16:50:42 -07:00
Philip Reames	4c10646367	[LV] Refresh autogen tests to reflect naming changes [nfc] Purely so that these can be easily autogened without spurious diffs	2022-08-29 14:16:54 -07:00
Valery N Dmitriev	329b972d41	[SLP] Try to match reductions before trying to vectorize a vector build sequence. This patch changes order of searching for reductions vs other vectorization possibilities. The idea is if we do not match a reduction it won't be harmful for further attempts to find vectorizable operations on a vector build sequences. But doing it in the opposite order we have good chance to ruin opportunity to match a reduction later. We also don't want to try vectorizing binary operations too early as 2-way vectorization may effectively prohibit wider ones leading to producing less effective code. Differential Revision: https://reviews.llvm.org/D132590	2022-08-29 13:32:14 -07:00
Philip Reames	c37b1a5f76	[RLEV] Pick a correct insert point when incoming instruction is itself a phi node This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue. Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG. Differential Revision: https://reviews.llvm.org/D132571	2022-08-29 11:44:33 -07:00
Alexey Bataev	beacf9bd9e	[SLP]Fix PR57322: vectorize constant float stores. Stores for constant floats must be vectorized, improve analysis in SLP vectorizer for stores. Differential Revision: https://reviews.llvm.org/D132750	2022-08-29 11:02:53 -07:00
Florian Hahn	3c5e24a51c	[SLP] Add tests showing over-eager SLP when scalar fma can be used. Add test cases for AArch64 that show over-eager SLP vectorization on AArch64, where keeping the things scalar allows efficient lowering using scalar fmas.	2022-08-29 18:58:56 +01:00
Florian Hahn	197332a1f8	[DSE] Add extra test for loop invariant store in loop, update comments. Add extra test coverage and updates some slightly stale comments as pointed out in D132365.	2022-08-29 17:00:00 +01:00
Alexey Bataev	e6345bf644	[SLP]Improve lookup of the buildvector top insertelement instruction. When estimating the cost of the in-tree vectorized scalars in buildvector sequences, need to take into account the vectorized insertelement instruction. The top of the buildvector seuences is the topmost vectorized insertelement instruction, because it will have > than 1 use after the vectorization. For the affected test case improves througput from 21 to 16 (per llvm-mca). Differential Revision: https://reviews.llvm.org/D132740	2022-08-29 08:19:52 -07:00
Sanjay Patel	6c39a3aae1	[InstCombine] fold not-shift of signbit to icmp+zext https://alive2.llvm.org/ce/z/j_8Wz9 The arithmetic shift was converted to logical shift with: `246078604c` That does not seem to uncover any other missing/conflicting folds, so convert directly to signbit test + cast. We still need to fold the pattern with logical shift to test + cast. This allows reducing patterns where the output type is not the same as the input value: https://alive2.llvm.org/ce/z/nydwFV Fixes #57394	2022-08-29 10:06:31 -04:00
Sanjay Patel	246078604c	[InstCombine] fold inc-of-signbit-splat to not+lshr (iN X s>> (N - 1)) + 1 --> (~X) u>> (N - 1) https://alive2.llvm.org/ce/z/wzS474	2022-08-29 08:48:22 -04:00
Sanjay Patel	ac4c46d24b	[InstCombine] add tests for increment-of-ashr; NFC	2022-08-29 08:48:22 -04:00
Florian Hahn	005d1a8ff5	[LV] Add test where either a libfunc or intrinsic is chosen. In the newly added test either a libfunc (VF=2) or a intrinsic (VF=4) can be chosen. Test coverage for D132585.	2022-08-29 10:51:20 +01:00
zhongyunde	eb438c80df	[tests] precommit tests for D132658 Reviewed By: bcl5980 Differential Revision: https://reviews.llvm.org/D132820	2022-08-29 14:17:37 +08:00
Sanjay Patel	ab6892967c	[InstCombine] allow sext in fold of mask using signbit, part 2 https://alive2.llvm.org/ce/z/rcbZmx Sibling tranform to `275aa24c0a` This pattern is seen in the examples in issue #57381.	2022-08-28 11:50:52 -04:00
zhongyunde	84d6966e4d	[InstCombine] Propagate the nuw for combine of add+mul As the commit of D132658, make the 'nuw' change separately. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132777	2022-08-28 23:01:11 +08:00
Sanjay Patel	275aa24c0a	[InstCombine] allow sext in fold of mask using signbit ~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y -- with optional sext https://alive2.llvm.org/ce/z/wFFnZT	2022-08-28 09:01:30 -04:00
Sanjay Patel	8730ef9ab3	[InstCombine] add tests for inverted signbit splat mask; NFC	2022-08-28 09:01:29 -04:00
Pavel Samolysov	b10a341aa5	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-28 10:47:03 +03:00
Sanjay Patel	7abf233f44	[InstCombine] allow poison (undef) element in vector signbit transforms If the shift constant has undefined lanes, we can assume those are the same as the defined lanes in these transforms: https://alive2.llvm.org/ce/z/t6TTJ2 Replace undef with poison in the test while here to support the transition away from undef.	2022-08-27 11:57:05 -04:00
Sanjay Patel	3cde55d807	[InstCombine] add tests for signbit splat mask; NFC issue #57381	2022-08-27 11:57:05 -04:00
Sanjay Patel	c6e56024c6	[InstCombine] fold signbit splat pattern that uses negate 0 - (zext (i8 X u>> 7) to iN) --> sext (i8 X s>> 7) to iN https://alive2.llvm.org/ce/z/jzv4Ud This is part of solving issue #57381.	2022-08-27 08:04:35 -04:00
Sanjay Patel	3b071e1d5d	[InstCombine] add tests for signbit-smear; NFC issue #57381	2022-08-27 08:04:35 -04:00
Philip Reames	b45a262679	[RISCV] Enable fixed length vectors and loop vectorization with same This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size. For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware. The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. SLP has been disabled for now, even when fixed vectors are enabled. See `a310637` and associated review. There are a few addiitional code quality issues which need worked through before turning SLP on would be reasonable. Differential Revision: https://reviews.llvm.org/D131508	2022-08-26 14:45:23 -07:00
Eric Gullufsen	eb1e2b3997	[InstCombine] Canonicalize "and, add", "or, add", "xor, add" Canonicalize ``` ((x + C1) & C2) --> ((x & C2) + C1) ((x + C1) ^ C2) --> ((x ^ C2) + C1) ((x + C1) \| C2) --> ((x \| C2) + C1) ``` for suitable constants `C1` and `C2`. Alive2 proofs: [[ https://alive2.llvm.org/ce/z/BqMDVZ \| add, or --> or, add ]] [[ https://alive2.llvm.org/ce/z/BhAeCl \| add, xor --> xor, add ]] [[ https://alive2.llvm.org/ce/z/jYRHEt \| add, and --> and, add ]] Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131142	2022-08-26 17:23:29 -04:00
Philip Reames	a310637132	[RISCV] Disable SLP vectorization by default due to unresolved profitability issues This change implements a TTI query with the goal of disabling slp vectorization on RISCV. The current default configuration disables SLP already, but its current tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP. Differential Revision: https://reviews.llvm.org/D132680	2022-08-26 14:11:22 -07:00
Paul Kirth	3155e3070c	[llvm][misexpect] Re-enable MisExpect for SampleProfiling MisExpect was occasionally crashing under SampleProfiling, due to a division by zero. We worked around that in D124302 by changing the assert to an early return. This patch is intended to add a test case for the crashing scenario and re-enable MisExpect for SampleProfiling. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124481	2022-08-26 20:24:10 +00:00
Florian Hahn	9405af1c85	[LAA] Require AddRecs to be in the innermost loop for diff-checks. The simpler diff-checks require pointers with add-recs from the same innermost loop, but this property wasn't check completely. Add the missing check to ensure both addrecs are in the innermost loop. Fixes #57315.	2022-08-26 20:39:52 +01:00
Philip Reames	c58791c286	Revert "[InstCombine] Canonicalize "and, add", "or, add", "xor, add"" This reverts commit `d2f110c693`. test/Transforms/InstCombine/freeze.ll fails on ninja check-llvm on x86_64.	2022-08-26 11:18:31 -07:00
Eric Gullufsen	d2f110c693	[InstCombine] Canonicalize "and, add", "or, add", "xor, add" Canonicalize ``` ((x + C1) & C2) --> ((x & C2) + C1) ((x + C1) ^ C2) --> ((x ^ C2) + C1) ((x + C1) \| C2) --> ((x \| C2) + C1) ``` for suitable constants `C1` and `C2`. Alive2 proofs: [[ https://alive2.llvm.org/ce/z/BqMDVZ \| add, or --> or, add ]] [[ https://alive2.llvm.org/ce/z/BhAeCl \| add, xor --> xor, add ]] [[ https://alive2.llvm.org/ce/z/jYRHEt \| add, and --> and, add ]] Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131142	2022-08-26 14:07:43 -04:00
Eric Gullufsen	2f525cfb76	[NFC][InstCombine] Add baseline tests for canonicalizing "and, add", "or, add", "xor, add" Baseline tests for canonicalizing "logic op, add" ``` ((x + C1) & C2) --> ((x & C2) + C1) ((x + C1) ^ C2) --> ((x ^ C2) + C1) ((x + C1) \| C2) --> ((x \| C2) + C1) ``` for suitable constants `C1` and `C2`. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131140	2022-08-26 13:57:11 -04:00
Sanjay Patel	4827771234	[InstCombine] fold test of equality to 0.0 with bitcast operand fcmp oeq/une (bitcast X), 0.0 --> (and X, SignMaskC) ==/!= 0 https://alive2.llvm.org/ce/z/ZKATGN	2022-08-26 13:46:11 -04:00
Sanjay Patel	88d756507b	[InstCombine] add tests for fcmp with bitcast operand; NFC	2022-08-26 13:46:11 -04:00
Alexey Bataev	c7b815a510	[SLP][NFC]Add a test for vectorization of stores with float constants, NFC.	2022-08-26 10:22:07 -07:00
Florian Hahn	e117137af0	[LV] Add another test for incorrect runtime check generation. Add a variation of @nested_loop_outer_iv_addrec_invariant_in_inner with the dependence sink and source swapped to extend test coverage. Also simplifies the test by removing an unneeded reduction.	2022-08-26 17:28:55 +01:00
Florian Hahn	6e56779e6b	[LV] Add test for incorrect runtime check generation #57315 . Test for PR57315 based on a test provided by @kpdev42.	2022-08-26 16:29:20 +01:00
Florian Hahn	3b135ef446	[LV] Convert runtime diff check test to use opaque pointers. Modernize the test to make it easier to extend with up-to-date IR.	2022-08-26 16:02:38 +01:00
Alexey Bataev	af8477f2bc	[SLP][NFC]Add a test for incorrectly calculated cost for extracted buildvector sequence, NFC.	2022-08-26 06:29:15 -07:00
Adrian Vogelsgesang	5af06ba7dc	[Coro][Debuginfo] Add debug info to `__NoopCoro_ResumeDestroy` function With this commit, we now attach an `DISubprogram` to the LLVM-generated `_NoopCoro_ResumeDestroy` function. Thereby, lldb can show a `std::coroutine_handle` to a `std::noop_coroutine` as ``` continuation = coro frame = 0x555555560d98 { resume = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy) destroy = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy) } ``` instead of ``` continuation = coro frame = 0x555555560d98 { resume = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211) destroy = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211) } ``` I renamed the function from `NoopCoro.ResumeDestroy` to `_NoopCoro_ResumeDestroy` because: * the leading `_` makes sure this is a reserved name and should not clash with any user-provided names * the `.` was replaced by a `_`, so the name is now a valid identifier in C, making it allows me to type its name in the debugger Differential Revision: https://reviews.llvm.org/D132580	2022-08-26 05:49:52 -07:00
Matthias Gehre	3e39b27101	[llvm/CodeGen] Add ExpandLargeDivRem pass Adds a pass ExpandLargeDivRem to expand div/rem instructions with more than 128 bits into a loop computing that value. As discussed on https://reviews.llvm.org/D120327, this approach has the advantage that it is independent of the runtime library. This also helps the clang driver, which otherwise would need to understand enough about the runtime library to know whether to allow _BitInts with more than 128 bits. Targets are still free to disable this pass and instead provide a faster implementation in a runtime library. Fixes https://github.com/llvm/llvm-project/issues/44994 Differential Revision: https://reviews.llvm.org/D126644	2022-08-26 11:55:15 +01:00
Pavel Samolysov	f964417c32	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" The commit breaks the compiler when a function is used as a function parameter (hm... for a function from the standard C library?): ``` static float strtof(char , char ) {} void a() { strtof(a, 0); } ``` This reverts commit `879f5118fc`.	2022-08-26 13:43:09 +03:00
Florian Hahn	555e09c2b0	[LAA] Rename printing pass to print<access-info>. This updates the naming for the LAA printing pass to be in line with most other analysis printing passes. The old name has come up as confusing multiple times already, e.g. in D131924.	2022-08-26 11:00:09 +01:00
Dmitry Makogon	9142f67ef2	[SimplifyCFG] Don't widen cond br if false branch has successors Fixes https://github.com/llvm/llvm-project/issues/57221. This limits the tryWidenCondBranchToCondBranch transform making it work only if the false block of widenable condition branch has no successors. If that block has successors, then SimplifyCondBranchToCondBranch may undo the transform done by tryWidenCondBranchToCondBranch, which would lead to infinite cycle of transformation and eventually an assert failing. Differential Revision: https://reviews.llvm.org/D132356	2022-08-26 15:23:37 +07:00
Chuanqi Xu	17631ac676	[Coroutines] Store the index for final suspend point if there is unwind coro end Closing https://github.com/llvm/llvm-project/issues/57339 The root cause for this issue is an pre-mature optimization to eliminate the index for the final suspend point since we feel like we can judge if a coroutine is suspended at the final suspend by if resume_fn_addr is null. However this is not true if the coroutine exists via an exception in promise.unhandled_exception(). According to [dcl.fct.def.coroutine]p14: > If the evaluation of the expression promise.unhandled_exception() > exits via an exception, the coroutine is considered suspended at the > final suspend point. But from the perspective of the implementation, we can't set the coro index to the final suspend point directly since it breaks the states. To fix the issue, we block the optimization if we find there is any unwind coro end, which indicates that it is possible that the coroutine exists via an exception from promise.unhandled_exception(). Test Plan: folly	2022-08-26 14:05:46 +08:00
Valery N Dmitriev	f9ceb71542	[SLP][NFC] Add a coverage test for horizontal reductions. Reduction feeds single insertelement instruction.	2022-08-25 16:02:22 -07:00
Philip Reames	86b67a310d	[LAA] Prune dependencies with distance large than access implied by trip count When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances. Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all. Differential Revision: https://reviews.llvm.org/D131924	2022-08-25 14:24:13 -07:00
Sanjay Patel	4e44c22c97	[ValueTracking][InstCombine] restrict FP min/max matching to avoid miscompile This is a long-standing FIXME with a non-FMF test that exposes the bug as shown in issue #57357. It's possible that there's still a way to miscompile by mis-identifying/mis-folding FP min/max patterns, but this patch only exposes a couple of seemingly minor regressions while preventing the broken transform.	2022-08-25 16:52:40 -04:00
Florian Hahn	637da77e66	[LV] Add additional test coverage for SCEVexp and LCSSA interaction. Also converts the test to use opaque pointers while I am here.	2022-08-25 20:59:47 +01:00
Sanjay Patel	b65f3aa7d9	[InstCombine] add test for fcmp+select miscompile; NFC issue #57357	2022-08-25 15:38:19 -04:00
Simon Pilgrim	3edec9ba60	[CostModel][X86] Support cost kind specific look up tables (REAPPLIED) Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations. I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs. This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured. I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing. For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc. REAPPLIED: Added early out to prevent getCmpSelInstrCost being used for anything but generic integer/float scalar/vector types - getTypeLegalizationCost can't handle the "exotic" TypeID enums that some passes attempt to get a costs for (aggregates etc.). Differential Revision: https://reviews.llvm.org/D132216	2022-08-25 16:49:17 +01:00
Sanjay Patel	5260146a8a	[InstCombine] restore test for mul+add transform with constant expression; NFC This test was added with `6cf6c05322`, but then made useless with D4238 / `d1bea693e2`. We still need a test to make sure transforms are not conflicting when matching a constant expression.	2022-08-25 11:33:33 -04:00
Benjamin Kramer	ab85996e47	Revert "[CostModel][X86] Support cost kind specific look up tables" This reverts commit `45846854a2`. This triggers an assertion failure during Clang selfhost Unknown type! UNREACHABLE executed at llvm/lib/CodeGen/ValueTypes.cpp:548! * SIGABRT received by PID 6107 (TID 6107) on cpu 218 from PID 6107; stack trace: * @ 0x556c8827c2d1 64 llvm::llvm_unreachable_internal() @ 0x556c82a5542a 32 llvm::MVT::getVT() @ 0x556c82a54a28 80 llvm::EVT::getEVT() @ 0x556c7dda1526 80 llvm::TargetLoweringBase::getValueType() @ 0x556c8174dd38 112 llvm::BasicTTIImplBase<>::getTypeLegalizationCost() @ 0x556c81755e72 144 llvm::X86TTIImpl::getCmpSelInstrCost() @ 0x556c8174cadf 512 llvm::TargetTransformInfoImplCRTPBase<>::getInstructionCost() @ 0x556c84ab4dd2 32 llvm::TargetTransformInfo::getInstructionCost() @ 0x556c82ead283 1968 llvm::sinkRegion()	2022-08-25 15:42:44 +02:00
Simon Pilgrim	2e5f16516a	[CostModel][X86] Add CodeSize handling for fdiv ops Eventually this will be part of the cost table lookup	2022-08-25 14:08:03 +01:00
Simon Pilgrim	45846854a2	[CostModel][X86] Support cost kind specific look up tables Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations. I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs. This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured. I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing. For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc. Differential Revision: https://reviews.llvm.org/D132216	2022-08-25 12:23:36 +01:00
Max Kazantsev	ccf788a565	[IRCE] Drop SCEV of a Phi after adding a new input. PR57335 Since SCEV learned to look through single value phis with `20d798bd47`, whenever we add a new input to a Phi, we should make sure that the old cached value is dropped. Otherwise, it may lead to various miscompiles, such as breach of dominance as shown in the bug https://github.com/llvm/llvm-project/issues/57335	2022-08-25 18:14:29 +07:00
Pavel Samolysov	879f5118fc	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-25 10:55:47 +03:00
Chenbing Zheng	adf4519c0e	[InstCombine] recognize bitreverse disguised as shufflevector This patch complete TODO left in D66965, and achieve related pattern for bitreverse. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132431	2022-08-25 10:41:47 +08:00
Chenbing Zheng	14fae4d136	[InstCombine] Add undef elements support for shrinkFPConstantVector Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D132343	2022-08-25 10:38:48 +08:00
Mircea Trofin	5ce4c9aa04	[mlgo] Use TFLite for 'development' mode. TLite is a lightweight, statically linkable[1], model evaluator, supporting a subset of what the full tensorflow library does, sufficient for the types of scenarios we envision having. It is also faster. We still use saved models as "source of truth" - 'release' mode's AOT starts from a saved model; and the ML training side operates in terms of saved models. Using TFLite solves the following problems compared to using the full TF C API: - a compiler-friendly implementation for runtime-loadable (as opposed to AOT-embedded) models: it's statically linked; it can be built via cmake; - solves an issue we had when building the compiler with both AOT and full TF C API support, whereby, due to a packaging issue on the TF side, we needed to have the pip package and the TF C API library at the same version. We have no such constraints now. The main liability is it supporting a subset of what the full TF framework does. We do not expect that to cause an issue, but should that be the case, we can always revert back to using the full framework (after also figuring out a way to address the problems that motivated the move to TFLite). Details: This change switches the development mode to TFLite. Models are still expected to be placed in a directory - i.e. the parameters to clang don't change; what changes is the directory content: we still need an `output_spec.json` file; but instead of the saved_model protobuf and the `variables` directory, we now just have one file, `model.tflite`. The change includes a utility showing how to take a saved model and convert it to TFLite, which it uses for testing. The full TF implementation can still be built (not side-by-side). We intend to remove it shortly, after patching downstream dependencies. The build behavior, however, prioritizes TFLite - i.e. trying to enable both full TF C API and TFLite will just pick TFLite. [1] thanks to @petrhosek's changes to TFLite's cmake support and its deps!	2022-08-24 16:07:24 -07:00
Sami Tolvanen	cff5bef948	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Relands `67504c9549` with a fix for 32-bit builds. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 22:41:38 +00:00
Cameron McInally	38d58c1b37	[GlobalOpt] Bail out of GlobalOpt SROA if a Scalable Vector is seen The SROA algorithm won't work for Scalable Vectors, since we don't know how many bytes are loaded/stored. Bail out if a Scalable Vector is seen. Differential Revision: https://reviews.llvm.org/D132417	2022-08-24 13:17:59 -07:00
Sanjay Patel	f7ab70cf8d	[InstCombine] reduce disguised mul+add factorization ~(A * C1) + A --> (A * (1 - C1)) - 1 This is a non-obvious mix of bitwise logic and math: https://alive2.llvm.org/ce/z/U7ACVT The pattern may be produced by Negator from the more typical code seen in issue #57255.	2022-08-24 16:02:12 -04:00
Sanjay Patel	24d521815f	[InstCombine] add tests for add with not-of-mul common operand; NFC Negator can create non-obvious math while trying hard to avoid subtraction. issue #57255	2022-08-24 16:02:12 -04:00
Sami Tolvanen	a79060e275	Revert "KCFI sanitizer" This reverts commit `67504c9549` as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.	2022-08-24 19:30:13 +00:00
Sami Tolvanen	67504c9549	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 18:52:42 +00:00
Philip Reames	190cdf51ff	[RISCV][LV] Add predicated div/rem test for fixed length vectorization	2022-08-24 11:24:22 -07:00
Valery N Dmitriev	e3dd0ddc5b	[SLP][NFC] Add test case exposing deficiency in finding reductions that feed buildvector sequence. Differential Revision: https://reviews.llvm.org/D132506	2022-08-24 11:13:40 -07:00
Philip Reames	b20104f644	[LV] Update a test which appears to have been editted without regen [nfc]	2022-08-24 11:05:49 -07:00
Philip Reames	f79214d1e1	[LV] Support predicated div/rem operations via safe-divisor select idiom This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors. The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditional select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straight forward, and none of the others seemed oviously better. I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches. Differential Revision: https://reviews.llvm.org/D130164	2022-08-24 10:07:59 -07:00
Sanjay Patel	0cfc651032	[InstCombine] ease use constraint in tryFactorization() The stronger one-use checks prevented transforms like this: (x * y) + x --> x * (y + 1) (x * y) - x --> x * (y - 1) https://alive2.llvm.org/ce/z/eMhvQa This is one of the IR transforms suggested in issue #57255. This should be better in IR because it removes a use of a variable operand (we already fold the case with a constant multiply operand). The backend should be able to re-distribute the multiply if that's better for the target. Differential Revision: https://reviews.llvm.org/D132412	2022-08-24 12:10:54 -04:00
Simon Pilgrim	2f217c1214	[InstCombine] Canonicalize ((X & -X) - 1) --> ((X - 1) & ~X) (PR51784) Enables the ctpop((x & -x ) - 1) -> cttz(x, false) fold Alive2: https://alive2.llvm.org/ce/z/EDk4h7 (((X & -X) - 1) --> (~X & (X - 1)) ) Alive2: https://alive2.llvm.org/ce/z/8Yr3XG (CTPOP -> CTTZ) Fixes #51126 Differential Revision: https://reviews.llvm.org/D110488	2022-08-24 16:50:43 +01:00
Sanjay Patel	4376afe727	[InstCombine] add tests for mul+sub common factor; NFC	2022-08-24 11:31:18 -04:00
Simon Pilgrim	80cc8f0f62	Revert rGc360955c4804e9b25017372cb4c6be7adcb216ce "[InstCombine] Canonicalize ((X & -X) - 1) --> (~X & (X - 1)) (PR51784)" The test changes are failing on some buildbots (but not others.....).	2022-08-24 16:26:28 +01:00
Zain Jaffal	23a1c0a779	[instcombine] Test for zero initialisation optimisation of integer product Following the work on `D131672` we do the same optimisations for integer products. We add tests to check if a loop gets removed if we repeatdly multiply an array elements with an accumulator initalised to zero Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D132553	2022-08-24 15:54:20 +01:00
Simon Pilgrim	c360955c48	[InstCombine] Canonicalize ((X & -X) - 1) --> (~X & (X - 1)) (PR51784) Enables the ctpop((x & -x ) - 1) -> cttz(x, false) fold Alive2: https://alive2.llvm.org/ce/z/EDk4h7 (((X & -X) - 1) --> (~X & (X - 1)) ) Alive2: https://alive2.llvm.org/ce/z/8Yr3XG (CTPOP -> CTTZ) Fixes #51126 Differential Revision: https://reviews.llvm.org/D110488	2022-08-24 15:31:15 +01:00
Simon Pilgrim	cc7818939c	[InstCombine] Prevent complexity commutation in dec_mask_commute_neg_i32 Noticed by @spatel in D110488	2022-08-24 14:35:04 +01:00
Simon Pilgrim	bdac3c28fb	[InstCombine] Add tests for ((X & -X) - 1) --> (~X & (X - 1)) canonicalization As originally suggested on D110488	2022-08-24 13:05:01 +01:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as they take two vectors inputs are extracting from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
Pavel Samolysov	6703ad1e0c	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit `3f20dcbf70`.	2022-08-24 12:44:13 +03:00
Pavel Samolysov	3f20dcbf70	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-08-24 10:36:12 +03:00
David Green	e29f9f7572	[AArch64][X86] Add some fixed-order-recurrence tests to check the costmodel of fixed order recurrences. NFC	2022-08-24 08:18:01 +01:00
Sanjay Patel	8ccca3f3a4	[InstCombine] adjust tests for mul+add common factor; NFC The existing tests were added with `2880d7b9e4`, but discussion in D132412 suggests that we should start with a simpler pattern (the more complicated pattern may not be a real problem).	2022-08-23 17:53:53 -04:00
Florian Hahn	5913d77056	[Globals] Treat nobuiltin fns as maybe-derefined. Callsites could be marked as `builtin` while calling `nobuiltin` functions. This can lead to problems, if local optimizations apply transformations based on the semantics of the builtin, but then IPO treats the function as `nobuiltin` and applies a transform that breaks builtin semantics (assumed earlier). To avoid this, mark such functions as maybey-derefined, to avoid IPO transforms on them that may break assumptions of earlier calls. Fixes #57075 Fixes #48366 Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D97735	2022-08-23 13:45:10 +01:00
Florian Hahn	13cb1a9249	[DeadArgElim] Add test case for #48366 , #57075	2022-08-23 13:08:42 +01:00
Jay Foad	18557c26be	[StructurizeCFG] Autogenerate checks	2022-08-23 11:22:24 +01:00
gonglingqin	e9a4b8e397	[LoongArch] Optimize the atomic store with amswap_db.[w/d] When AtomicOrdering is release or stronger, use amswap_db.[w/d] $zero, $a1, $a0 instead of dbar 0 st.[w/d] $a0, $a1, 0 Thanks to @xry111 for the suggestion: https://reviews.llvm.org/D128901#3626635 Differential Revision: https://reviews.llvm.org/D129838	2022-08-23 17:11:57 +08:00
Graham Hunter	14212c968f	[NFC][LoopVectorize] Precommit masked vector function call tests	2022-08-23 09:47:10 +01:00
Dmitry Makogon	56b213090f	[NFC] Remove undef from xfailed SimplifyCFG test The test fails not because of undef, so replacing with normal condition.	2022-08-23 14:53:05 +07:00
Chenbing Zheng	6b3cbf60f4	[InstCombine] add tests for shuf bitcast. nfc	2022-08-23 11:01:18 +08:00
Sanjay Patel	2880d7b9e4	[InstCombine] add tests for mul+add common factor; NFC	2022-08-22 17:19:45 -04:00
Jay Foad	f82c55fa08	[InstCombine] Change order of canonicalization of ADD and AND Canonicalize ((x + C1) & C2) --> ((x & C2) + C1) for suitable constants C1 and C2, instead of the other way round. This should allow more constant ADDs to be matched as part of addressing modes for loads and stores. Differential Revision: https://reviews.llvm.org/D130080	2022-08-22 20:03:53 +01:00
Jay Foad	2754ff883d	[InstCombine] Try not to demand low order bits for Add Don't demand low order bits from the LHS of an Add if: - they are not demanded in the result, and - they are known to be zero in the RHS, so they can't possibly overflow and affect higher bit positions This is intended to avoid a regression from a future patch to change the order of canonicalization of ADD and AND. Differential Revision: https://reviews.llvm.org/D130075	2022-08-22 20:03:53 +01:00
Jay Foad	8b24e64014	[InstCombine] Add tests for D130075 Differential Revision: https://reviews.llvm.org/D132381	2022-08-22 20:03:52 +01:00
David Green	04a68fce13	[ARM] Add a couple of MVE fixed-order-reduction tests. NFC	2022-08-22 10:58:14 +01:00
Florian Hahn	b3125ad3d6	[DSE] Add additional uses to tests to ensure stores be removed. Make some tests a bit more robust with respect to future changes.	2022-08-22 10:21:50 +01:00
Max Kazantsev	e587199a50	[SCEV] Prove condition invariance via context, try 2 Initial implementation had too weak requirements to positive/negative range crossings. Not crossing zero with nuw is not enough for two reasons: - If ArLHS has negative step, it may turn from positive to negative without crossing 0 boundary from left to right (and crossing right to left doesn't count for unsigned); - If ArLHS crosses SINT_MAX boundary, it still turns from positive to negative; In fact we require that ArLHS always stays non-negative or negative, which an be enforced by the following set of preconditions: - both nuw and nsw; - positive step (looks liftable); Because of positive step, boundary crossing is only possible from left part to the right part. And because of no-wrap flags, it is guaranteed to never happen.	2022-08-22 14:31:19 +07:00
Max Kazantsev	7d6e7d5445	[Test] And one more test for PR57247	2022-08-22 13:02:01 +07:00
Max Kazantsev	bedf43be4e	[Test] One more test for PR57247 Show that the issue also exists with positive steps.	2022-08-22 12:22:49 +07:00
Ting Wang	d2d77e050b	[PowerPC][Coroutines] Add tail-call check with call information for coroutines Fixes #56679. Reviewed By: ChuanqiXu, shchenz Differential Revision: https://reviews.llvm.org/D131953	2022-08-21 22:20:40 -04:00
Sanjay Patel	15e3d86911	[InstCombine] reassociate bitwise logic chains based on uses (X op Y) op Z --> (Y op Z) op X This isn't a complete solution (see TODO tests for possible refinements), but it shows some nice wins and doesn't seem to cause any harm. I think the most potential danger is from conflicting with other folds and causing an infinite loop - that's the reason for avoiding patterns with constant operands. Alternatively, we could try this in the reassociate pass, but we would not immediately see all of the logic folds that instcombine provides. I also looked at improving ValueTracking's isImpliedCondition() (and we should still add some enhancements there), but that would not work in general for bitwise logic reduction. The tests that reduce completely to 0/-1 are motivated by issue #56653. Differential Revision: https://reviews.llvm.org/D131356	2022-08-21 09:42:14 -04:00
Simon Pilgrim	53c0be28a7	[PhaseOrdering][X86] Regenerate vdiv.ll Noticed while cleaning up x86 cost tables for upcoming cost kind support and it affected this test	2022-08-21 13:39:55 +01:00
Sanjay Patel	2981a94902	[EarlyCSE][ConstantFolding] do not constant fold atan2(+/-0.0, +/-0.0), part 2 Follow-up to `7f1262a322`. That patch avoided removing the call, but it still allowed the constant-folded result. This makes the behavior consistent with 1-arg libm folding: if the call potentially raises an exception, then we just bail out. It seems likely that there are other corner-cases like this, but the tests are incomplete, so we have lived with these discrepancies for a long time. This was untested before the the constant folding was expanded in D127964.	2022-08-20 10:16:06 -04:00
jeff	20cf170e68	[InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics Certain address space dependent optimizations, like SeperateConstOffsetFromGEP, assume agreement between the address space of the recursive uses and the address space of the def. If this assumption is invalid, then optimizations may or may not be correct depending on properties of an address space for a given target, the address spaces of recursive uses, and the optimization being done. This patch infers the previous address space for flat_atomic ptr arguments. As a result, the address spaces of the uses in flat_atomic cases will agree with the address space in recursive defs. If this results in non-flat address space, then isel may infer a different intrinsic. For example, if the result is a flat_atomic using global address space, then it will be lowered to the corresponding global_atomic intrinsic. Change-Id: Ifcd981709dc2ea94d4acbcb84efe7176593ec8c7	2022-08-19 11:37:20 -07:00
Sanjay Patel	7f1262a322	[EarlyCSE][ConstantFolding] do not constant fold atan2(+/-0.0, +/-0.0) These may raise an error (set errno) as discussed in the post-commit comments for D127964, so we can't fold away the call and potentially alter that behavior.	2022-08-19 12:27:29 -04:00
Sanjay Patel	4bff1037bb	[EarlyCSE][ConstantFolding] add tests for atan2 with zero args; NFC	2022-08-19 12:18:53 -04:00
Alexey Bataev	c167028684	[SLP]Delay vectorization of postponable values for instructions with no users. SLP vectorizer tries to find the reductions starting the operands of the instructions with no-users/void returns/etc. But such operands can be postponable instructions, like Cmp, InsertElement or InsertValue. Such operands still must be postponed, vectorizer should not try to vectorize them immediately. Differential Revision: https://reviews.llvm.org/D131965	2022-08-19 08:39:16 -07:00
Alexey Bataev	0e7ed32c71	[SLP]Cost for a constant buildvector. In many cases constant buildvector results in a vector load from a constant/data pool. Need to consider this cost too. Differential Revision: https://reviews.llvm.org/D126885	2022-08-19 08:02:42 -07:00
David Sherwood	666d2a925f	[SVE][LoopVectorize][NFC] Tidy up some tests Whilst writing a patch to add extra tail-folding RUN lines to existing tests I noticed a few areas where they can be cleaned up a little: 1. scalable-reductions.ll: fmin_fast does not mark fcmp as fast. 2. sve-inductions-unusual-types.ll: remove direct references to SSA variable names. 3. sve-strict-fadd-cost.ll: don't force vector width so we see costs for different VFs in one go. This will be important for the follow-on patch. 4. sve-vector-reverse.ll,vector-reverse-mask4.ll: add noalias keyword to simplify IR. 4. sve-widen-gep.ll,sve-widen-phi.ll: regenerate using script. These changes will make the subsequent patch adding RUN lines much easier to review! Differential Revision: https://reviews.llvm.org/D132219	2022-08-19 15:12:58 +01:00
Max Kazantsev	72136d8ba2	[Test] Add test for miscompile described in PR57247	2022-08-19 21:02:07 +07:00
Max Kazantsev	f798c042f4	Revert "[SCEV] Prove condition invariance via context" This reverts commit `a3d1fb3b59`. Reverting until investigation of https://github.com/llvm/llvm-project/issues/57247 has concluded.	2022-08-19 21:02:06 +07:00
Caroline Concatto	09afe4155b	[InstCombine] For vector extract when extract vector and insert value type is the same This patch has implements these optimizations: extract.vector(insert.vector(Vector, Value, Idx), Idx) --> Value extract.vector(insert.vector(Vector, Value, InsertIndex), ExtractIndex) --> extract.vector(Vector, ExtractIndex) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D132137	2022-08-19 12:13:03 +01:00
Max Kazantsev	0867f23dcf	[Test] Regenerate tests using update_tests.py	2022-08-19 14:51:19 +07:00
Max Kazantsev	e351e8213b	[Test] Remove addrspace1 ptr to not confuse alive2 addrspace here is not import for the test itself.	2022-08-19 13:25:50 +07:00
Fangrui Song	c2a3888793	[IR] Use Min behavior for module flag "PIC Level" Using Max for both "PIC Level" and "PIE Level" is inconsistent. PIC imposes less restriction while PIE imposes more restriction. The result generally picks the more restrictive behavior: Min for PIC. This choice matches `ld -r`: a non-pic object and a pic object merge into a result which should be treated as non-pic. To allow linking "PIC Level" using Error/Max from old bitcode files, upgrade Error/Max to Min. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D130531	2022-08-18 16:28:55 -07:00
Craig Topper	37c47b2cac	[RISCV] Change how mtune aliases are implemented. The previous implementation translated from names like sifive-7-series to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32 and sifive-7-rv64 to be valid CPU names. As those are not real CPUs it doesn't make sense to accept them in -mcpu. This patch does away with the translation and adds sifive-7-series directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64. sifive-7-series is only allowed in -mtune. I've also added "rocket" to RISCV.td but have not removed rocket-rv32 or rocket-rv64. To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc, I've added a Feature32Bit to all rv32 CPUs. And made it an error to have an rv32 triple without Feature32Bit. sifive-7-series and rocket do not have Feature32Bit or Feature64Bit set so the user would need to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to avoid the error. SiFive no longer names their newer products with 3, 5, or 7 series. Instead we have p200 series, x200 series, p500 series, and p600 series. Following the previous behavior would require a sifive-p500-rv32 and sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There is currently no p500 product, but it could start getting confusing if there was in the future. I'm open to hearing alternatives for how to achieve my main goal of removing sifive-7-rv32/rv64 as a CPU name. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131708	2022-08-18 16:22:25 -07:00
Sanjay Patel	b066195b3f	[InstCombine] fold bitwise logic or+or+xor+not (~A \| C) \| (A ^ B) --> ~(A & B) \| C https://alive2.llvm.org/ce/z/Qw3aiJ This extends the existing fold (just above the new match) to peek through another 'or' instruction. This should let the motivating case from issue #57174 simplify completely.	2022-08-18 17:14:41 -04:00
Sanjay Patel	0e9fada711	[InstCombine] add tests for or-xor-not patterns; NFC	2022-08-18 17:14:41 -04:00
Philip Reames	4d87591028	[RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL On known hardware, reductions, gather, and scatter operations have execution latencies which correlated with the vector length (VL) of the operation. Most other operations (e.g. simply arithmetic) don't correlated in this way, and instead essentially fixed cost as VL varies. When I'd implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs defacto prevents vectorization using these constructs at all. This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants. This does introduce the possibility of the cost for these operations being a significant under estimate on platforms where actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist. Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine. Differential Revision: https://reviews.llvm.org/D131519	2022-08-18 13:10:03 -07:00
Florian Hahn	b8709a9d03	[LV] Support fixed order recurrences. If the incoming previous value of a fixed-order recurrence is a phi in the header, go through incoming values from the latch until we find a non-phi value. Use this as the new Previous, all uses in the header will be dominated by the original phi, but need to be moved after the non-phi previous value. At the moment, fixed-order recurrences are modeled as a chain of first-order recurrences. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D119661	2022-08-18 19:15:52 +01:00
Simon Pilgrim	b864cad7b4	[CostModel][X86] Adjust SLM select costs to match poor throughput of pblendvb/blendvpd/blendvps	2022-08-18 17:05:38 +01:00
Danila Malyutin	4a9ff289fb	[InstCombine] Fix freeze instruction getting inserted before landingpad The code would use first non-phi instruction as an insertion point, however this could lead to freeze getting inserted between phi and landingpad causing a verifier assert. Differential Revision: https://reviews.llvm.org/D132105	2022-08-18 17:43:42 +03:00
Philip Reames	531dd3634d	[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops) This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is not an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predicated. Specifically, for the case of a uniform memory operation we were previously considering it not to be a predicated instruction, but were considering it to be scalable with predication. As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowering via naive scalarization or if scalarization is infeasible (i.e. scalable vectors) aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalaralizable. I haven't found a concrete example here, but I am suspicious. Differential Revision: https://reviews.llvm.org/D131093	2022-08-18 07:14:04 -07:00
Konstantina	5bc8791187	[NewGVN][PHIOFOPS] Bail out if an operand is in OpSafeForPHIOfOps but it is not safe for the current basic block. NewGVN tables are not cleared out between the initial run of NewGVN and the verification. In case of phi-of-ops optimization, OpSafeForPHIOfOps goes out of sync between the two runs. One operand might not be safe for one basic block, but it might be safe for one of its successors. In this case, the operand will be added in OpSafeForPHIOfOps map. In verification phase, we reuse OpSafeForPHIOfOps without updating it again. As a result, the operand will be considered safe for phi-of-ops optimization even for the case that it is not. This patch fixes this problem. Fix for 53807. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D130910	2022-08-17 18:57:46 -07:00
Philip Reames	aa41fe664a	[tests] Precommit tests for phi recusion limits in known bits	2022-08-17 14:03:10 -07:00
Ellis Hoag	6f61594d8c	[InstrProf] Add option to avoid instrumenting small functions If a function only has a few instructions, instrumentation can significantly increase the size and performance overhead of that function. Add the `-pgo-function-size-threshold` option to select a size threshold so these small functions are not instrumented. A similar option `-fxray-instruction-threshold=<N>` is used for XRay to reduce binary size overhead [1]. [1] https://www.llvm.org/docs/XRay.html Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131816	2022-08-17 06:47:15 -07:00
David Spickett	8375c3124d	[LLVM][IndvarSimplify] Move test that requires X86 This is failing on our bots that only build Arm/AArch64. https://lab.llvm.org/buildbot/#/builders/171/builds/19033/steps/5/logs/FAIL__LLVM__pr57187_ll	2022-08-17 11:14:08 +00:00
Zain Jaffal	f61f99a105	[instcombine] Optimise for zero initialisation of product given fast flags are enabled Currently, clang ignores the 0 initialisation in finite math For example: ``` double f_prod = 0; double arr[1000]; for (size_t i = 0; i < 1000; i++) { f_prod *= arr[i]; } ``` Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop. Reviewed By: fhahn, spatel Differential Revision: https://reviews.llvm.org/D131672	2022-08-17 11:12:15 +01:00
Andre Vieira	49223e0a2d	[TypePromotion] Don't promote PHI + ZExt if wider than RegisterBitWidth Differential Revision: https://reviews.llvm.org/D131966	2022-08-17 09:54:15 +01:00
Max Kazantsev	53544c67db	[Test] Add miscompiled test for PR57187 Details at https://github.com/llvm/llvm-project/issues/57187	2022-08-17 15:46:30 +07:00
Martin Sebor	345514e991	[InstCombine] Add support for strlcpy folding Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130666	2022-08-16 16:43:40 -06:00
Martin Sebor	e858f5120d	[InstCombine] Remove assumptions about int having 32 bits Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D131731	2022-08-16 15:35:08 -06:00
Sanjay Patel	ce081776b2	[FlattenCFG] avoid crash on malformed code We don't have a dominator tree in this pass, so we can't bail out sooner by checking for unreachable code, but this is a minimal fix for the example in issue #56875.	2022-08-16 15:11:00 -04:00
Simon Pilgrim	08d153d806	[ValueTracking] computeKnownBits - attempt to use a branch condition feeding a phi to improve known bits range (PR38280) If computeKnownBits encounters a phi node, and we fail to determine any known bits through direct analysis, see if the incoming value is part of a branch condition feeding the phi. Handle cases where icmp(IncomingValue PRED Constant) is driving a branch instruction feeding that phi node - at the moment this only handles EQ/ULT/ULE predicate cases as they are the most straightforward to handle and most likely for branch-loop 'max upper bound' cases - we can extend this if/when necessary. I investigated a more general icmp(LHS PRED RHS) KnownBits system, but the hard limits we put on value tracking depth through phi nodes meant that we were mainly catching constants anyhow. Fixes the pointless vectorization in PR38280 / Issue #37628 (excessive unrolling still needs handling though) Differential Revision: https://reviews.llvm.org/D131838	2022-08-16 16:54:44 +01:00
Florian Hahn	1638ad1ebf	[PhaseOrdering] Add test showing excessive unrolling of vector loop. Test cases based on #42332 showing excessive unrolling with both known and runtime trip counts.	2022-08-16 16:29:15 +01:00
Danila Malyutin	451497a030	[RS4GC] Handle vectors of pointers in non-live clobbering Fix crash when trying to unconditionally cast alloca type to PointerType Differential Revision: https://reviews.llvm.org/D131146	2022-08-16 17:47:30 +03:00
Simon Pilgrim	f5f4ed87a9	[InstCombine] known-phi-br.ll - remove multiuse handling from tests Based off discussion with @spatel for D131838 - InstCombine will still canonicalize the predicates enough that the @use() multiuses aren't helping	2022-08-16 15:34:48 +01:00
Alexey Bataev	65c7cecb13	[SLP]Fix PR51320: Try to vectorize single store operands. Currently, we try to vectorize values, feeding into stores, only if slp-vectorize-hor-store option is provided. We can safely enable vectorization of the value operand of a single store in the basic block, if the operand value is used only in store. It should enable extra vectorization and should not increase compile time significantly. Fixes https://github.com/llvm/llvm-project/issues/51320 Differential Revision: https://reviews.llvm.org/D131894	2022-08-16 07:25:21 -07:00
Zain Jaffal	468a9d6d2a	[instcombine] Test for zero initialisation optimisation of a product given fast flags Precommit tests for D131672. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131757	2022-08-16 14:07:44 +01:00
Kevin P. Neal	05ac82de40	[FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known. Previously we would only CSE constrained FP intrinsics in the default floating point environment. Exception behavior of "strict" is still not allowed since we are not allowed to remove any traps in that case. There are no restrictions on CSE across function calls inside a function. Differential Revision: https://reviews.llvm.org/D112256	2022-08-16 08:31:42 -04:00
Simon Pilgrim	30bd90b8cd	[InstSimplify] Add another and(x,c) case where the mask is redundant (and in fact can constant fold away)	2022-08-16 12:25:50 +01:00
Florian Hahn	a34428f07d	[LV] Use variables instead of hard-coded metadata IDs in tests.	2022-08-16 12:21:49 +01:00
Andre Vieira	c6b5a13b7a	[TypePromotion] Only search for PHI + ZExt promotion of Integers Differential Revision: https://reviews.llvm.org/D131948	2022-08-16 10:15:32 +01:00
Zain Jaffal	94d21a94d9	[AArch64] Add tests to check for loop vectorization of non temporal loads Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131899	2022-08-16 09:40:51 +01:00
Max Kazantsev	ebabd6bf18	Return "[SCEV] Use context to strengthen flags of BinOps" This reverts commit `354fa0b480`. Returning as is. The patch was reverted due to a miscompile, but this patch is not causing it. This patch made it possible to infer some nuw flags in code guarded by `false` condition, and then someone else to managed to propagate the flag from dead code outside. Returning the patch to be able to reproduce the issue.	2022-08-16 14:12:36 +07:00
Philip Reames	33e7a0a33b	[RISCV][LV] Add test coverage for upcoming dependence distance handling change	2022-08-15 15:20:36 -07:00
Martin Sebor	65967708d2	[InstCombine] Adjust snprintf folding of constant strings (PR #56598 ) Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130494	2022-08-15 15:59:21 -06:00
Alexey Bataev	aed5e3bea1	[SLP][NFC]Add a test for delaying of insertelements vectorization, NFC.	2022-08-15 13:26:51 -07:00
Jameson Nash	3a8d7fe201	[SimplifyCFG] teach simplifycfg not to introduce ptrtoint for NI pointers SimplifyCFG expects to be able to cast both sides to an int, if either side can be case to an int, but this is not desirable or legal, in general, per D104547. Spotted in https://github.com/JuliaLang/julia/issues/45702 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128670	2022-08-15 15:11:48 -04:00
Sanjay Patel	e5748c6e73	[InstCombine] reduce sub-with-overflow ==/!= 0 The basic patterns look like this: https://alive2.llvm.org/ce/z/MDj9EC The tests have a use of the overflow value too. Otherwise, existing folds should reduce already. This was noted as a missing IR fold in: `926e7312b2` Hopefully, this makes it easier to implement a backend fix because we should get the same IR regardless of whether the source used builtins or inline code.	2022-08-15 13:03:51 -04:00
Sanjay Patel	b4f61c5ecd	[InstCombine] add tests for compare of sub-with-overflow; NFC	2022-08-15 13:03:51 -04:00
Simon Pilgrim	e471fdad7c	[InstCombine] known-phi-br.ll - add multiuse of compare results to avoid predicate inverse and add negative tests Feedback from D131838	2022-08-15 17:57:34 +01:00
Craig Topper	ef8c34e954	[InstSimplify] sle on i1 also encodes implication We already support SGE, so the same logic should hold for SLE with the LHS and RHS swapped. I didn't see this in the wild. Just happened to walk past this code and thought it was odd that it was asymmetric in what condition codes it handled. Reviewed By: spatel, reames Differential Revision: https://reviews.llvm.org/D131805	2022-08-15 08:27:23 -07:00
Max Kazantsev	354fa0b480	Revert "[SCEV] Use context to strengthen flags of BinOps" This reverts commit `34ae308c73`. Our internal testing found a miscompile. Not sure if it's caused by this patch or it revealed something else. Reverting while investigating.	2022-08-15 18:51:59 +07:00
Nuno Lopes	0299ebc1bd	InstCombine: use poison instead of undef as placeholder in insertvalue [NFC] These vectors are fully initialized so the placeholder value is irrelevant	2022-08-14 21:37:23 +01:00
Florian Hahn	4f04be5649	[LV] Add tests for vectorizing select of minimum idx idiom. Test cases for selecting the index with the minimum value.	2022-08-14 17:44:11 +01:00
Sanjay Patel	8b56fa92de	[InstCombine] fix "X\|(X^Y)" pattern-matching for commuted variants	2022-08-13 11:02:28 -04:00
Sanjay Patel	34ef8c31ca	[InstCombine] add tests for or-xor; NFC The existing pattern matching fails to handle all commutes.	2022-08-13 11:02:28 -04:00
Sanjay Patel	9d218b61cc	[InstCombine] reduce or-xor-or patterns (A \| ?) \| (A ^ B) --> (A \| ?) \| B https://alive2.llvm.org/ce/z/dbNQw4 This extends the existing transform to peek through another 'or' instruction for the common operand. This is the underlying missing fold that should allow issue #56711 and issue #57120 to reduce even more.	2022-08-13 09:52:01 -04:00
Simon Pilgrim	c771d0fd4b	[Instcombine] Add (simplified) pointless loop unroll / vectorization test for Issue #37628	2022-08-13 14:34:08 +01:00
Sanjay Patel	f7e98ef6c2	[InstCombine] add tests for or-xor-or; NFC	2022-08-13 09:16:32 -04:00
Simon Pilgrim	94b62a1d50	[Instcombine] known-phi-br.ll - fix test bounds for positive tests I was off by one in a couple of the inverse predicate tests....	2022-08-13 13:49:00 +01:00
Simon Pilgrim	58ea2454aa	[Instcombine] Add some value tracking tests for cases where the conditional branch feeding a phi gives us known bits of the value based off the branch condition Part of Issue #37628	2022-08-13 13:00:52 +01:00
Wolfgang Pieb	7ddfb4dfeb	[Inlining] Introduce the function attribute "inline-max-stacksize" The value of the attribute is a size in bytes. It has the effect of suppressing inlining of functions whose stacksizes exceed the given value. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D129904	2022-08-12 11:07:18 -07:00
Max Kazantsev	a3d1fb3b59	[SCEV] Prove condition invariance via context Contextual knowledge may be used to prove invariance of some conditions. For example, in this case: ``` ; %len >= 0 guard(%iv = {start,+,1}<nuw> <s %len) guard(%iv = {start,+,1}<nuw> <u %len) ``` the 2nd check always fails if `start` is negative and always passes otherwise. It looks like there are more opportunities of this kind that are still to be implemented in the future. Differential Revision: https://reviews.llvm.org/D129753 Reviewed By: apilipenko	2022-08-12 14:23:35 +07:00
Chuanqi Xu	e190b7cc90	[Coroutines] Maintain the position of final suspend Closing https://github.com/llvm/llvm-project/issues/56329 The problem happens when we try to simplify the suspend points. We might break the assumption that the final suspend lives in the last slot of Shape.CoroSuspends. This patch tries to main the assumption and fixes the problem.	2022-08-12 13:05:08 +08:00
Philip Reames	1062595808	[RISCV][SLP] Add some basic test coverage	2022-08-11 13:05:14 -07:00
Arnold Schwaighofer	6ef223c041	[coro async] Mark async suspend function and its resume function pointer intrinsic as nomerge Coroutine splitting is not possible if the one-to-one mapping between the two is lost. Every suspend point must have a matching continuation function pointer. rdar://98404664 Differential Revision: https://reviews.llvm.org/D131684	2022-08-11 11:43:30 -07:00
Vir Narula	625877b0ef	[Matrix] Add tests dot product with varied strides Add more tests with varied strides. Changes to lowering upcoming in https://reviews.llvm.org/D131125 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131444	2022-08-11 19:09:21 +01:00
Sanjay Patel	fa68d93d54	[InstCombine] fold reassociative fadd with negated operand We manage to iteratively achieve this result with no extra uses, and the reassociate pass can also do this, but this pattern falls through the cracks in the example from issue #57053.	2022-08-11 11:43:36 -04:00
Sanjay Patel	5683801c56	[InstCombine] add tests for reassociative fadd with negated op; NFC Extra uses inhibit more basic folds, so we miss the larger fold.	2022-08-11 11:43:36 -04:00
Kevin P. Neal	7bdb010d7c	[FPEnv][InstSimplify] 0.0 - -X ==> X Another ticket split out of D107285, this extends the optimization of 0.0 - -X to just X when using constrained intrinsics and the optimization is allowed. If the negation of X is done with fsub then the match fails because of the lack of IR Matcher support for constrained intrinsics. While I'm here, remove some TODO notices since the work is no longer planned. Differential Revision: https://reviews.llvm.org/D131607	2022-08-11 11:35:33 -04:00
Andre Vieira	f9563967ca	[TypePromotion] Update comment in testcase. RFC	2022-08-11 13:35:14 +01:00
Andre Vieira	1640679187	[TypePromotion] Search from ZExt + PHI Expand TypePromotion pass to try to promote PHI-nodes in loops that are the operand of a ZExt, using the ZExt's result type to determine the Promote Width. Differential Revision: https://reviews.llvm.org/D111237	2022-08-11 09:50:10 +01:00
Andre Vieira	57de4e059d	[TypePromotion] Don't insert Truncate for a no-op ZExt Differential Revision: https://reviews.llvm.org/D131487	2022-08-11 09:50:10 +01:00
Johannes Doerfert	b65471d715	[Attributor][FIX] Visit same instructions with different scopes If we collect potential values we need to visit a value even if we have seen it before if the scope is different. The scope is part of the result after all. Test included. Fixes https://github.com/llvm/llvm-project/issues/56753 Differential Revision: https://reviews.llvm.org/D131597	2022-08-10 16:02:12 -05:00
Martin Sebor	0dcfe7aa35	[InstCombine] Tighten up known library function signature tests (PR #56463 ) Replace a switch statement used to validate arguments to known library functions with a more consistent table-driven approach and tighten it up.	2022-08-10 14:15:46 -06:00
Sanjay Patel	43dd567443	[EarlyCSE] allow flexibility in atan(-0.0) test As discussed in the post-commit feedback for `b53d44fe47`, this test was failing on AIX because atan(-0.0) results in 0.0 (positive). Differential Revision: https://reviews.llvm.org/D131601	2022-08-10 15:02:01 -04:00
Mohammed Nurul Hoque	30abc1a6a1	[ConstantFolding] Eliminate atan and atan2 calls From the opengroup specifications, atan2 may fail if the result underflows and atan may fail if the argument is subnormal, but we assume that does not happen and eliminate the calls if we can constant fold the result at compile-time. Differential Revision: https://reviews.llvm.org/D127964	2022-08-10 11:01:50 -04:00
Jake Egan	c1226585b3	[AIX][tests] XFAIL for system-aix instead The Clang folding for floating-point sometimes calls out to the host.	2022-08-10 09:31:42 -04:00
Dinar Temirbulatov	cab6cd6834	[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding. After D121595 was commited, I noticed regressions assosicated with small trip count numbersvectorisation by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value. Differential Revision: https://reviews.llvm.org/D130755	2022-08-09 22:10:17 +01:00
Sanjay Patel	926e7312b2	[InstCombine] fold usub.with.overflow to icmp when there's no use of the math value https://alive2.llvm.org/ce/z/UE48FH This is part of solving issue #56926.	2022-08-09 13:13:48 -04:00
Sanjay Patel	56a1c61ad1	[PhaseOrdering][AArch64] add test for mul-with-overflow; NFC Reduced from issue #56403	2022-08-09 12:38:10 -04:00
zhongyunde	c2ab65ddaf	[IndVars] Eliminate redundant type cast with different sizes Deal with different sizes between the itofp and fptoi with trunc or sext/zext, depend on D129756. Fixes https://github.com/llvm/llvm-project/issues/55505. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129958	2022-08-09 23:59:42 +08:00
Jake Egan	6da3f90195	[AIX][tests] XFAIL atan.ll test on AIX XFAIL this newly added test for now to get the AIX bot back to green.	2022-08-09 09:58:08 -04:00
Nikita Popov	4ac00789e1	[RelLookupTableConverter] Bail on invalid pointer size (x32) The RelLookupTableConverter pass currently only supports 64-bit pointers. This is currently enforced using an isArch64Bit() check on the target triple. However, we consider x32 to be a 64-bit target, even though the pointers are 32-bit. (And independently of that specific example, there may be address spaces with different pointer sizes.) As such, add an additional guard for the size of the pointers that are actually part of the lookup table. Differential Revision: https://reviews.llvm.org/D131399	2022-08-09 09:36:39 +02:00
Ruobing Han	f756f06cc4	[SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size. Reviewed By: aeubanks, asbirlea Differential Revision: https://reviews.llvm.org/D129599	2022-08-08 18:12:04 +00:00
Vang Thao	257251247a	[SROA] Try harder to find a vector promotion viable type when rewriting We are seeing significant performance loss when an alloca fails to get promoted to register. I have observed that this is due to the common type found when attempting to rewrite partition users being unviable for promotion. While if we would have continue looking for a type, we would have found a subtype in the original allocated type that would have enabled promotion. Thus first check if the initial common type found is promotion viable and if not then continue looking instead of stopping with the initial common type found. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128073	2022-08-08 11:04:01 -07:00
Sanjay Patel	59f3b3d796	[EarlyCSE][ConstantFolding] move test files to dir of pass in RUN line; NFC	2022-08-08 10:08:55 -04:00
Mohammed Nurul Hoque	b53d44fe47	[EarlyCSE][ConstantFolding] add tests for atan/atan2; NFC Baseline coverage for D127964.	2022-08-08 09:24:58 -04:00
Denis Antrushin	36cc533471	[EarlyCSE][OpaquePointers]Replace assert with return for mask type check. When EarlyCSE tries to common vector masked loads/stores, it first checks that they have same base operand and then assumes that this is enough for mask types to be equal. This is true for typed pointers but false for opaque ones - two loads of different vector sizes from same base pointer '%b' are the same, `ptr %b`. (For typed pointers, `%b` was cast to vector pointer type so bases were different). Change assert to return from lambda `isSubmask` so this transformation properly works with opaque pointers. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D131251	2022-08-08 16:14:42 +03:00
Sanjay Patel	74b5e797d5	[InstSimplify] fold scalable vectors with over-shift splat constant to poison Fixes #56968	2022-08-07 16:26:05 -04:00
Sanjay Patel	34a785fee7	[InstSimplify] add test for over-shift of scalable vector; NFC	2022-08-07 16:26:05 -04:00
Sanjay Patel	8148c28fad	[ConstFolding] fix overzealous assert when converting FP half Fixes #56981	2022-08-07 13:34:51 -04:00
Sanjay Patel	2843a1d87d	[InstCombine] add tests for bitwise logic; NFC	2022-08-07 13:34:51 -04:00
Ruobing Han	b5244fb71c	[test][SimpleLoopUnswitch] Precommit test for D129599	2022-08-05 20:17:26 +00:00
Sanjay Patel	019d76196f	[InstSimplify] use isImpliedCondition() instead of semi-duplicated code We get a couple of improvements from recognizing swapped operand patterns that were not handled by the replicated code. This should also enable simplifying larger patterns as seen in issue #56653 and issue #56654, but that requires enhancements to isImpliedCondition() itself.	2022-08-05 10:59:09 -04:00
David Green	b2de84633a	[ConstProp] Don't fallthorugh for poison constants on vctp and active_lane_mask. Given a poison constant as input, the dyn_cast to a ConstantInt would fail so we would fall through to the generic code that attempts to fold each element of the input vectors. The inputs to these intrinsics are not vectors though, leading to a compile time crash. Instead bail out properly for poison values by returning nullptr. This doesn't try to define what poison means for these intrinsics. Fixes #56945	2022-08-05 11:19:36 +01:00
Chuanqi Xu	230d6f93aa	[Coroutines] Remove lifetime intrinsics for spliied allocas in coroutine frames Closing https://github.com/llvm/llvm-project/issues/56919 It is meaningless to preserve the lifetime markers for the spilled allocas in the coroutine frames and it would block some optimizations too.	2022-08-05 14:50:43 +08:00
Sanjay Patel	9a5b34be15	[InstSimplify] add tests for or-of-icmps; NFC	2022-08-04 14:54:52 -04:00
Ellis Hoag	12e78ff881	[InstrProf] Add the skipprofile attribute As discussed in [0], this diff adds the `skipprofile` attribute to prevent the function from being profiled while allowing profiled functions to be inlined into it. The `noprofile` attribute remains unchanged. The `noprofile` attribute is used for functions where it is dangerous to add instrumentation to while the `skipprofile` attribute is used to reduce code size or performance overhead. [0] https://discourse.llvm.org/t/why-does-the-noprofile-attribute-restrict-inlining/64108 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D130807	2022-08-04 08:45:27 -07:00
jacquesguan	45bae1be90	[RISCV][test] Add inloop reduction vectorize test. NFC	2022-08-04 15:06:44 +08:00
Congzhe Cao	8dc4b2edfa	[LoopInterchange][PR56275] Fix legality with negative dependence vectors This is the 2nd patch of the two-patch series (D130188, D130189) that fix PR56275 (https://github.com/llvm/llvm-project/issues/56275) which is a missed opportunity for loop interchange. As follow-up on the dependence analysis (DA) patch D130188, this patch normalizes DA results in loop interchange, such that negative dependence vectors queried by loop interchange are reversed to be non-negative. Now all tests in PR56275 can get interchanged. Those tests are added in lit test as `pr56275.ll`. Reviewed By: kawashima-fj, bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D130189	2022-08-03 19:59:01 -04:00
Chris Bieneman	383e754072	NFC. Require DirectX backend for these tests Should have added this when I added the test directory. This just requires the DirectX target for running these tests.	2022-08-03 15:55:03 -05:00
Max Kazantsev	34ae308c73	[SCEV] Use context to strengthen flags of BinOps Sometimes SCEV cannot infer nuw/nsw from something as simple as ``` len in [0, MAX_INT] ... iv = phi(0, iv.next) guard(iv <s len) guard(iv <u len) iv.next = iv + 1 ``` just because flag strenthening only relies on definition and does not use local facts. This patch adds support for the simplest case: inference of flags of `add(x, constant)` if we can contextually prove that `x <= max_int - constant`. In case if it has negative CT impact, we can add an option to switch it off. I woudln't expect that though. Differential Revision: https://reviews.llvm.org/D129643 Reviewed By: apilipenko	2022-08-03 14:08:57 +07:00
Chris Bieneman	ee4d815008	[DX] Remove IntrNoMem from create handle intrinsic The create handle intrinsic calls can't be removed, so it was incorrect to mark them as IntrNoMem.	2022-08-02 16:57:22 -05:00
Philip Reames	0b47615fcf	[LV] Recognize store of invariant value to invariant address as uniform This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result. For context, the basic structure of the existing code and how the change fits in: * First, we select a widening strategy. (The result is irrelevant for this patch.) * Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.) * If it is, we overrule the widening strategy, and unconditionally scalarize. * VPReplicationRecipe - which is what actually does the scalarization - knows how to handle unform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.) An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions. Differential Revision: https://reviews.llvm.org/D130364	2022-08-02 08:09:49 -07:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
Martin Sebor	bcef4d238d	[InstCombine] Correct strtol folding with nonnull endptr Reflect in the pointer's offset the length of the leading part of the consumed string preceding the first converted digit. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130912	2022-08-01 16:47:05 -06:00
Vasileios Porpodas	f669030373	[TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2. 2xi64 is the legalized type for wide reductions (like 16xi64) and setting the cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable. The few performance measurments that I did on an aarch64 machine confirm that these patterns are actually faster when vectorized. Differential Revision: https://reviews.llvm.org/D130740	2022-08-01 13:03:14 -07:00
Vasileios Porpodas	2d9eae4152	[NFC][TTI][AArch64][SLP] Precommit test for a TTI cost fix of i64 add reductions.	2022-08-01 11:20:12 -07:00
Florian Hahn	ff5ae948a7	[LV] Add variation of test cases with order of phis flipped. Additional tests with integer and pointer inductions for D119661.	2022-08-01 11:38:16 +01:00
Florian Hahn	6e1ba62d0d	[LV] Add additional tests with multiple chained recurrences. Adds more extra tests for D119661. Also update the test to use opaque pointers.	2022-08-01 10:01:19 +01:00
Nikita Popov	7314ad7a06	Revert "[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions" This reverts commit `7b0f6378e2`. As commented on the review, this patch has a correctness issue regarding the modelling of memory effects.	2022-08-01 09:20:56 +02:00
Momchil Velikov	7b0f6378e2	[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions SimplifyCFG does some common code hoisting, which is limited to hoisting a sequence of identical instruction in identical order and stops at the first non-identical instruction. This patch allows hoisting instruction pairs over same-length sequences of non-matching instructions. The linear asymptotic complexity of the algorithm stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit` serving to limit compilation time and/or the size of the hoisted live ranges. The patch improves SPECv6/525.x264_r by about 10%. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D129370	2022-08-01 07:55:14 +01:00
Sanjay Patel	02b3a35892	[InstSimplify] fold FP rounding intrinsic with rounded operand issue #56775 I rearranged the Thumb2 codegen test to avoid simplifying the chain of rounding instructions. I'm assuming the intent of the test is to verify lowering of each of those intrinsics.	2022-07-31 10:00:27 -04:00

... 4 5 6 7 8 ...

23151 Commits