llvm-project

Commit Graph

Author	SHA1	Message	Date
Vitaly Buka	4c18670776	[NFC][sancov] Rename ModuleSanitizerCoveragePass	2022-09-06 20:55:39 -07:00
Vitaly Buka	5e38b2a456	[NFC][msan] Rename ModuleMemorySanitizerPass	2022-09-06 20:30:35 -07:00
Ruobing Han	fb45f3c948	[SimpleLoopUnswitch] Skip non-trivial unswitching of cold functions In the current main branch, non-trivial unswitching is not applied to any cold loop. As reported in D129599, skipping these cold loops causes a regression in the SPEC benchmark. Thus, instead of skipping cold loops, we now skip only loops in cold functions. Reviewed By: alexgatea, aeubanks Differential Revision: https://reviews.llvm.org/D133275	2022-09-06 19:13:31 -04:00
Vitaly Buka	93600eb50c	[NFC][asan] Rename ModuleAddressSanitizerPass	2022-09-06 15:02:11 -07:00
Vitaly Buka	e7bac3b9fa	[msan] Convert Msan to ModulePass The MemorySanitizerPass function pass violated requirement 4 of function passes, which is to not insert globals. Msan needs to insert globals for origin tracking and parameter tracking. https://llvm.org/docs/WritingAnLLVMPass.html#the-functionpass-class Reviewed By: kstoimenov, fmayer Differential Revision: https://reviews.llvm.org/D133336	2022-09-06 15:01:04 -07:00
Vitaly Buka	b4257d3bf5	[tsan] Replace mem intrinsics with calls to interceptors After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm intrinsics with calls to glibc functions. However, this approach is fragile, as slight changes in the pipeline can bring llvm intrinsics back. In particular, InstCombine can do that. Msan/Asan already declare their own versions of these memory functions for a similar purpose. KCSAN, or anything that uses something other than compiler-rt, needs to implement these callbacks. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D133268	2022-09-06 13:09:31 -07:00
Florian Hahn	27e7db54eb	Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands" This reverts commit `fe1f3cfc26`. It looks like this commit breaks building llvm-test-suite. To reproduce, run `opt -passes=ipsccp` on the IR below. @g = internal global i32 256, align 4 define void @test() { entry: %0 = load i32, ptr @g, align 4 %div = sdiv i32 %0, undef ret void }	2022-09-06 18:21:51 +01:00
Florian Hahn	2fb68c0628	[ConstraintElimination] Replace pair with named struct (NFC). This slightly improves the readability and allows further extensions in follow-ups.	2022-09-06 18:04:04 +01:00
Vitaly Buka	c51a12d598	Revert "[tsan] Replace mem intrinsics with calls to interceptors" Breaks http://45.33.8.238/macm1/43944/step_4.txt https://lab.llvm.org/buildbot/#/builders/70/builds/26926 This reverts commit `77654a65a3`.	2022-09-06 09:47:33 -07:00
Sanjay Patel	ae117e1c1b	[InstCombine] remove dead code for add (select cond, (sub), 0); NFC This pattern is handled more generally in SimplifySelectsFeedingBinaryOp(). Tests to confirm that added to the add.ll test file in the previous commit.	2022-09-06 12:19:50 -04:00
Doru Bercea	0b1160fdeb	Fix OpenMP Opt for target without a parallel region. Remove ctx redeclaration. Format code. Remove parallel check. Modify tests. Clean-up code. Fix another test. Move code to helper functions. Format file. Minor fixes.	2022-09-06 16:04:53 +00:00
Vitaly Buka	77654a65a3	[tsan] Replace mem intrinsics with calls to interceptors After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm intrinsics with calls to glibc functions. However, this approach is fragile, as slight changes in the pipeline can bring llvm intrinsics back. In particular, InstCombine can do that. Msan/Asan already declare their own versions of these memory functions for a similar purpose. KCSAN, or anything that uses something other than compiler-rt, needs to implement these callbacks. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D133268	2022-09-06 08:25:32 -07:00
Sanjay Patel	fe1f3cfc26	[SCCP] convert signed div/rem to unsigned for non-negative operands This extends the transform added with D81756 to handle div/rem opcodes. For example: https://alive2.llvm.org/ce/z/cX6za6 This replicates part of what CVP already does, but the motivating example from issue #57472 demonstrates a phase ordering problem - we convert branches to select before CVP runs and miss the transform. Differential Revision: https://reviews.llvm.org/D133198	2022-09-06 08:58:15 -04:00
Sanjay Patel	dd6eb4d67f	[InstCombine] reduce code duplication; NFC	2022-09-06 08:19:30 -04:00
Arthur Eubanks	7e3aa8f01a	Revert "[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests" This reverts commit `57fd866551`. Causes crashes, see comments in D132581.	2022-09-05 15:42:48 -07:00
Momchil Velikov	078899cd64	[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions SimplifyCFG does some common code hoisting, which is limited to hoisting a sequence of identical instructions in identical order and stops at the first non-identical instruction. This patch allows hoisting instruction pairs over same-length sequences of non-matching instructions. The linear asymptotic complexity of the algorithm stays the same; an extra parameter `simplifycfg-hoist-common-skip-limit` limits compilation time and/or the size of the hoisted live ranges. The patch improves SPECv6/525.x264_r by about 10%. Reviewed By: nikic, dmgreen Differential Revision: https://reviews.llvm.org/D129370	2022-09-05 15:13:46 +01:00
Tian Zhou	8fa432be4f	[InstCombine] reduce test-for-overflow of shifted value Fixes #57338. The added code makes the following transformations: For unsigned predicates / eq / ne: icmp pred (x << 1), x --> icmp getSignedPredicate(pred) x, 0 icmp pred x, (x << 1) --> icmp getSignedPredicate(pred) 0, x Some examples: https://alive2.llvm.org/ce/z/ckn4cj https://alive2.llvm.org/ce/z/h-4bAQ Differential Revision: https://reviews.llvm.org/D132888	2022-09-05 09:51:51 -04:00
Florian Hahn	408ebe5e3a	[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC). Depends on D132585. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132586	2022-09-05 10:48:29 +01:00
Nikita Popov	388b684354	[LICM] Separate check for writability and thread-safety (NFCI) This used a single check to make sure that the object is both writable and thread-local. Separate them out to make the deficiencies in the current code more obvious.	2022-09-05 09:43:17 +02:00
Florian Hahn	ba3d29f871	[LCSSA] Update unreachable uses with poison. Users of LCSSA may not expect non-phi uses when checking the uses outside a loop, which may cause crashes. This is due to the fact that we do not update uses in unreachable blocks. To ensure all reachable uses outside the loop are phis, update uses in unreachable blocks to use poison in dead code. Fixes #57508.	2022-09-04 22:26:18 +01:00
Kazu Hirata	7d8c2d17eb	[llvm] Use range-based for loops (NFC) Identified with modernize-loop-convert.	2022-09-03 23:27:25 -07:00
Fangrui Song	9fc679b87c	[SanitizerCoverage] Simplify pc-table and improve test. NFC	2022-09-03 14:29:21 -07:00
Kazu Hirata	9eca5ed790	[llvm] Use std::enable_if_t (NFC)	2022-09-03 11:17:44 -07:00
Kazu Hirata	fedc59734a	[llvm] Use range-based for loops (NFC)	2022-09-03 11:17:40 -07:00
Sanjay Patel	22e1f66f26	[SCCP] add helper function for replacing signed operations; NFC Preliminary refactoring for planned enhancement in D133198.	2022-09-03 10:30:10 -04:00
Sanjay Patel	5c759edc57	[InstCombine] reduce another or-xor bitwise logic pattern ~(A & ?) | (A ^ B) --> ~((A & ?) & B) https://alive2.llvm.org/ce/z/mxex6V This is similar to `9d218b61cc` where we peeked through another logic op to find a common operand.	2022-09-03 09:32:08 -04:00
Richard Smith	053841c562	Revert "[AggressiveInstCombine] Lower Table Based CTTZ" This reverts commit `fec01ee3f5`. According to asan, this patch introduces a heap use after free.	2022-09-02 16:19:09 -07:00
Francis Visoiu Mistrih	c5b10f348e	[Matrix] Use print instead of dump for matrix-print-after-transpose-opt We should be able to use this option even if LLVM_ENABLE_DUMP is not on. (should fix the bots too)	2022-09-02 16:12:21 -07:00
Francis Visoiu Mistrih	81bdb4068d	[Matrix] Simplify matmuls with scalars If one of the operands is a transposed splat, the transpose can be removed. This is useful to simplify when transposes are distributed to operands of a matmul: k^T -> k and (A * k)^t -> A^t * k Differential Revision: https://reviews.llvm.org/D130177	2022-09-02 15:50:25 -07:00
Sameer Sahasrabuddhe	46b293cb3f	[Attributor] Simplify offset calculation for a constant GEP Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D132931	2022-09-02 23:53:51 +05:30
Arthur Eubanks	57fd866551	[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests The current code is basically just emulating what the analysis manager does. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132581	2022-09-02 10:55:53 -07:00
Djordje Todorovic	fec01ee3f5	[AggressiveInstCombine] Lower Table Based CTTZ This patch introduces recognition of table-based ctz implementation during the AggressiveInstCombine. This fixes the [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=46434 Differential Revision: https://reviews.llvm.org/D113291	2022-09-02 17:26:55 +02:00
Jolanta Jensen	958abe864a	[LoopLoadElim] Add stores with matching sizes as load-store candidates We are not building up a proper list of load-store candidates because we are throwing away stores where the type doesn't match the load. This patch adds stores with matching store sizes as candidates. Author of the original patch: David Sherwood. Differential Revision: https://reviews.llvm.org/D130233	2022-09-02 13:11:25 +01:00
Muhammad Omair Javaid	18de7c6a3b	Revert "[InstCombine] Treat passing undef to noundef params as UB" This reverts commit `c911befaec`. It has broken the LLDB Arm/AArch64 Linux buildbots. I don't really understand the underlying reason. Reverting for now to make the buildbots green. https://reviews.llvm.org/D133036	2022-09-02 16:09:50 +05:00
Mikael Holmen	51d4c7ceea	[GlobalOpt] Fix debug variance problem in hasOnlyColdCalls hasOnlyColdCalls skipped over calls to intrinsics, but it did so after checking the linkage of the called function. This meant that the presence of a call to a debug intrinsic could affect the outcome of the optimization. In my original reproducer (for an out of tree target) it was particularly interesting, because the actual IR after GlobalOpt was not different with debug intrinsics present, so -print-after-all printouts didn't show anything there. However, without debuginfo, GlobalOpt went further and ran BlockFrequencyAnalysis and (more importantly) LoopAnalysis, and later on in the pipeline, instcombine behaved in different ways when LoopInfo was present. So a call to a dbg.declare prevented running LoopAnalysis in GlobalOpt, which later prevented InstCombine from doing an optimization. The dbg-intrinsic-loopanalysis.ll testcase tries to expose this. Then I also noted that adding a dbg.declare actually made the existing testcase colccc_coldsites.ll generate different code, so I modified that to now test it behaves the same way with and without the dbg.declare. Reviewed By: nikic, fhahn Differential Revision: https://reviews.llvm.org/D133193	2022-09-02 12:29:44 +02:00
Sergey Kachkov	be37caca00	[JumpThreading] Process range comparisons with non-local cmp instructions Use the getPredicateOnEdge method if the value is a non-local compare-with-a-constant instruction; that can give more precise results than getConstantOnEdge. Differential Revision: https://reviews.llvm.org/D131956	2022-09-02 12:22:45 +02:00
Nikita Popov	c453e5b901	Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI" This reverts commit `cd8f3e7581`. As pointed out by Eli on the review, this is missing an alignment check. The value might be written at an offset.	2022-09-02 09:28:48 +02:00
Nikita Popov	639d912282	[LICM] Allow load-only scalar promotion in the presence of unwinding Currently, we bail out of scalar promotion if the loop may unwind and the memory may be visible on unwind. This is because we can't insert stores of the promoted value on unwind edges. However, nowadays scalar promotion also has support for only promoting loads, while leaving stores in place. This kind of promotion is safe even in the presence of unwinding. Differential Revision: https://reviews.llvm.org/D133111	2022-09-02 09:27:13 +02:00
luxufan	cd8f3e7581	[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI For noop store of the form of LoadI and StoreI, An invariant should be kept is that the memory state of the related MemoryLoc before LoadI is the same as before StoreI. For this example: ``` define void @pr49927(i32* %q, i32* %p) { %v = load i32, i32* %p, align 4 store i32 %v, i32* %q, align 4 store i32 %v, i32* %p, align 4 ret void } ``` Here the definition of the store's destination is different with the definition of the load's destination, which it seems that the invariant mentioned above is broken. But the definition of the store's destination would write a value that is LoadI, actually, the invariant is still kept. So we can safely ignore it. Differential Revision: https://reviews.llvm.org/D132657	2022-09-02 06:37:41 +00:00
Vitaly Buka	ad3a77df2d	[msan] Fix debug info with getNextNode When we want to add instrumentation after an instruction, instrumentation still should keep debug info of the instruction. Reviewed By: kda, kstoimenov Differential Revision: https://reviews.llvm.org/D133091	2022-09-01 20:13:56 -07:00
Chenbing Zheng	d30cf77cb1	[InstCombine] complete fold extractvalue (any_mul_with_overflow X, -1) The fold extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp) partly failed to match a vector constant with a poison element. This patch tries to fix it. Alive2: https://alive2.llvm.org/ce/z/2rGp_3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132996	2022-09-02 10:58:42 +08:00
Vitaly Buka	ad2b356f85	[msan] Use no-origin functions when possible Saves 1.8% of .text size on CTMark Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133077	2022-09-01 19:18:38 -07:00
Arthur Eubanks	c911befaec	[InstCombine] Treat passing undef to noundef params as UB Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D133036	2022-09-01 15:16:45 -07:00
Rong Xu	0caa4a9559	[PGO] Support PGO annotation of CallBrInst We currently instrument CallBrInst but do not annotate it with the branch weight. This patch enables PGO annotation of CallBrInst. Differential Revision: https://reviews.llvm.org/D133040	2022-09-01 14:13:50 -07:00
Vitaly Buka	ef0f866718	[msan] Combine shadow check of the same instruction Reduces .text size by 1% on our large binary. On CTMark (-O2 -fsanitize=memory -fsanitize-memory-use-after-dtor -fsanitize-memory-param-retval) Size -0.4% Time -0.8% Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133071	2022-09-01 13:55:59 -07:00
Vitaly Buka	9110673062	[nfc][msan] Group checks per instruction This is preparation for combining shadow checks of the same instruction. Reviewed By: kda, kstoimenov Differential Revision: https://reviews.llvm.org/D133065	2022-09-01 13:10:16 -07:00
Jordan Rupprecht	3031a250de	[MSan] Fix determinism issue when using msan-track-origins. When instrumenting `alloca`s, we use a `SmallSet` (i.e. `SmallPtrSet`). When there are fewer elements than the `SmallSet` size, it behaves like a vector, offering stable iteration order. Once we have too many `alloca`s to instrument, the iteration order becomes unstable. This manifests as non-deterministic builds because of the global constant we create while instrumenting the alloca. The test added is a simple IR file, but was discovered while building `libcxx/src/filesystem/operations.cpp` from libc++. A reduced C++ example from that: ``` // clang++ -fsanitize=memory -fsanitize-memory-track-origins \ // -fno-discard-value-names -S -emit-llvm \ // -c op.cpp -o op.ll struct Foo { ~Foo(); }; bool func1(Foo); void func2(Foo); void func3(int) { int f_st, t_st; Foo f, t; func1(f) || func1(f) || func1(t) || func1(f) && func1(t); func2(f); } ``` Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133034	2022-09-01 09:15:57 -07:00
Nuno Lopes	858fe8664e	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 17:04:26 +01:00
Nikita Popov	f5c178b6a4	[LICM] Remove unnecessary condition (NFC)	2022-09-01 15:42:35 +02:00
Nikita Popov	315aef667e	[LICM] Fix thread safety checks for promotion of byval args This code was relying on a very subtle contract: The expectation was that for non-allocas, the unwind safety check would already perform a capture check, so we don't need to perform it later. This held true when this unwind safety was only handled for allocas and noalias calls, but became incorrect when byval support was added. To avoid this kind of issue, just remove the dependency between the unwind and thread-safety checks entirely. At worst, this means we perform a redundant capture check. If this should turn out to be problematic for compile-time, we can cache that query in a more explicit way.	2022-09-01 15:33:46 +02:00
Sanjay Patel	c3d1504d63	[InstCombine] fix crash on type mismatch with fcmp fold The existing predicate doesn't work for a single-element vector, so make sure we are not crossing scalar/vector types. Test (was crashing) based on the post-commit example for: `4827771234`	2022-09-01 08:57:55 -04:00
Sanjay Patel	addbdac5d5	[InstCombine] fold power-of-2 ctlz/cttz with inverted result When X is a power-of-two or zero and zero input is poison: ctlz(i32 X) ^ 31 --> cttz(X) cttz(i32 X) ^ 31 --> ctlz(X) https://alive2.llvm.org/ce/z/Cs7sFE	2022-09-01 08:57:55 -04:00
Nikita Popov	3f8b1d0f15	[LICM] Add some debug output to scalar promotion (NFC)	2022-09-01 14:46:30 +02:00
Alexey Bataev	982d9ef1c1	[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison. Need either follow the original order of the operands for bool logical ops, or emit freeze instruction to avoid poison propagation. Differential Revision: https://reviews.llvm.org/D126877	2022-09-01 05:34:45 -07:00
Yuanbo Li	ebd0249fcf	[DebugInfo] Missing debug location after replacement in processSRem function This patch fixes an issue in which CorrelatedValuePropagation::processSRem would create new instructions to represent the SRem instruction, but would not correctly copy any existing debug location metadata to the new instruction. Differential Revision: https://reviews.llvm.org/D132218	2022-09-01 13:18:17 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Nuno Lopes	fa154a9170	Revert "Expand Div/Rem: consider the case where the dividend is zero" This reverts commit `4aed09868b`.	2022-09-01 12:11:22 +01:00
Nuno Lopes	4aed09868b	Expand Div/Rem: consider the case where the dividend is zero So we can't use ctlz in poison-producing mode	2022-09-01 12:00:03 +01:00
Pavel Samolysov	527b9a9d90	[DeadArgElim] Use structure bindings in foreach loops. NFC Differential Revision: https://reviews.llvm.org/D133026	2022-09-01 13:48:46 +03:00
Nikita Popov	43e7d9af1d	[InstCombine] Fold extractvalue of phi Just as we do for most other operations, we should push extractvalue instructions through phis, if this does not increase unfolded instruction count.	2022-09-01 10:51:54 +02:00
Arthur Eubanks	04f3c20989	[NFC][LICM] Stop passing around unused BFI Uses of this were removed in `1a25d0bfbb`.	2022-08-31 19:15:34 -07:00
Vitaly Buka	53d1ae88f8	[nfc][msan] Prepare the code for check sorting	2022-08-31 15:36:49 -07:00
Nikita Popov	ab6876a40d	reland: [Local] Allow creating callbr with duplicate successors Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs. This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in `8201e3ef5c`. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D129997 Originally landed as commit `08860f525a` ("[Local] Allow creating callbr with duplicate successors") Reverted in commit `1cf6b93df1` ("Revert "[Local] Allow creating callbr with duplicate successors"")	2022-08-31 13:23:00 -07:00
Alexey Bataev	588115c117	[SLP][NFC]Add a check for SelectInst to match description, NFC.	2022-08-31 13:04:21 -07:00
Alexey Bataev	d8d9ee10bb	[SLP][NFC]Fix comment and make function following naming standard, NFC.	2022-08-31 12:37:55 -07:00
Philip Reames	8524622bdc	[SLP] Simplify getOperandInfo implementation and be consistent This is NOT nfc. Specifically, the following behavior changes: * Pointers are now allowed. Both uniform, and constants. * FP uniform non-constants can now be recognized. * FP undefs are no longer considered constant. This matches int behavior which we had tests for. FP behavior was untested. It's not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.	2022-08-31 12:24:05 -07:00
Nikita Popov	ad66bc42b0	[InstCombine] Use getInsertionPointAfterDef() in freeze fold This simplifies the code and fixes handling of catchswitch, in which case we have no insertion point for the freeze. Originally part of D129660.	2022-08-31 11:32:57 +02:00
Nikita Popov	8f3fd26b74	[Reassociate] Use getInsertionPointAfterDef() This simplifies the code and fixes handling for the callbr case, where the instruction needs to be inserted in the normal destination, rather than after the terminator. Originally part of D129660.	2022-08-31 11:10:24 +02:00
Nikita Popov	972840aa3b	[IR] Add Instruction::getInsertionPointAfterDef() Transforms occasionally want to insert an instruction directly after the definition point of a value. This involves quite a few different edge cases, e.g. for phi nodes the next insertion point is not the next instruction, and for invokes and callbrs it's not even in the same block. Additionally, the insertion point may not exist at all if catchswitch is involved. This adds a general Instruction::getInsertionPointAfterDef() API to implement the necessary logic. For now it is used in two places where this should be mostly NFC. I will follow up with additional uses where this fixes specific bugs in the existing implementations. Differential Revision: https://reviews.llvm.org/D129660	2022-08-31 10:50:10 +02:00
Fangrui Song	13f0795425	[SLPVectorizer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off build	2022-08-30 23:01:22 -07:00
Chenbing Zheng	35a3048c25	[InstCombine] add support for multi-use Y of (X op Y) op Z --> (Y op Z) op X For (X op Y) op Z --> (Y op Z) op X we can still do the transform when Y is multi-use. D131356 limited it to one-use; this patch removes that limit. This is still not a complete solution; I added a todo test to show it. In that case, X and Y are both multi-use, and we can't decide how to convert based on this. But at least we don't make the code worse, and it can solve half the scenarios.	2022-08-31 10:55:05 +08:00
Alexey Bataev	ec06df9459	[SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed. The pointer operands for the ScatterVectorize node may contain non-instruction values and they are not checked for "already being vectorized". Need to check that such pointers are already vectorized and gather them instead of trying to build vectorize node to avoid compiler crash. Differential Revision: https://reviews.llvm.org/D132949	2022-08-30 12:30:14 -07:00
Sanjay Patel	8a19842c0e	[InstCombine] delete redundant folds; NFC InstSimplify does this via isKnownNonEqual(), so it's already using knownbits on these patterns and trying other folds.	2022-08-30 14:21:29 -04:00
Alexey Bataev	afbf5466ba	[SLP]Improve operands kind analysis for constants. Removed the EnableFP parameter in the getOperandInfo function since it is not needed; the operand kinds are also controlled by the operation code, which allows removing the extra check for the type of the operands. Also, added analysis for uniform constant float values. This change currently does not trigger any changes in the code since TTI does not do analysis for constant floats, so it can be considered NFC. Tested with llvm-test-suite + SPEC2017, no changes. Differential Revision: https://reviews.llvm.org/D132886	2022-08-30 06:35:39 -07:00
zhongyunde	23a5de4294	[InstCombine] Distributive or+mul with const operand We already support the transform: `(X+C1)*CI -> X*CI+C1*CI` Here the case is a little special as the form of `(X+C1)*CI` is transformed into `(X|C1)*CI`, so we should also support the transform: `(X|C1)*CI -> X*CI+C1*CI` Fixes https://github.com/llvm/llvm-project/issues/57278 Reviewed By: bcl5980, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D132658	2022-08-30 20:36:52 +08:00
Florian Hahn	b5e208fcba	[DSE] Support looking through memory phis at end of function. Update isWriteAtEndOfFunction to look through MemoryPhis. The reason MemoryPhis were skipped so far was the known AliasAnalysis issue with it missing loop-carried dependences. This problem is already addressed in other parts of the code by skipping MemoryDefs that may be in different loops. I think the same logic can be applied here. This can have a substantial impact on the number of stores removed in some cases. For MultiSource/SPEC2006/SPEC2017 with -O3: ``` Metric: dse.NumFastStores Program dse.NumFastStores base patch diff External/S...CINT2017rate/557.xz_r/557.xz_r 14.00 45.00 221.4% External/S...te/538.imagick_r/538.imagick_r 439.00 1267.00 188.6% MultiSourc...e/Applications/SIBsim4/SIBsim4 6.00 15.00 150.0% MultiSourc...Prolangs-C/simulator/simulator 3.00 7.00 133.3% MultiSource/Applications/siod/siod 3.00 7.00 133.3% MultiSourc...arks/FreeBench/distray/distray 6.00 9.00 50.0% MultiSourc...e/Applications/obsequi/Obsequi 22.00 30.00 36.4% MultiSource/Benchmarks/Ptrdist/bc/bc 23.00 28.00 21.7% External/S...NT2017rate/502.gcc_r/502.gcc_r 1258.00 1512.00 20.2% External/S...te/520.omnetpp_r/520.omnetpp_r 954.00 1143.00 19.8% External/S...rate/510.parest_r/510.parest_r 5961.00 7122.00 19.5% External/S...C/CINT2006/445.gobmk/445.gobmk 47.00 56.00 19.1% External/S...00.perlbench_r/500.perlbench_r 241.00 286.00 18.7% External/S...NT2006/471.omnetpp/471.omnetpp 36.00 42.00 16.7% External/S...06/400.perlbench/400.perlbench 183.00 210.00 14.8% MultiSource/Applications/SPASS/SPASS 72.00 81.00 12.5% External/S...17rate/541.leela_r/541.leela_r 72.00 80.00 11.1% External/SPEC/CINT2006/403.gcc/403.gcc 585.00 642.00 9.7% MultiSourc...e/Applications/sqlite3/sqlite3 120.00 131.00 9.2% MultiSourc...Applications/hexxagon/hexxagon 11.00 12.00 9.1% External/S.../CFP2006/453.povray/453.povray 566.00 615.00 8.7% External/S...rate/511.povray_r/511.povray_r 578.00 627.00 8.5% External/S...FP2006/482.sphinx3/482.sphinx3 12.00 13.00 8.3% MultiSource/Applications/oggenc/oggenc 130.00 140.00 7.7% MultiSourc...e/Applications/ClamAV/clamscan 250.00 268.00 7.2% MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 19.00 20.00 5.3% MultiSourc...ch/consumer-jpeg/consumer-jpeg 19.00 20.00 5.3% External/S...te/526.blender_r/526.blender_r 3747.00 3928.00 4.8% MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 104.00 108.00 3.8% MultiSourc...ch/consumer-lame/consumer-lame 54.00 56.00 3.7% MultiSource/Benchmarks/Bullet/bullet 1222.00 1264.00 3.4% MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 973.00 1005.00 3.3% External/S.../CFP2006/447.dealII/447.dealII 2699.00 2780.00 3.0% External/S...06/483.xalancbmk/483.xalancbmk 788.00 810.00 2.8% External/S.../CFP2006/450.soplex/450.soplex 180.00 185.00 2.8% MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR 338.00 345.00 2.1% MultiSourc...Benchmarks/7zip/7zip-benchmark 685.00 699.00 2.0% External/S...FP2017rate/544.nab_r/544.nab_r 158.00 160.00 1.3% MultiSourc...sumer-typeset/consumer-typeset 772.00 781.00 1.2% External/S...2017rate/525.x264_r/525.x264_r 410.00 414.00 1.0% External/S...23.xalancbmk_r/523.xalancbmk_r 998.00 1002.00 0.4% ``` Compile-time is almost neutral: https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions NewPM-O3: +0.03% NewPM-ReleaseThinLTO: -0.01% NewPM-ReleaseLTO-g: +0.03% Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132365	2022-08-30 13:27:51 +01:00
OCHyams	84a71d5259	[DebugInfo] Fix line number attribution in mldst-motion Taking the example from the test included in this patch: $ cat test.cpp -n 1 void fun(int *a, int cond) { 2 if (cond) 3 a[1] = 1; 4 else 5 a[1] = 2; 6 } mldst-motion will merge and sink the stores in if.then and if.else into if.end. The resultant PHI, gep and store should be attributed line zero with the innermost common scope rather than picking a debug location from one of the original stores. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D132741	2022-08-30 10:03:53 +01:00
jacquesguan	df525c7705	[InstCombine] fold fake floating point vector extract to shift+trunc. This patch supports the FP part of D111082. Differential Revision: https://reviews.llvm.org/D125750	2022-08-30 10:12:16 +08:00
Rong Xu	d7ef0c3970	[llvm-profdata] Improve profile supplementation The current implementation promotes a non-cold function in the SampleFDO profile into a hot function in the FDO profile. This is too aggressive. This patch promotes a hot function in the SampleFDO profile into a hot function, and a warm function in SampleFDO into a warm function in FDO. Differential Revision: https://reviews.llvm.org/D132601	2022-08-29 16:50:42 -07:00
Philip Reames	8936d86469	[LV] Add debug output for force scalar tracing [nfc] I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.	2022-08-29 15:17:51 -07:00
Valery N Dmitriev	329b972d41	[SLP] Try to match reductions before trying to vectorize a vector build sequence. This patch changes order of searching for reductions vs other vectorization possibilities. The idea is if we do not match a reduction it won't be harmful for further attempts to find vectorizable operations on a vector build sequences. But doing it in the opposite order we have good chance to ruin opportunity to match a reduction later. We also don't want to try vectorizing binary operations too early as 2-way vectorization may effectively prohibit wider ones leading to producing less effective code. Differential Revision: https://reviews.llvm.org/D132590	2022-08-29 13:32:14 -07:00
Philip Reames	033a97a8f3	[LV] Minor code restructure of isUniformAfterVectorization [nfc] Mostly just to make a future patch easier to review.	2022-08-29 12:48:27 -07:00
Philip Reames	c37b1a5f76	[RLEV] Pick a correct insert point when incoming instruction is itself a phi node This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue. Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG. Differential Revision: https://reviews.llvm.org/D132571	2022-08-29 11:44:33 -07:00
Alexey Bataev	beacf9bd9e	[SLP]Fix PR57322: vectorize constant float stores. Stores for constant floats must be vectorized, improve analysis in SLP vectorizer for stores. Differential Revision: https://reviews.llvm.org/D132750	2022-08-29 11:02:53 -07:00
Alexey Bataev	e6345bf644	[SLP]Improve lookup of the buildvector top insertelement instruction. When estimating the cost of the in-tree vectorized scalars in buildvector sequences, we need to take into account the vectorized insertelement instruction. The top of the buildvector sequence is the topmost vectorized insertelement instruction, because it will have more than 1 use after the vectorization. For the affected test case, this improves throughput from 21 to 16 (per llvm-mca). Differential Revision: https://reviews.llvm.org/D132740	2022-08-29 08:19:52 -07:00
Sanjay Patel	6c39a3aae1	[InstCombine] fold not-shift of signbit to icmp+zext https://alive2.llvm.org/ce/z/j_8Wz9 The arithmetic shift was converted to logical shift with: `246078604c` That does not seem to uncover any other missing/conflicting folds, so convert directly to signbit test + cast. We still need to fold the pattern with logical shift to test + cast. This allows reducing patterns where the output type is not the same as the input value: https://alive2.llvm.org/ce/z/nydwFV Fixes #57394	2022-08-29 10:06:31 -04:00
Sanjay Patel	246078604c	[InstCombine] fold inc-of-signbit-splat to not+lshr (iN X s>> (N - 1)) + 1 --> (~X) u>> (N - 1) https://alive2.llvm.org/ce/z/wzS474	2022-08-29 08:48:22 -04:00
Florian Hahn	c78696813f	[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC). Suggested as independent fix during the review of D132585.	2022-08-29 10:19:47 +01:00
Kazu Hirata	2ad7fd3ac7	[Instrumentation] Use std::clamp (NFC) The use of std::clamp should be safe here. MinRZ is at most 32, while kMaxRZ is 1 << 18, so we have MinRZ <= kMaxRZ, avoiding the undefined behavior of std::clamp.	2022-08-28 23:28:57 -07:00
Kazu Hirata	c63f823875	[llvm] Use range-based for loops (NFC)	2022-08-28 17:35:04 -07:00
Sanjay Patel	ab6892967c	[InstCombine] allow sext in fold of mask using signbit, part 2 https://alive2.llvm.org/ce/z/rcbZmx Sibling transform to `275aa24c0a` This pattern is seen in the examples in issue #57381.	2022-08-28 11:50:52 -04:00
zhongyunde	84d6966e4d	[InstCombine] Propagate the nuw for combine of add+mul As the commit of D132658, make the 'nuw' change separately. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132777	2022-08-28 23:01:11 +08:00
Florian Hahn	af98b875e8	[VPlan] Use range check in VPHeaderPHIRecipe::classof (NFC). This addresses a suggestion to simplify the check from D131989. This also makes it easier to ensure that VPHeaderPHIRecipe::classof checks for all header phi ids.	2022-08-28 15:54:12 +01:00
Sanjay Patel	275aa24c0a	[InstCombine] allow sext in fold of mask using signbit ~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y -- with optional sext https://alive2.llvm.org/ce/z/wFFnZT	2022-08-28 09:01:30 -04:00
Kazu Hirata	b18ff9c461	[Transform] Use range-based for loops (NFC)	2022-08-27 23:54:32 -07:00
Kazu Hirata	d0166c617d	[Utils] Remove redundant declarations (NFC) Identified with readability-redundant-declaration.	2022-08-27 23:54:31 -07:00
Kazu Hirata	d1688e9ddf	[llvm] Use std::gcd (NFC) This patch replaces calls to greatestCommonDivisor with std::gcd where both arguments are known to be of unsigned types. This means that std::common_type_t of the two argument types should just be the wider one of the two.	2022-08-27 23:54:29 -07:00
Kazu Hirata	56ea4f9bd3	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-27 21:21:02 -07:00
Kazu Hirata	7a617fdf39	Use std::gcd (NFC) This patch replaces calls to GreatestCommonDivisor64 with std::gcd where both arguments are known to be of unsigned types no larger than 64 bits in size.	2022-08-27 21:20:59 -07:00
Florian Hahn	7743badafa	[VPlan] Verify that header only contains header phi recipes. Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also adds missing checks for VPPointerInductionRecipe to VPHeaderPHIRecipe::classof. Split off from D119661. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D131989	2022-08-27 22:06:12 +01:00
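
The two InstCombine folds quoted in the messages for `246078604c` and `275aa24c0a` above can be checked exhaustively for a small bit width. The following is a minimal sketch, assuming N = 8 and modelling iN values as Python integers masked to N bits. The helper names (`ashr`, `lshr`, `check_folds`, and so on) are hypothetical and are not LLVM APIs. The Alive2 links in the commit messages remain the authoritative proofs.

```python
N = 8                      # assumed bit width for the exhaustive check
MASK = (1 << N) - 1        # all-ones value of an iN integer


def to_signed(x):
    """Interpret an N-bit pattern as a two's-complement signed value."""
    return x - (1 << N) if x & (1 << (N - 1)) else x


def ashr(x, k):
    """Arithmetic shift right (written s>> in the commit messages)."""
    return (to_signed(x) >> k) & MASK


def lshr(x, k):
    """Logical shift right (written u>> in the commit messages)."""
    return (x & MASK) >> k


def inc_of_signbit_splat(x):
    """Left-hand side: (iN X s>> (N - 1)) + 1."""
    return (ashr(x, N - 1) + 1) & MASK


def not_lshr(x):
    """Right-hand side: (~X) u>> (N - 1)."""
    return lshr(~x & MASK, N - 1)


def mask_using_signbit(x, y):
    """Left-hand side: ~(iN X s>> (N-1)) & Y."""
    return (~ashr(x, N - 1) & MASK) & y


def select_form(x, y):
    """Right-hand side: (X s< 0) ? 0 : Y."""
    return 0 if to_signed(x) < 0 else y


def check_folds():
    """Compare both sides of each fold over every N-bit input."""
    for x in range(1 << N):
        assert inc_of_signbit_splat(x) == not_lshr(x)
        for y in range(1 << N):
            assert mask_using_signbit(x, y) == select_form(x, y)
    return True
```

Calling `check_folds()` walks all 256 values of X (and all 65536 X, Y pairs for the second fold). It only covers the plain same-width form and does not model the optional sext variant.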

31465 Commits