llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	b7b99b0799	[AMDGPU] Fix -amdgpu-inline-arg-alloca-cost Before D94153 this threshold was in a pre-scaled units. After D94153 inlining threshold multiplier is not applied to this portion of the threshold anymore. Restore the threshold by applying the multiplier. Differential Revision: https://reviews.llvm.org/D98362	2021-03-12 10:19:50 -08:00
Nico Weber	08a5277a64	Revert "[IndirectCallPromotion] Don't strip ".__uniq." suffix when it strips" This reverts commit `90dfbeef59`. Causes PR49554. Also see comments on https://reviews.llvm.org/D98389	2021-03-12 10:03:58 -05:00
Florian Hahn	8904a82fa7	[LV] Fix name in CHECK pattern after `fb3ca7076`	2021-03-12 13:31:48 +00:00
Florian Hahn	fb3ca70761	[LV] Account IV recipes being uniform in VPTransformState::get(). This patch fixes a crash when trying to get a scalar value using VPTransformState::get() for uniform induction values or truncated induction values. IVs and truncated IVs can be uniform and the updated code accounts for that, fixing the crash. This should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981	2021-03-12 13:29:06 +00:00
Sanjay Patel	bd197ed0a5	[SimplifyCFG] avoid sinking insts within an infinite-loop The test is reduced from a C source example in: https://llvm.org/PR49541 It's possible that the test could be reduced further or the predicate generalized further, but it seems to require a few ingredients (including the "late" SimplifyCFG options on the RUN line) to fall into the infinite-loop trap.	2021-03-12 08:04:57 -05:00
Serguei Katkov	cfe8f8e0f0	Revert "Mark gc.relocate and gc.result as readnone" As readnone functions they become movable, and LICM can hoist them out of a loop. As a result, a phi node of type token is created in LCSSA form. No existing code is prepared for the first operand of GCRelocate to be a phi node; it is expected to be a token. The GVN tests were also updated, as they do not seem to do what is expected. A test for LICM is also added. This reverts commit `f352463ade`.	2021-03-12 16:59:17 +07:00
Bjorn Pettersson	529c8e8dc6	[InstSimplify] Simplify smul.fix and smul.fix.sat Add simplification of smul.fix and smul.fix.sat according to: X * 0 -> 0; X * undef -> 0; X * (1 << scale) -> X. This includes the commuted patterns and splatted vectors. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D98299	2021-03-12 09:09:58 +01:00
Bjorn Pettersson	3638bdfbda	[ConstantFold] Handle undef/poison when constant folding smul_fix/smul_fix_sat Do constant folding according to: poison * C -> poison; C * poison -> poison; undef * C -> 0; C * undef -> 0 for smul_fix and smul_fix_sat intrinsics (for any scale). Reviewed By: nikic, aqjune, nagisa Differential Revision: https://reviews.llvm.org/D98410	2021-03-12 09:09:58 +01:00
Johannes Doerfert	ff256c1376	[Attributor] Derive `willreturn` based on `mustprogress` Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94125	2021-03-11 23:31:44 -06:00
Johannes Doerfert	9c2074dccb	[Attributor][NFC] Update tests after D94741 The update_test_checks script can now check for global symbols and is able to handle them properly when they differ across prefixes, e.g., attribute #0 might be different in different runs. This patch simply updates all the Attributor tests with the new script. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D97906	2021-03-11 23:31:39 -06:00
Valery N Dmitriev	73f94969b2	[SLP] Fix crash when matching associative reduction for integer min/max. Associative reduction matcher in SLP begins with select instruction but when it reached call to llvm.umax (or alike) via def-use chain the latter also matched as UMax kind. The routine's later code assumes matched instruction to be a select and thus it merely died on the first encountered cast that did not fit. Differential Revision: https://reviews.llvm.org/D98432	2021-03-11 11:52:57 -08:00
Wei Mi	90dfbeef59	[IndirectCallPromotion] Don't strip ".__uniq." suffix when it strips ".llvm." suffix. Currently IndirectCallPromotion simply strip everything after the first "." in LTO mode, in order to match the symbol name and the name with ".llvm." suffix in the value profile. However, if -funique-internal-linkage-names and thinlto are both enabled, the name may have both ".__uniq." suffix and ".llvm." suffix, and the current mechanism will strip them both, which is unexpected. The patch fixes the problem. Differential Revision: https://reviews.llvm.org/D98389	2021-03-11 11:08:47 -08:00
David Green	fad70c3068	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations need to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. 
%1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729	2021-03-11 17:56:19 +00:00
Hiroshi Yamauchi	365b225d46	[PGO] Fix two issues in PGOMemOPSizeOpt. 1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from the value profile metadata and preserves the remaining entries for the fallback memop call site. If there are more than five entries, the rest of the entries would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes up to 3 (by default) values, but potentially not for other downstream passes that may use the value profile metadata. 2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept track of. When the range buckets were introduced, it was changed to skip the range buckets, but since it does not grab all entries (only five), if some range buckets exist in the first five entries, it could potentially cause fewer promotion opportunities (e.g. if 4 out of 5 were range buckets, it may be able to promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it means that wrong entries may be preserved, as it didn't correctly keep track of which entries were skipped. To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number of value profile buckets), keeps track of which entries were skipped, and preserves all the remaining entries. Differential Revision: https://reviews.llvm.org/D97592	2021-03-11 09:53:05 -08:00
Joseph Huber	807466ef28	[OpenMP] Restore backwards compatibility for libomptarget Summary: The changes introduced in D87946 changed the API for libomptarget functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x but was not given a backward-compatible interface. This change will require people using Clang 13.x or 12.x to recompile their offloading programs. Reviewed By: jdoerfert cchen Differential Revision: https://reviews.llvm.org/D98358	2021-03-11 09:52:11 -05:00
Stephen Tozer	f40976bd01	Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reverts commit `c0f3dfb9f1`. Reverted due to an error on the clang-x64-windows-msvc buildbot.	2021-03-11 14:48:01 +00:00
gbtozers	c0f3dfb9f1	[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic operations with two or more non-const operands; this includes the GetElementPtr instruction, and most Binary Operator instructions. These salvages produce DIArgList locations and are only valid for dbg.values, as currently variadic DIExpressions must use DW_OP_stack_value. This functionality is also only added for salvageDebugInfoForDbgValues; other functions that directly call salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be updated in a later patch. Differential Revision: https://reviews.llvm.org/D91722	2021-03-11 13:33:49 +00:00
Nikita Popov	403da6a69a	Reapply [LICM] Make promotion faster Relative to the previous implementation, this always uses aliasesUnknownInst() instead of aliasesPointer() to correctly handle atomics. The added test case was previously miscompiled. ----- Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-11 10:50:28 +01:00
kuterd	d75c9e61a5	[Attributor] Attributor call site specific AAValueConstantRange This patch makes uses of the context bridges introduced in D83299 to make AAValueConstantRange call site specific. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83744	2021-03-11 01:19:44 +03:00
Mauri Mustonen	0de8aeae72	[VPlan] Support to widen select instructions in VPlan native path Add support to widen select instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer. Previously select instructions were handled by the wrong recipe, which resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97136	2021-03-10 20:59:53 +00:00
Matteo Favaro	989051d5f8	[DSE] Extending isOverwrite to support offsetted fully overlapping stores The isOverwrite function is making sure to identify if two stores are fully overlapping and ideally we would like to identify all the instances of OW_Complete as they'll yield possibly killable stores. The current implementation is incapable of spotting instances where the earlier store is offsetted compared to the later store, but still fully overlapped. The limitation seems to lie on the computation of the base pointers with the GetPointerBaseWithConstantOffset API that often yields different base pointers even if the stores are guaranteed to partially overlap (e.g. the alias analysis is returning AliasResult::PartialAlias). The patch relies on the offsets computed and cached by BatchAAResults (available after D93529) to determine if the offsetted overlapping is OW_Complete. Differential Revision: https://reviews.llvm.org/D97676	2021-03-10 21:09:33 +01:00
Sriraman Tallam	0ba1ebcbb7	Remove original implementation of UniqueInternalLinkageNames pass. D96109 was recently submitted which contains the refactored implementation of -funique-internal-linkage-names by adding the unique suffixes in clang rather than as an LLVM pass. Deleting the former implementation in this change. Differential Revision: https://reviews.llvm.org/D98234	2021-03-10 11:57:40 -08:00
Nikita Popov	e19160c81e	[InstCombine] Regenerate test checks (NFC)	2021-03-10 20:27:10 +01:00
Daniil Seredkin	7c49f3c75b	[InstCombine][SimplifyLibCalls] An extra sqrtf was produced because of transformations in optimizePow function See: https://bugs.llvm.org/show_bug.cgi?id=47613 There was an extra sqrt call because shrinking emitted a new powf while, at the same time, optimizePow replaced the previous pow with sqrt. As a result, two instructions end up in the InstCombine worklist even though %powf is not used by anyone (it is alive because of errno): %powf = call fast float @powf(float %x, float 5.000000e-01) %sqrt = call fast double @sqrt(double %dx) %powf will be converted to %sqrtf on a later iteration. As a quick fix, I moved shrinking to the end of optimizePow so that pow is replaced with sqrt first, which avoids emitting a new shrunk powf. Differential Revision: https://reviews.llvm.org/D98235	2021-03-10 12:33:05 -05:00
Ta-Wei Tu	7ff2768be1	Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`" This reverts commit `df9158c9a4`.	2021-03-11 01:24:43 +08:00
Dávid Bolvanský	c68b560be3	[DSE] Handle memmove with equal non-const sizes Follow up for fhahn's D98284. Also fixes a case from PR47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98346	2021-03-10 17:52:00 +01:00
Florian Hahn	077dc5c87b	[DSE] Add tests that require phi translation to be removed.	2021-03-10 16:32:55 +00:00
Alex Richardson	b26d6758f0	[SLC] Simplify strcpy and friends with non-zero address spaces The current logic in TargetLibraryInfoImpl::getLibFunc() was only treating strcpy, etc. with i8* arguments in address space zero as a valid library function. However, in the CHERI and Morello targets we expect all libc functions to use address space 200 arguments. This commit updates isValidProtoForLibFunc() to check that the argument is a pointer type. This also drops the check for i8* since we should not be checking the pointee type any more. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95142	2021-03-10 11:17:34 +00:00
Alex Richardson	81e2550f94	[SLC] Baseline test for missed strcpy optimizations in non-zero AS This will be fixed in D95142 Differential Revision: https://reviews.llvm.org/D95138	2021-03-10 11:17:34 +00:00
Florian Hahn	8d9b9c0edc	[DSE] Handle memcpy/memset with equal non-const sizes. Currently DSE misses cases where the size is a non-const IR value, even if they match. For example, this means that llvm.memcpy/llvm.memset calls are not eliminated, even if they write the same number of bytes. This patch extends isOverwrite to try to get IR values for the number of bytes written from the analyzed instructions. If the values match, alias checks are performed and the result is returned. At the moment this only covers llvm.memcpy/llvm.memset. In the future, we may enable MemoryLocation to also track variable sizes, but this simple approach should allow us to cover the important cases in DSE. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98284	2021-03-10 10:13:58 +00:00
Florian Hahn	5293287630	[DSE] Add tests with memset & memcpy combinations and non-const sizes.	2021-03-10 09:46:54 +00:00
Juneyoung Lee	3170978173	[InstSimplify] Add tests for pr49495 (NFC)	2021-03-10 17:54:46 +09:00
Wei Mi	ee35784a90	[SampleFDO] Support enabling -funique-internal-linkage-name. The -funique-internal-linkage-name flag is now available, and we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regressions caused by the flip. When we flip -funique-internal-linkage-name on, the profile is collected from a binary built without -funique-internal-linkage-name, so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce a transient regression. To avoid such a mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing the uniq suffix. The compiler will decide whether to keep the uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for the extbinary format. For other formats, by default the compiler will keep the uniq suffix, so they will only experience a transient regression when -funique-internal-linkage-name is just flipped. Another type of regression is caused by places where we fail to call getCanonicalFnName. Those places are fixed. Differential Revision: https://reviews.llvm.org/D96932	2021-03-09 21:41:40 -08:00
Philip Reames	b7fc372987	[tests] add a few more tests for D98122	2021-03-09 19:18:22 -08:00
Arnold Schwaighofer	590ac0a26a	[coro async] Transfer the original function's attributes to the clone rdar://75052917 Differential Revision: https://reviews.llvm.org/D98051	2021-03-09 17:01:41 -08:00
William S. Moses	875891a10d	[MemoryDependence] Fix invariant group store Fix bug in MemoryDependence [and thus GVN] for invariant group. Previously MemDep didn't verify that the store was storing into a pointer rather than a store simply using a pointer. Differential Revision: https://reviews.llvm.org/D98267	2021-03-09 19:03:39 -05:00
David Green	fa450e98c5	[ARM] Test for predicated scalar memops. NFC This test shows a case where we can potentially scalarize the store in a predicated loop, creating a lot of instructions that would be much slower than scalar.	2021-03-09 21:57:18 +00:00
Philip Reames	82400ae016	[tests] add tests to show effects of D98122	2021-03-09 13:54:15 -08:00
Juneyoung Lee	f49354838e	Revert "[InstCombine] Add simplification of two logical and/ors" This reverts commit `07c3b97e18` due to a reported failure in two-stage build.	2021-03-10 05:48:31 +09:00
Florian Hahn	5a3bb7dde3	[DSE] Add test cases with memory intrinsics and varying size values. This patch adds a few tests for memset/memcpy with non-constant size values. Some of the tests will be optimized in further patches.	2021-03-09 20:31:21 +00:00
Sanjay Patel	2986a9c7e2	[InstCombine] canonicalize 'not' op after min/max intrinsic This is another step towards parity between existing select transforms and min/max intrinsics (D98152).. The existing 'not' folds around select are complicated, so it's likely that we will need to enhance this, but this should be a safe step.	2021-03-09 11:33:28 -05:00
Sanjay Patel	ef19f6cbf3	[InstCombine] add tests for min/max intrinsics with not+constant; NFC	2021-03-09 11:33:28 -05:00
Sanjay Patel	41b9209a12	[InstCombine] fold min/max intrinsics with not ops This is a partial translation of the existing select-based folds. We need to recreate several different transforms to avoid regressions as noted in D98152. https://alive2.llvm.org/ce/z/teuZ_J	2021-03-09 08:55:48 -05:00
Luo, Yuanke	0875c2f7f6	[X86][AMX] Add test case for combining AMX bitcast.	2021-03-09 19:48:01 +08:00
Florian Hahn	92da5b7119	[InstCombine] Simplify phis with incoming pointer-casts. If the incoming values of a phi are pointer casts of the same original value, replace the phi with a single cast. Such redundant phis are somewhat common after loop-rotate and removing them can avoid some unnecessary code bloat, e.g. because an iteration of a loop is peeled off to make the phi invariant. It should also simplify further analysis on its own. InstCombine already uses stripPointerCasts in a couple of places and also simplifies phis based on the incoming values, so the patch should fit in the existing scope. The patch causes binary changes in 47 out of 237 benchmarks in MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98058	2021-03-09 11:40:18 +00:00
Sanjay Patel	c05d574a98	[InstCombine] add tests for min/max intrinsics with not ops; NFC	2021-03-08 17:38:23 -05:00
Masoud Ataei	820f508b08	[PowerPC] Removing _massv place holder Since P8 is the oldest machine supported by MASSV pass, _massv place holder is removed and the oldest version of MASSV functions is assumed. If the P9 vector specific is detected in the compilation process, the P8 prefix will be updated to P9. Differential Revision: https://reviews.llvm.org/D98064	2021-03-08 21:43:24 +00:00
Sanjay Patel	0a2d69480d	[InstSimplify] cttz(1<<x) --> x https://alive2.llvm.org/ce/z/TDacYu https://alive2.llvm.org/ce/z/KF84S3	2021-03-08 16:30:14 -05:00
Sanjay Patel	afa443831b	[InstSimplify] add tests for cttz of shifted-1; NFC	2021-03-08 16:30:13 -05:00
Philip Reames	f1f9cc6c40	Fix ppc build bot after `239a6181` (Yes, I checked, return undef is the right result for the function.)	2021-03-08 10:00:56 -08:00
Philip Reames	ebc61f9d3c	[instcombine] Collapse trivial or recurrences If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (or part)	2021-03-08 09:21:38 -08:00
Philip Reames	239a618180	[instcombine] Collapse trivial and recurrences If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (and part)	2021-03-08 09:21:38 -08:00
Philip Reames	97a7bc5831	[gvn] Precisely propagate equalities to phi operands The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use). Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify. Differential Revision: https://reviews.llvm.org/D98082	2021-03-08 08:59:00 -08:00
Sanne Wouda	05a6e2eb9a	[InstCombine] Add a combine for a shuffle of similar bitcasts Some intrinsics wrapper code has the habit of ignoring the type of the elements in vectors, thinking of vector registers as a "bag of bits". As a consequence, some operations are shared between vectors of different types. For example, functions that rearrange elements in a vector can be shared between vectors of int32 and float. This can result in bitcasts in awkward places that prevent the backend from recognizing some instructions. For AArch64 in particular, it inhibits the selection of dup from a general purpose register (GPR), and mov from GPR to a vector lane. This patch adds a pattern in InstCombine to move the bitcasts past the shufflevector if this is possible. Sometimes this even allows InstCombine to remove the bitcast entirely, as in the included tests. Alternatively this could be done with a few extra patterns in the AArch64 backend, but InstCombine seems like a better place for this. Differential Revision: https://reviews.llvm.org/D97397	2021-03-08 16:32:30 +00:00
Florian Hahn	2bf1955f8b	[InstCombine] Pre-commit tests for redundant phis with pointer casts. Pre-commit tests for D98058.	2021-03-08 16:25:50 +00:00
Nikita Popov	7faad5c900	[ConstantFold] Handle icmp of global and null consistently Return UGT rather than NE for icmp @g, null, which is slightly stronger. This is consistent with what we do for more complex folds. It is somewhat silly that @g ugt null does not get folded while (gep @g) ugt null does.	2021-03-08 17:18:01 +01:00
Nikita Popov	f08148e874	[ConstProp] Fix folding of pointer icmp with signed predicates While @g ugt null is always true (ignoring weak symbols), @g sgt null is not necessarily the case -- that would imply that it is forbidden to place globals in the high half of the address space.	2021-03-08 17:12:12 +01:00
Nikita Popov	2ef03bc3a8	[ConstProp] Add more tests for pointer icmp folding (NFC)	2021-03-08 17:06:12 +01:00
Juneyoung Lee	ff58b243ac	Apply update_test_checks.py to test/Transforms/Util/assume-builder.ll (NFC)	2021-03-09 01:03:03 +09:00
Sanjay Patel	f75b5305f4	[ConstantFold] allow folding icmp of null and constexpr I noticed that we were not folding expressions like this: icmp ult (constexpr), null in https://llvm.org/PR49355, so we end up with extremely large icmp instructions as the constant expressions pile up on each other. There is no potential to mis-fold an unsigned boundary condition with a zero/null, so this is just falling through a crack in the pattern matching. The more general case of comparisons of non-zero constants and constexpr are more tricky and may require the datalayout to know how to cast to different types, etc. Negative tests verify that we are only changing a subset of potential patterns. Differential Revision: https://reviews.llvm.org/D98150	2021-03-08 08:53:59 -05:00
Sanjay Patel	a093942c28	[ConstProp][JumpThreading] add more test coverage for potential nullptr folds; NFC See D98150.	2021-03-08 08:53:59 -05:00
Sanjay Patel	962c6fda4d	[JumpThreading] auto-generate complete test checks; NFC	2021-03-08 08:26:16 -05:00
Haojian Wu	c9ff39a3f9	Add "assert require" for the test added in `df9158c9a4` The test is using "debug-only", it was failing in opt built mode.	2021-03-08 14:17:26 +01:00
Simon Pilgrim	c2d18d7005	[KnownBits] Add min/max shift amount handling to shl/lshr/ashr KnownBits helpers Pulled out of the original D90479 patch - also includes the "impossible shift amount" filtering from computeKnownBitsFromShiftOperator. Differential Revision: https://reviews.llvm.org/D90479	2021-03-08 11:44:31 +00:00
David Sherwood	de3185647d	[LoopVectorize][SVE] Add tests for vectorising conditional loads of invariant addresses For loops of the form: void foo(int a, int cond, short inv, long long n) { for (long long i=0; i<n; ++i) { if (cond[i]) a[i] = inv; } } we can vectorise for SVE using masked gather loads where the array of pointers is simply a vector splat of 'inv' and the mask comes from the condition 'cond[i] != 0'. This patch simply adds tests upstream to defend this capability. Differential Revision: https://reviews.llvm.org/D98043	2021-03-08 08:38:31 +00:00
Ta-Wei Tu	df9158c9a4	[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest` The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`. In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour. Replacing it with the much more robust version provided by `LoopNest` reduces code duplication and fixes https://bugs.llvm.org/show_bug.cgi?id=48113. `LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`. Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D97290	2021-03-08 11:36:08 +08:00
Mehdi Amini	8d5a981a13	Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This reverts commit `99108c791d`. Clang is miscompiling LLVM with this change, a stage-2 build hits multiple failures. As a repro, I built clang in a stage1 directory and used it this way: cmake -G Ninja ../llvm \ -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \ -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \ -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \ -DLLVM_ENABLE_PROJECTS=mlir \ -DLLVM_BUILD_EXAMPLES=ON \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_ENABLE_ASSERTIONS=On ninja check-mlir	2021-03-08 00:15:47 +00:00
Whitney Tsang	0d8f102809	[NFC][LoopUnroll] Add `-unroll-runtime-other-exit-predictable=false` in `runtime-multiexit-heuristic.ll` Added -unroll-runtime-other-exit-predictable=false in runtime-multiexit-heuristic.ll to make it more robust. runtime-multiexit-heuristic.ll intention is to test -unroll-runtime-multi-exit=false, so the default value of -unroll-runtime-other-exit-predictable should not impact the result. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98098	2021-03-07 23:51:09 +00:00
Whitney Tsang	40391cef61	[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable. (Add LIT) Reviewed By: Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D97747	2021-03-07 23:48:00 +00:00
Sanjay Patel	898b40645d	[ConstProp] add tests for cmp with null and constexpr; NFC	2021-03-07 14:02:44 -05:00
Juneyoung Lee	07c3b97e18	[InstCombine] Add simplification of two logical and/ors This is a patch that adds folding of two logical and/ors that share one variable: a && (a && b) -> a && b a && (a & b) -> a && b ... This is towards removing the poison-unsafe select optimization (D93065 has more context). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96945	2021-03-08 02:38:43 +09:00
Nikita Popov	2b494f85f1	[CVP] Remove -cvp-dont-add-nowrap-flags option This option was originally added to work around a bug in LFTR. The bug has long since been fixed.	2021-03-07 18:19:31 +01:00
Nikita Popov	176bbcae11	[DSE] Remove MemDep-based implementation The MemorySSA-based implementation has been enabled without issue for a while now, so keeping the old implementation around doesn't seem useful anymore. This drops the MemDep-based implementation. Differential Revision: https://reviews.llvm.org/D97877	2021-03-07 18:17:31 +01:00
Juneyoung Lee	33590ed4f2	[InstCombine] fix another poison-unsafe select transformation This fixes another unsafe select folding by disabling it if EnableUnsafeSelectTransform is set to false. EnableUnsafeSelectTransform's default value is true, hence it won't affect generated code (unless the flag is explicitly set to false).	2021-03-08 02:11:04 +09:00
Juneyoung Lee	99108c791d	[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG. It is known that merging conditions into and/or is poison-unsafe, and this is towards making things more correct by removing possible miscompilations. Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future. There have been efforts for updating optimizations to support the select form as well, and they are linked to D93065. The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true. Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact on performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95026	2021-03-08 01:38:03 +09:00
Juneyoung Lee	5bb38e84d3	[LoopUnswitch] unswitch if cond is in select form of and/or as well Hello all, I'm trying to fix unsafe propagation of poison values in and/or conditions by using equivalent select forms (`select i1 A, i1 B, i1 false` and `select i1 A, i1 true, i1 B`) instead. D93065 has links to patches for this. This patch allows unswitch to happen if the condition is in this form as well. `collectHomogenousInstGraphLoopInvariants` is updated to keep traversal if Root and the visiting I match both m_LogicalOr()/m_LogicalAnd(). Other than this, the remaining changes are almost straightforward and simply replace Instruction::And/Or check with match(m_LogicalOr()/m_LogicalAnd()). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D97756	2021-03-08 01:19:43 +09:00
Juneyoung Lee	d65c947600	[InstCombine] enrich select-safe-bool-transforms.ll test (NFC) for https://reviews.llvm.org/D96945	2021-03-08 00:01:08 +09:00
Juneyoung Lee	2c16c4a43c	[ValueTracking] update directlyImpliesPoison to look into select's condition This is a minor update in directlyImpliesPoison and makes it look into select's condition. Split from https://reviews.llvm.org/D96945	2021-03-07 23:16:44 +09:00
Nikita Popov	3fedaf2a52	[GVN] Don't explicitly materialize undefs from dead blocks When materializing an available load value, do not explicitly materialize the undef values from dead blocks. Doing so will force creation of a phi with an undef operand, even if there is a dominating definition. The phi will be folded away on subsequent GVN iterations, but by then we may have already poisoned MDA cache slots. Simply don't register these values in the first place, and let SSAUpdater do its thing.	2021-03-06 23:46:24 +01:00
Nikita Popov	5f319fc444	[GVN] Add test for load GVN with dead block (NFC) What this test illustrates is that GVN inserts an unnecessary phi node initially, which prevents alias analysis from establishing NoAlias, and MDA caches that result. We would be able to fully fold this after another -gvn run with clean MDA.	2021-03-06 23:19:58 +01:00
Mauri Mustonen	494b5ba364	[VPlan] Support to widen call instructions in VPlan native path Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer. Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Patch by Mauri Mustonen <mauri.mustonen@tuni.fi> Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97278	2021-03-06 21:59:52 +00:00
Roman Lebedev	2ad1f5eb1a	[InstCombine] Don't canonicalize (gep i8* X, -(ptrtoint Y)) as (inttoptr (sub (ptrtoint X), (ptrtoint Y))) It's just a wrong thing to do. We introduce inttoptr where there were none, which results in losing all provenance information because we no longer have a GEP{i,}, and pessimizes all future optimizations, because we are basically not allowed to look past `inttoptr`. (gep i8* X, -(ptrtoint Y)) is the canonical form. So just drop this fold. Noticed while reviewing D98120.	2021-03-06 23:00:25 +03:00
Roman Lebedev	75c7e3e314	[NFC][InstCombine] Add plain GEP test for (gep i8* X, -(ptrtoint Y)) --> (inttoptr (sub (ptrtoint X), (ptrtoint Y))) fold	2021-03-06 23:00:25 +03:00
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00
William S. Moses	d163e75c81	[Attributor] Enable heap-to-stack of any size Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1 Differential Revision: https://reviews.llvm.org/D97873	2021-03-06 12:57:32 -05:00
Philip Reames	9c139c50c9	[tests] Update an autogen test for format change	2021-03-06 09:49:27 -08:00
Philip Reames	5db2735af9	[gvn] Handle simple phi equivalence cases GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet. However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg end up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress. This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one of the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.) Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect. Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi. Differential Revision: https://reviews.llvm.org/D98080	2021-03-06 09:31:12 -08:00
Philip Reames	06a8a867d1	[rs4gc/tests] Remove use of internal debug flags As a pragmatic tradeoff, the ease of updating the tests outweighs the slightly easier to understand test conditions. Where relevant, debug output was converted to comments to help human understanding.	2021-03-06 09:20:02 -08:00
Philip Reames	c6ec563f02	[rs4gc] autogen a bunch of tests for ease of update	2021-03-06 09:04:00 -08:00
Nikita Popov	1c59bf4d4d	[InstCombine] Add tests for non-trivial store to load forward (NFC) Examples of things we mostly don't handle.	2021-03-06 16:58:11 +01:00
Nikita Popov	edf7004851	[ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast() There seems to be an impedance mismatch between what the type system considers an aggregate (structs and arrays) and what constants consider an aggregate (structs, arrays and vectors). Rather than adjusting the type check, simply drop it entirely, as getAggregateElement() is well-defined for non-aggregates: It simply returns null in that case.	2021-03-06 12:17:56 +01:00
Nikita Popov	be58465591	[GVN] Regenerate test checks (NFC)	2021-03-06 12:11:16 +01:00
Nikita Popov	a917fb89dc	[LVI] Simplify and generalize handling of clamp patterns Instead of handling a number of special cases for selects, handle this generally when inferring ranges from conditions. We already infer ranges from `x + C pred C2` to `x`, so doing the same for `x pred C2` to `x + C` is straightforward.	2021-03-06 10:42:41 +01:00
Nikita Popov	906deaa0d9	[CVP] Add additional tests for clamp patterns (NFC) These are the same as the existing tests, but using different predicates that are not handled by the current code.	2021-03-06 10:42:40 +01:00
Nikita Popov	019ae8220f	[CVP] Fix tests for clamp patterns (NFC) These tests didn't test the pattern they were supposed to, because %a instead of %add was used in the select (which turned this into a normal min/max). Noticed this when commenting out the clamp handling code did not result in any test failures...	2021-03-06 10:24:44 +01:00
Philip Reames	8bdb5ecd82	[tests] precommit tests for D98082	2021-03-05 15:21:34 -08:00
Philip Reames	c0d390d0d2	[tests] precommit tests for phi handling in GVN	2021-03-05 14:24:21 -08:00
Philip Reames	51b13a7ea0	[gvn] CSE gc.relocates based on meaning, not spelling The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974 (Note: Part of the reviewed change was split and landed as `f352463a`)	2021-03-05 10:16:12 -08:00
Philip Reames	f352463ade	Mark gc.relocate and gc.result as readnone For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism. gc.relocate and gc.result are simply projections of the return values from the associated statepoint. Note that the LangRef has always declared them readnone. The EarlyCSE change is simply moving the special casing from readonly to readnone handling. As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice. This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing. The motivation is to enable the GVN changes in that patch.	2021-03-05 10:07:17 -08:00
Philip Reames	9fe46d6487	[tests] precommit some additional tests for D97974	2021-03-05 10:04:07 -08:00

17805 Commits