llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	00d3f7cc3c	[LAA] Make getPointersDiff() API compatible with opaque pointers Make getPointersDiff() and sortPtrAccesses() compatible with opaque pointers by explicitly passing in the element type instead of determining it from the pointer element type. The SLPVectorizer result is slightly non-optimal in that unnecessary pointer bitcasts are added. Differential Revision: https://reviews.llvm.org/D104784	2021-06-23 18:44:34 +02:00
Sanjay Patel	656001e7b2	[ValueTracking] look through bitcast of vector in computeKnownBits This borrows as much as possible from the SDAG version of the code (originally added with D27129 and since updated with big endian support). In IR, we can test more easily for correctness than we did in the original patch. I'm using the simplest cases that I could find for InstSimplify: we computeKnownBits on variable shift amounts to see if they are zero or in range. So shuffle constant elements into a vector, cast it, and shift it. The motivating x86 example from https://llvm.org/PR50123 is also here. We computeKnownBits in the caller code, but we only check if the shift amount is in range. That could be enhanced to catch the 2nd x86 test - if the shift amount is known too big, the result is 0. Alive2 understands the datalayout and agrees that the tests here are correct - example: https://alive2.llvm.org/ce/z/KZJFMZ Differential Revision: https://reviews.llvm.org/D104472	2021-06-23 11:46:46 -04:00
David Green	8cfc080132	[ARM] Limit v6m unrolling with multiple live outs v6m cores only have a limited number of registers available. Unrolling can mean we spend more on stack spills and reloads than we save from the unrolling. This patch adds an extra heuristic to put a limit on the unroll count for loops with multiple live out values, as measured from the LCSSA phi nodes. Differential Revision: https://reviews.llvm.org/D104659	2021-06-23 16:36:37 +01:00
Datta Nagraj	ad0085d338	[InstCombine] Eliminate casts to optimize ctlz operation If a ctlz operation is performed on higher datatype and then downcasted, then this can be optimized by doing a ctlz operation on a lower datatype and adding the difference bitsize to the result of ctlz to provide the same output: https://alive2.llvm.org/ce/z/8uup9M The original problem is shown in https://llvm.org/PR50173 Differential Revision: https://reviews.llvm.org/D103788	2021-06-23 11:19:12 -04:00
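The ctlz narrowing described in ad0085d338 can be checked by brute force. The sketch below is a plain Python model, not LLVM code. It assumes the wide operand is a zero-extended narrow value, and the helper names `ctlz`, `wide_then_trunc`, and `narrow_plus_diff` are hypothetical.

```python
def ctlz(value: int, bits: int) -> int:
    """Count leading zero bits of `value` seen as an unsigned `bits`-wide integer."""
    assert 0 <= value < (1 << bits)
    return bits - value.bit_length()


def wide_then_trunc(x: int, narrow: int, wide: int) -> int:
    # Original form: zero-extend x, count leading zeros at the wide width,
    # then truncate the count back to the narrow width.
    return ctlz(x, wide) & ((1 << narrow) - 1)


def narrow_plus_diff(x: int, narrow: int, wide: int) -> int:
    # Optimized form: count at the narrow width and add the width difference.
    return (ctlz(x, narrow) + (wide - narrow)) & ((1 << narrow) - 1)


# Exhaustive check for an 8-bit value that was widened to 32 bits.
ALL_MATCH = all(
    wide_then_trunc(x, 8, 32) == narrow_plus_diff(x, 8, 32) for x in range(256)
)
```

Under these assumptions both forms agree for every 8-bit input, including zero, where the count is the full wide width.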
Sanjay Patel	1e9b6b89a7	[InstCombine] convert FP min/max with negated op to fabs This is part of improving floating-point patterns seen in: https://llvm.org/PR39480 We don't require any FMF because the 2 potential corner cases (-0.0 and NaN) are correctly handled without FMF: 1. -0.0 is treated as strictly less than +0.0 with maximum/minimum, so fabs/fneg work as expected. 2. +/- 0.0 with maxnum/minnum is indeterminate, so transforming to fabs/fneg is more defined. 3. The sign of a NaN may be altered by this transform, but that is allowed in the default FP environment. If there are FMF, they are propagated from the min/max call to one or both new operands which seems to agree with Alive2: https://alive2.llvm.org/ce/z/bem_xC	2021-06-23 10:41:39 -04:00
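One of the folds covered by 1e9b6b89a7 appears to be `maximum(x, -x) -> fabs(x)`. The sketch below models that identity in Python. The `maximum` helper is a hand-written approximation of the IEEE-754 maximum operation (NaN propagates, -0.0 is less than +0.0), and `same_float` is a hypothetical comparison that ignores the sign of a NaN, matching point 3 of the message.

```python
import math


def maximum(a: float, b: float) -> float:
    """Model of IEEE-754 maximum: NaN propagates and -0.0 < +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Equal operands differ only when they are signed zeros; prefer +0.0.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def same_float(a: float, b: float) -> bool:
    """Equality that distinguishes signed zeros and treats all NaNs alike."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


SAMPLES = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf, math.nan]
FOLD_HOLDS = all(same_float(maximum(x, -x), math.fabs(x)) for x in SAMPLES)
```

The signed-zero and NaN samples are the two corner cases the message calls out; the model agrees with `fabs` on both.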
Roman Lebedev	ff4b1d379f	[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support dealing not with just the `ret` in the future. This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it. Other restrictions of the transform remain. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104598	2021-06-23 14:33:18 +03:00
Juneyoung Lee	5af8bacc94	[InstSimplify] Add more poison folding optimizations This adds more poison folding optimizations to InstSimplify. Since all binary operators propagate poison, these are fine. Also, the precondition of `select cond, undef, x` -> `x` is relaxed to allow the case when `x` is undef. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104661	2021-06-23 20:25:24 +09:00
Joe Ellis	3c4dbf6ea9	[Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics With regards to overrunning, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. (llvm.experimental.vector.extract) Elements ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. For the non-mixed cases (e.g. inserting/extracting a scalable into/from another scalable, or inserting/extracting a fixed into/from another fixed), it is possible to statically check whether or not the above conditions are met. This was previously missing from the verifier, and if the conditions were found to be false, the result of the insertion/extraction would be replaced with an undef. With regards to invalid indices, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) ``idx`` represents the starting element number at which ``subvec`` will be inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum vector length. (llvm.experimental.vector.extract) The ``idx`` specifies the starting element number within ``vec`` from which a subvector is extracted. ``idx`` must be a constant multiple of the known-minimum vector length of the result type. Similarly, these conditions were not previously enforced in the verifier. In some circumstances, invalid indices were permitted silently, and in other circumstances, an undef was spawned where a verifier error would have been preferred. This commit adds verifier checks to enforce the constraints above. Differential Revision: https://reviews.llvm.org/D104468	2021-06-23 10:33:22 +00:00
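The two constraints quoted from the LangRef in 3c4dbf6ea9 can be sketched as a static check. This is an illustrative Python model, not the verifier's code. It assumes the non-mixed case (both vectors fixed or both scalable), so lengths are plain known-minimum element counts, and `check_subvector_index` is a hypothetical name.

```python
def check_subvector_index(idx: int, sub_len: int, vec_len: int) -> list:
    """Return the violated constraints; an empty list means the index is valid."""
    errors = []
    # idx must be a constant multiple of the subvector's known-minimum length.
    if idx % sub_len != 0:
        errors.append("idx is not a multiple of the subvector length")
    # Elements idx .. idx + sub_len - 1 must all be valid indices of the vector.
    if idx + sub_len > vec_len:
        errors.append("subvector overruns the vector")
    return errors
```

For example, inserting a 4-element subvector into an 8-element vector is valid at index 0 or 4, misaligned at index 2, and overrunning at index 8.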
Florian Hahn	aa58fdb396	[llvm] Update tests that got missed in `adee485adf`.	2021-06-23 10:29:58 +01:00
Max Kazantsev	842b4c83cb	[LoopDeletion] Exploit undef Phi inputs when symbolically executing 1st iteration Follow-up on Roman's idea expressed in D103959. - If a Phi has undefined inputs from live blocks: - and no other inputs, assume it is undef itself; - and exactly one non-undef input, we can assume that all undefs are equal to this input. Differential Revision: https://reviews.llvm.org/D104618 Reviewed By: lebedev.ri, nikic	2021-06-23 11:53:48 +07:00
Max Kazantsev	976926e8ee	[Test] Clear out br i1 undef from tests to avoid UB We don't want to test possible unexpected impact of such branches. Replacing them with regular conditions. Idea by Nikita Popov.	2021-06-23 11:33:57 +07:00
Max Kazantsev	b7d2c173eb	[LSR] Filter out zero factors. PR50765 Zero factor leads to division by zero and failure of corresponding assert as shown in PR50765. We should filter out such factors. Differential Revision: https://reviews.llvm.org/D104702 Reviewed By: huihuiz, reames	2021-06-23 10:43:06 +07:00
Philip Reames	6a40bb01f6	precommit test for D104665	2021-06-22 15:52:00 -07:00
Joseph Huber	44feacc736	[OpenMP] Change remaining globalization from an analysis remark to missed After landing the globalization optimizations, the presence of globalization on the device that was not put in shared or stack memory is a failed optimization with performance consequences so it should indicate a missed remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104735	2021-06-22 16:52:06 -04:00
Nikita Popov	7bb7fa12e7	[OpaquePtr] Support changing load type in InstCombine When the load type is changed to ptr, we need the load pointer type to also be ptr, because it's not allowed to create a pointer to an opaque pointer. This is achieved by adjusting the getPointerTo() API to return an opaque pointer for an opaque pointer base type. Differential Revision: https://reviews.llvm.org/D104718	2021-06-22 21:16:15 +02:00
Sami Tolvanen	33c9438f11	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This reverts commit `4474958d3a`. Breaks check-llvm on Mac.	2021-06-22 12:10:58 -07:00
Sanjay Patel	bfd172999b	[InstCombine][test] add tests for FP min/max with negated op; NFC	2021-06-22 14:15:04 -04:00
Sanjay Patel	4e78bd3836	[InstCombine][test] add tests for FP min/max with negated op; NFC	2021-06-22 14:15:04 -04:00
Joseph Huber	30e36c9b3c	[Attributor] Add interface to emit remarks in Attributor Summary: This patch adds support for the Attributor to emit remarks on behalf of some other pass. The attributor can now optionally take a callback function that returns an OptimizationRemarkEmitter object when given a Function pointer. If this is available then a remark will be emitted for the corresponding pass name. Depends on D102197 Reviewed By: sstefan1 thegameg Differential Revision: https://reviews.llvm.org/D102444	2021-06-22 14:12:46 -04:00
Joseph Huber	7d69da71dd	[OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls Summary: The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087. Depends on D97818 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102197	2021-06-22 13:23:05 -04:00
Sami Tolvanen	4474958d3a	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-22 10:01:55 -07:00
Joseph Huber	03d7e61c87	[OpenMP] Internalize functions in OpenMPOpt to improve IPO passes Summary: Currently the attributor needs to give up if a function has external linkage. This means that the optimization introduced in D97818 will only apply to static functions. This change uses the Attributor to internalize OpenMP device routines by making a copy of each function with private linkage and replacing the uses in the module with it. This allows for the optimization to be applied to any regular function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102824	2021-06-22 12:38:10 -04:00
Joseph Huber	6fc51c9f7d	[OpenMP] Replace GPU globalization calls with shared memory in the middle-end Summary: The changes introduced in D97680 create a simpler interface to code that needs to be globalized. This interface is used to simplify the globalization calls in the middle end. We can check any globalization call that is only called by a single thread in the team and replace it with a static shared memory buffer. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97818	2021-06-22 11:55:44 -04:00
Nikita Popov	e790d3667e	[OpaquePtr] Handle addrspacecasts in InstCombine This adds support for addrspace casts involving opaque pointers to InstCombine, as well as the isEliminableCastPair() helper (otherwise the assertion failure would just move there). Add PointerType::hasSameElementTypeAs() to hide the element type details. Differential Revision: https://reviews.llvm.org/D104668	2021-06-22 17:45:30 +02:00
Jingu Kang	873ff5a728	[SimpleLoopUnswitch] Fix a bug on ComputeUnswitchedCost with partial unswitch There was a bug in the cost calculation for partially invariant unswitch. The costs of non-duplicated blocks are subtracted from the total LoopCost, so anything that is duplicated should not be counted. Differential Revision: https://reviews.llvm.org/D103816	2021-06-22 16:18:00 +01:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functionality with a single allocation command, effectively mimicking an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Rosie Sumpter	b2f48cc914	[SLP][AArch64] Add SLP vectorizer tests for XOR and AND reductions. NFC These regression tests show missed SLP vectorization opportunities, which will be fixed in a future commit (see: https://reviews.llvm.org/D104538). Differential Revision: https://reviews.llvm.org/D104708	2021-06-22 15:16:02 +01:00
Nikita Popov	e638a290f7	[ConstantFold] Delay fetching pointer element type Don't do this while stripping pointer casts, instead fetch it at the end. This improves compatibility with opaque pointers for the case where the base object is not opaque.	2021-06-22 15:51:00 +02:00
Nikita Popov	87bdde4962	[ConstantFold] Skip bitcast -> GEP transform for opaque pointers Same as with the InstCombine transform, this is not possible for bitcasts involving opaque pointers, as GEP preserves opaqueness.	2021-06-22 15:50:55 +02:00
Florian Hahn	d17798823c	[SCEV] Retain AddExpr flags when subtracting a foldable constant. Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2. But we can retain flags under certain conditions: * Adding a smaller constant is NUW if the original AddExpr was NUW. * Adding a constant with the same sign and small magnitude is NSW, if the original AddExpr was NSW. This can improve results after using `SimplifyICmpOperands`, which may subtract one in order to use stricter predicates, as is the case for `isKnownPredicate`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104319	2021-06-22 11:27:51 +01:00
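The two flag-retention conditions in d17798823c can be sanity-checked by exhaustive search on a small bit width. The sketch below is a Python model rather than SCEV code. The 6-bit width is an arbitrary choice to keep the search fast, and the reading of "smaller constant" and "same sign and small magnitude" encoded here is an assumption about the commit's exact conditions.

```python
BITS = 6
UMAX = (1 << BITS) - 1
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def nuw_retained_everywhere() -> bool:
    """(A + C1)<nuw> - C2 with C2 <= C1 (unsigned) never wraps unsigned."""
    for a in range(UMAX + 1):
        for c1 in range(UMAX + 1):
            if a + c1 > UMAX:
                continue  # the original add was not NUW
            for c2 in range(c1 + 1):
                if a + (c1 - c2) > UMAX:
                    return False
    return True


def nsw_retained_everywhere() -> bool:
    """(A + C1)<nsw> - C2 never wraps signed when the folded constant keeps
    C1's sign and does not exceed its magnitude."""
    for a in range(SMIN, SMAX + 1):
        for c1 in range(SMIN, SMAX + 1):
            if not SMIN <= a + c1 <= SMAX:
                continue  # the original add was not NSW
            for c2 in range(SMIN, SMAX + 1):
                folded = c1 - c2
                same_sign = (folded >= 0) == (c1 >= 0)
                if not same_sign or abs(folded) > abs(c1):
                    continue
                if not SMIN <= a + folded <= SMAX:
                    return False
    return True
```

Both searches succeed because the folded sum always lies between `A` and `A + C1`, neither of which overflows.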
Max Kazantsev	4c4f1ae93e	Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks" Patch was reverted due to a bug that existed before it and was exposed by it. Returning after the underlying bug has been fixed. Differential Revision: https://reviews.llvm.org/D103959	2021-06-22 12:28:46 +07:00
Max Kazantsev	575253887b	[LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically Two predecessors break the further logic, and the loop may come to the opt in non-canonicalized state.	2021-06-22 12:20:55 +07:00
Eli Friedman	8f3d16905d	[ScalarEvolution] Ensure backedge-taken counts are not pointers. A backedge-taken count doesn't refer to memory; returning a pointer type is nonsense. So make sure we always return an integer. The obvious way to do this would be to just convert the operands of the icmp to integers, but that doesn't quite work out at the moment: isLoopEntryGuardedByCond currently gets confused by ptrtoint operations. So we perform the ptrtoint conversion late for lt/gt operations. The test changes are mostly innocuous. The most interesting changes are more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to i64)) + %ptr)". This is expected: we can't fold this to zero because we need to preserve the pointer base. The call to isLoopEntryGuardedByCond in howFarToZero is less precise because of ptrtoint operations; this shows up in the function pr46786_c26_char in ptrtoint.ll. Fixing it here would require more complex refactoring. It should eventually be fixed by future improvements to isImpliedCond. See https://bugs.llvm.org/show_bug.cgi?id=46786 for context. Differential Revision: https://reviews.llvm.org/D103656	2021-06-21 16:24:16 -07:00
Roman Lebedev	4cf74469a0	[NFC][SimplifyCFG] Add basic test for debuginfo preservation of `ret` tail merging	2021-06-21 23:56:54 +03:00
Roman Lebedev	3e98b88797	[NFC][SimplifyCFG] Fix tests to use FileCheck instead of grep	2021-06-21 23:56:54 +03:00
Alexey Bataev	c5bbc737e8	[SLP][NFC]Rename functions in the tests, NFC.	2021-06-21 13:37:12 -07:00
Nikita Popov	39796e1ad0	Reapply [InstCombine] Don't try converting opaque pointer bitcast to GEP Reapplied without changes -- this was reverted together with an underlying patch. ----- Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 22:15:56 +02:00
Nikita Popov	e2c2124a4b	Reapply [InstCombine] Extract bitcast -> gep transform Relative to the original patch, an InstCombine test has been added to show a previously missed pattern, and the Coroutine test that resulted in the revert has been regenerated. ----- Move this into a separate function, to make sure that early returns do not accidentally skip other transforms. This previously happened for the isSized() check, which skipped folds like distributing a bitcast over a select.	2021-06-21 22:03:15 +02:00
Nikita Popov	403792f91e	[InstCombine] Add test for bitcast of unsized pointer (NFC) The bitcast should get folded into the select, but currently isn't due to an incorrect early bailout.	2021-06-21 22:03:15 +02:00
Nikita Popov	6922ab73a5	Revert "[InstCombine] Extract bitcast -> gep transform" This reverts commit `d9f5d7b959`. This reverts commit `5780611d7e`. This causes a failure in Coroutine tests.	2021-06-21 21:34:17 +02:00
Alexey Bataev	908b753661	[SLP]Improve vectorization of PHI instructions. Perform better analysis when trying to vectorize PHIs. 1. Do not try to vectorize vector PHIs. 2. Do deeper analysis for more profitable nodes for the vectorization. Before we just tried to vectorize the PHIs of the same type. Patch improves this and tries to vectorize PHIs with incoming values which come from the same basic block, have the same and/or alternative opcodes. This saves compile time and provides better vectorization results in general. Part of D57059. Differential Revision: https://reviews.llvm.org/D103638	2021-06-21 12:26:24 -07:00
Nikita Popov	5780611d7e	[InstCombine] Don't try converting opaque pointer bitcast to GEP Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 21:24:50 +02:00
Philip Reames	0c09e5bd74	Split a test for ease of auto update	2021-06-21 11:02:26 -07:00
Jacob Hegna	f86d1f99b3	Remove ML inlining model artifacts. They are not conducive to being stored in git. Instead, we autogenerate mock model artifacts for use in tests. Production models can be specified with the cmake flag LLVM_INLINER_MODEL_PATH. LLVM_INLINER_MODEL_PATH has two sentinel values: - download, which will download the most recent compatible model. - autogenerate, which will autogenerate a "fake" model for testing the model uptake infrastructure. Differential Revision: https://reviews.llvm.org/D104251	2021-06-21 17:38:09 +00:00
Nathan Chancellor	f52666985d	Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks" This reverts commit `bb1dc876eb`. This patch causes an assertion failure when building an arm64 defconfig Linux kernel. See https://reviews.llvm.org/D103959 for a link to the original bug report and a reduced reproducer.	2021-06-21 10:18:55 -07:00
Rosie Sumpter	2251f33bef	[SLP][AArch64] Add SLP vectorizer regression test. NFC This test is for a missed SLP vectorizer opportunity, reported here https://bugs.llvm.org/show_bug.cgi?id=44593. This is due to a cost modelling issue with vector reduction intrinsics which will be fixed in a future commit (see https://reviews.llvm.org/D104538).	2021-06-21 16:31:00 +01:00
Sanjay Patel	64b2676ca8	[InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms Building on: `4c44b02d87` ...and adding handling for the extra operand in these intrinsics. This pattern is discussed in: https://llvm.org/PR50140	2021-06-21 11:04:12 -04:00
Sjoerd Meijer	071dbaec87	[FuncSpec] Add minsize test. NFC.	2021-06-21 15:21:09 +01:00
Florian Hahn	05bb969014	[LoopIdiom] Add test case that involves adds with flags and zero exts. Test coverage to ensure D104319 does not introduce a regression here.	2021-06-21 12:10:58 +01:00
Nikita Popov	acefe0eaaf	[Mem2Reg] Regenerate test checks (NFC)	2021-06-21 11:06:28 +02:00
Nikita Popov	80e0424b2c	[Mem2Reg] Use poison for unreachable cases Use poison instead of undef for cases dealing with unreachable code. This still leaves the more interesting case of "load from uninitialized memory" as undef.	2021-06-21 10:54:13 +02:00
Nikita Popov	00a88a81d2	[Mem2Reg] Regenerate test checks (NFC)	2021-06-21 10:47:59 +02:00
Juneyoung Lee	c038845f58	[InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified. Resolves llvm.org/pr48975. Reviewed By: nikic, xbolva00 Differential Revision: https://reviews.llvm.org/D96663	2021-06-21 17:39:05 +09:00
Sjoerd Meijer	342bbb7832	[FuncSpec] Don't specialise functions with NoDuplicate instructions. getSpecializationCost was returning INT_MAX for a case when specialisation shouldn't happen, but this wasn't properly checked if specialisation was forced. Differential Revision: https://reviews.llvm.org/D104461	2021-06-21 09:02:11 +01:00
Max Kazantsev	3f2ff7cc8c	[Test] Add some tests showing room for optimization exploiting undef and UB	2021-06-21 13:11:46 +07:00
Max Kazantsev	bb1dc876eb	[LoopDeletion] Handle Phis with similar inputs from different blocks This patch lifts the requirement to have the only incoming live block for Phis. There can be multiple live blocks if the same value comes to phi from all of them. Differential Revision: https://reviews.llvm.org/D103959 Reviewed By: nikic, lebedev.ri	2021-06-21 11:37:06 +07:00
Juneyoung Lee	ce192ced2b	[InstCombine] Use poison constant to represent the result of unreachable instrs This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction. This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll . Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104602	2021-06-21 09:58:44 +09:00
Dmitri Gribenko	ffa252e8ce	[GCOVProfiling][test] Ensure that 'opt' drops any files in a temp directory	2021-06-20 22:48:35 +02:00
Nikita Popov	1ae266f452	[LoopUnroll] Use smallest exact trip count from any exit This is a more general alternative/extension to D102635. Rather than handling the special case of "header exit with non-exiting latch", this unrolls against the smallest exact trip count from any exit. The latch exit is no longer treated as privileged when it comes to full unrolling. The motivating case is in full-unroll-one-unpredictable-exit.ll. Here the header exit is an IV-based exit, while the latch exit is a data comparison. This kind of loop does not get rotated, because the latch is already exiting, and loop rotation doesn't try to distinguish IV-based/analyzable latches. Differential Revision: https://reviews.llvm.org/D102982	2021-06-20 20:58:26 +02:00
David Green	a24b02193a	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-06-20 17:03:30 +01:00
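The loop shape from a24b02193a can be written out to show why the first store is dead. This is a Python transcription of the pseudocode in the message, not compiler output. The function names are hypothetical, and `base` stands in for the `in` offset because `in` is a Python keyword.

```python
def with_dead_store(src, n, m, base):
    """Loop nest as it looks after LICM has promoted the accumulator."""
    dst = [None] * n
    for i in range(n):
        dst[i] = 0  # overwritten below before any read: a dead store
        total = 0
        for j in range(m):
            total += src[base + j]
        dst[i] = total
    return dst


def dead_store_removed(src, n, m, base):
    """Same loop nest with the initial store to dst[i] eliminated."""
    dst = [None] * n
    for i in range(n):
        total = 0
        for j in range(m):
            total += src[base + j]
        dst[i] = total
    return dst
```

Both versions produce the same `dst`, since nothing reads `dst[i]` between the two stores within one iteration of the outer loop.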
Sanjay Patel	4c44b02d87	[InstCombine] fold ctpop-of-select with 1 or more constant arms The general pattern is mentioned in: https://llvm.org/PR50140 ...but we need to do a bit more to handle intrinsics with extra operands like ctlz/cttz.	2021-06-20 11:28:45 -04:00
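The ctpop-of-select fold in 4c44b02d87 pushes the population count into the select arms so that a constant arm folds at compile time. The sketch below models this in Python for 32-bit values. The helper names are hypothetical, and the exact profitability conditions InstCombine applies are not modeled.

```python
def ctpop(value: int, bits: int = 32) -> int:
    """Population count of `value` truncated to `bits` bits."""
    return bin(value & ((1 << bits) - 1)).count("1")


def ctpop_of_select(cond: bool, true_val: int, false_val: int) -> int:
    # Original form: select first, then count the bits of the result.
    return ctpop(true_val if cond else false_val)


def select_of_ctpop(cond: bool, true_val: int, false_val: int) -> int:
    # Folded form: count each arm separately, then select. When an arm is a
    # constant, its count is itself a constant.
    return ctpop(true_val) if cond else ctpop(false_val)
```

With a constant arm such as `0xFF`, the folded form selects between the constant 8 and a single `ctpop` of the variable arm.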
Sanjay Patel	240acb0cff	[InstCombine] avoid infinite loops with select folds of constant expressions This pair of transforms was added recently with: `8591640379` And could lead to conflicting folds: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399	2021-06-20 09:46:25 -04:00
Roman Lebedev	c5b7335dc8	[SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has its address taken Same as with HoistThenElseCodeToIf() (`ad87761925`).	2021-06-20 12:37:14 +03:00
Roman Lebedev	ad87761925	[SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has its address taken This problem is exposed by D104598, after it tail-merges `ret` in `@test_inline_constraint_S_label`, the verifier would start complaining `invalid operand for inline asm constraint 'S'`. Essentially, taking address of a block is mismodelled in IR. It should probably be an explicit instruction, a first one in block, that isn't identical to any other instruction of the same type, so that it can't be hoisted.	2021-06-20 12:18:15 +03:00
Juneyoung Lee	09e8c0d5aa	[InstSimplify] icmp poison, X -> poison This adds a simple transformation from icmp with poison constant to poison. Comparing poison with something else is poison, so this is okay. https://alive2.llvm.org/ce/z/e8iReb https://alive2.llvm.org/ce/z/q4MurY	2021-06-20 15:39:07 +09:00
Fangrui Song	8ea2a58a2e	[llvm-profdata] Make diagnostics consistent with the (no capitalization, no period) style The format is currently inconsistent. Use the https://llvm.org/docs/CodingStandards.html#error-and-warning-messages style. And add `error:` or `warning:` to CHECK lines wherever appropriate.	2021-06-19 14:54:25 -07:00
Sanjay Patel	328b21a338	[InstCombine][test] add tests for select-of-bit-manip; NFC	2021-06-19 12:34:32 -04:00
Liqiang Tao	671a87104b	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses a heap to order inlining. A callsite with a smaller size gets a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-19 10:17:32 +08:00
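The ordering idea in 671a87104b, a heap keyed on callsite size, can be sketched with Python's `heapq`. This is not the LLVM class. `SizeOrderedWorklist` and its methods are hypothetical names, and breaking ties by insertion order is an assumption made here for determinism.

```python
import heapq


class SizeOrderedWorklist:
    """Min-heap worklist: the callsite with the smallest size is popped first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # insertion index, used only to break ties

    def push(self, callsite, size: int) -> None:
        heapq.heappush(self._heap, (size, self._counter, callsite))
        self._counter += 1

    def pop(self):
        _size, _order, callsite = heapq.heappop(self._heap)
        return callsite

    def __len__(self) -> int:
        return len(self._heap)
```

Pushing callsites of sizes 30, 10, and 20 and then draining the worklist yields them in the order 10, 20, 30.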
Guozhi Wei	575ba6f425	[InstCombine] Don't transform code if DoTransform is false In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case. Differential Revision: https://reviews.llvm.org/D104567	2021-06-18 18:01:34 -07:00
Fangrui Song	3307240f05	[InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling On ELF, the D1003372 optimization can apply to more cases. There are two prerequisites for making `__profd_` private: * `__profc_` keeps `__profd_` live under compiler/linker GC * `__profd_` is not referenced by code The first is satisfied because all counters/data are in a section group (either `comdat any` or `comdat noduplicates`). The second requires that the function does not use value profiling. Regarding the second point: `__profd_` may be referenced by other text sections due to inlining. There will be a linker error if a prevailing text section references the non-prevailing local symbol. With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) clang is 4.2% smaller (1-169620032/177066968). `stat -c %s */.o | awk '{s+=$1}END{print s}'` is 2.5% smaller. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-18 17:01:17 -07:00
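The 4.2% figure in 3307240f05 follows directly from the two numbers given in the message. The short check below reproduces the arithmetic. Treating the larger number as the size before the change and the smaller as the size after it is an assumption based on the `1-169620032/177066968` expression.

```python
SIZE_BEFORE = 177066968  # larger figure from the commit message
SIZE_AFTER = 169620032   # smaller figure from the commit message

# Relative reduction, as in the message's 1 - after/before expression.
reduction = 1 - SIZE_AFTER / SIZE_BEFORE
REDUCTION_PERCENT = round(reduction * 100, 1)
```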
Hongtao Yu	bd52495518	[CSSPGO] Undoing the concept of dangling pseudo probe As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling probe related code in both the compiler and llvm-profgen. I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104477	2021-06-18 15:14:11 -07:00
Nikita Popov	3308205ae9	[LoopUnroll] Simplify optimization remarks Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and debug code. For debug code, print information about all exits. For optimization remarks, only include the unroll count and the type of unroll (complete, partial or runtime), but omit detailed information about exit folding, now that more than one exit may be folded. Differential Revision: https://reviews.llvm.org/D104482	2021-06-18 23:47:03 +02:00
Nick Desaulniers	bef2992861	[GCOVProfiling] don't profile Fn's w/ noprofile attribute Similar to D104475, the Linux kernel would like to avoid compiler generated code in certain functions. The no_profile function attribute can be used in C to generate the noprofile fn attr in IR. Respect that from GCOVProfiling. Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/ Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D104257	2021-06-18 13:58:34 -07:00
Liqiang Tao	93183a41b9	Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder"	2021-06-18 18:52:00 +08:00
Max Kazantsev	de92287cf8	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3) This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topological order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach header via the latch, then the backedge can be broken. This implementation uses InstSimplify. SCEV version was rejected due to high compile time impact. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: nikic	2021-06-18 17:31:57 +07:00
Daniil Seredkin	6643e51d79	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 16:28:06 +07:00
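The fold in 6643e51d79 rests on the fact that a sign-extended boolean is either 0 or all ones (-1), and (-1) * (-1) wraps to 1. The sketch below checks this in Python using 8-bit unsigned bit patterns. The helper names and the 8-bit width are illustrative choices, not taken from the patch.

```python
def sext_bool(x: bool, bits: int = 8) -> int:
    """Sign-extend an i1: true becomes all ones (-1), false becomes 0."""
    return (1 << bits) - 1 if x else 0


def zext_bool(x: bool, bits: int = 8) -> int:
    """Zero-extend an i1: true becomes 1, false becomes 0."""
    return 1 if x else 0


def mul(a: int, b: int, bits: int = 8) -> int:
    """Wrapping multiplication on `bits`-wide bit patterns."""
    return (a * b) & ((1 << bits) - 1)


# (sext X) * (sext X) == zext (and X, X) == zext X for both boolean values.
FOLD_HOLDS = all(
    mul(sext_bool(x), sext_bool(x)) == zext_bool(x and x) == zext_bool(x)
    for x in (False, True)
)
```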
Max Kazantsev	07bbfd9c13	[Test] Add XFAIL unit test for PR50765	2021-06-18 16:25:42 +07:00
Liqiang Tao	a740b707d1	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses a heap to order inlining. A callsite with a smaller size gets a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-18 16:55:38 +08:00
Haojian Wu	7670938bba	[Attributor] Don't print the call-graph in a hard-coded file. This does not look like a practical pattern in our codebase (it could fail in some sandbox environment). Instead we print it via standard output, and it is controlled by the -attributor-print-call-graph, this follows a similar pattern of attributor-print-dep.	2021-06-18 09:38:07 +02:00
Daniil Seredkin	6de741de08	Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)" This reverts commit `31053338c9`.	2021-06-18 14:21:02 +07:00
Daniil Seredkin	31053338c9	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 14:12:00 +07:00
Fangrui Song	5798be8458	Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF" This reverts commit `76d0747e08`. If a group has `__llvm_prf_vals` due to static value profiler counters (`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text section may reference `__llvm_prf_data` and will cause a `relocation refers to a discarded section` linker error. Note: while a `__profc_` group is non-prevailing, it may be referenced by a prevailing text section due to inlining. ``` group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections: [Index] Name [ 67] __llvm_prf_cnts [ 68] __llvm_prf_vals [ 69] __llvm_prf_data [ 70] .rela__llvm_prf_data ```	2021-06-17 23:38:17 -07:00
Johannes Doerfert	30c9d68ad9	[Attributor][FIX] Arguments of unknown functions can be undef This should fix PR50683. The wrong assumption was that we could always know what the callee is when we replace a call site argument with undef. We wanted to know that to remove the `noundef` that might be attached to the argument. Since no callee means we did the propagation on the caller site, there is no need to remove an attribute. It is only needed if we replace all uses and therefore pass `undef` instead of the value that was passed in otherwise.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	666dc6f126	[Attributor] Use a centralized value simplification interface To allow outside AAs that simplify values we need to ensure all value simplification goes through the Attributor, not AAValueSimplify (or any of the other AAs we have already like AAPotentialValues). This patch also introduces an interface for the outside AAs to register simplification callbacks for an IRPosition. To make this work as expected we have to pass IRPositions instead of Values in AAValueSimplify, which makes sense by itself.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	9959eee001	[Attributor] Make sure Heap2Stack works properly on a GPU target If the target stack is not accessible between different running "threads" we have to make sure not to create allocas for mallocs that might be used by multiple "threads". The "use check" is sufficient to prevent this but if we apply the "free check" we have to make sure the pointer is not communicated to others before the free is reached. Differential Revision: https://reviews.llvm.org/D98608	2021-06-18 01:07:52 -05:00
Johannes Doerfert	ca7563bb02	[Attributor][NFC] Add test from PR49606 It is not clear to me how we fixed this, I reverted a few candidates but I couldn't make the test fail. Still worth having it in our regression suite.	2021-06-18 01:07:52 -05:00
Johannes Doerfert	39e1876b06	[Attributor][NFC] Precommit a set of test cases for load simplification	2021-06-18 01:07:51 -05:00
Daniil Seredkin	aea67232b1	[InstCombine][NFC] Added tests for mul with zext/sext operands Baseline tests for D104193	2021-06-18 11:14:50 +07:00
Xun Li	3522167efd	[Coroutine] Properly deal with byval and noalias parameters This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857. Previous attempts can be found in D104007 and D101980. A lot of discussions can be found in those two patches. To summarize the bug: When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly. However, in some cases we find that arguments are still directly used: When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and later uses will lead to memory use-after-free. This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly. For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutine as noalias. This can be taken care of in CoroEarly pass. For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer. 
Differential Revision: https://reviews.llvm.org/D104184	2021-06-17 19:06:10 -07:00
Kuter Dinel	eaf1b6810c	[Attributor] Derive AACallEdges attribute This attribute computes the optimistic live call edges using the attributor liveness information. This attribute will be used for deriving an inter-procedural function reachability attribute. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104059	2021-06-18 03:29:22 +03:00
Fangrui Song	76d0747e08	[InstrProfiling] Make __profd_ unconditionally private for ELF For ELF, since all counters/data are in a section group (either `comdat any` or `comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the D1003372 optimization prerequisite (linker GC cannot discard data variables while the text section is retained) is always satisfied, we can make __profd_ unconditionally private. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-17 14:16:54 -07:00
Craig Topper	99e95856fb	[PartiallyInlineLibCalls] Disable sqrt expansion for strictfp. This pass emits a floating point compare and a conditional branch, but if strictfp is enabled we don't emit a constrained compare intrinsic. The backend also won't expand the readonly sqrt call this pass inserts to a sqrt instruction under strictfp. So we end up with 2 libcalls as seen here. https://godbolt.org/z/oax5zMEWd Fix these things by disabling the pass. Differential Revision: https://reviews.llvm.org/D104479	2021-06-17 14:15:12 -07:00
Eli Friedman	8a567e5f22	[ScalarEvolution] Fix pointer/int type handling converting select/phi to min/max. The old version of this code would blindly perform arithmetic without paying attention to whether the types involved were pointers or integers. This could lead to weird expressions like negating a pointer. Explicitly handle simple cases involving pointers, like "x < y ? x : y". In all other cases, coerce the operands of the comparison to integer types. This avoids the weird cases, while handling most of the interesting cases. Differential Revision: https://reviews.llvm.org/D103660	2021-06-17 14:05:12 -07:00
Nikita Popov	f7c54c4603	[LoopUnroll] Fold all exits based on known trip count/multiple Fold all exits based on known trip count/multiple information from SCEV. Previously only the latch exit or the single exit were folded. This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple entirely: They're still used to a) decide whether runtime unrolling should be performed and b) for ORE remarks. However, the core unrolling logic is independent of them now. Differential Revision: https://reviews.llvm.org/D104203	2021-06-17 20:58:34 +02:00
Sanjay Patel	61196f855c	[InstSimplify] add tests for computeKnownBits of shift-with-bitcast op; NFC	2021-06-17 12:39:16 -04:00
Sanjay Patel	5b1079f641	[InstCombine][x86] add tests for complex vector shift value tracking; NFC https://llvm.org/PR50123	2021-06-17 12:39:16 -04:00
Kevin P. Neal	60a8edf30d	[FPEnv][InstSimplify] Precommit tests for D103169. In D103169 I'm adding to InstSimplify support for NaN to constrained intrinsics that have a regular FP IR instruction counterpart. Precommit the tests for clarity when that ticket lands.	2021-06-17 10:34:39 -04:00
Sjoerd Meijer	3f596842e3	[FuncSpec] Precommit test: don't specialise funcs with NoDuplicate instrs. NFC.	2021-06-17 14:13:25 +01:00
hyeongyukim	69b0ed9a0a	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered as Eli says when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-06-17 19:46:17 +09:00
Sjoerd Meijer	dcd23d875a	[FuncSpec] Don't specialise functions with attribute NoDuplicate. Differential Revision: https://reviews.llvm.org/D104378	2021-06-17 10:32:29 +01:00
David Green	fda8b4714e	[InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass The Interleave Access pass will convert shuffle(binop(load, load)) to binop(shuffle(load), shuffle(load)), in order to create more interleaving load patterns (VLD2/3/4) that might have been messed up by instcombine. As shown in D104247 we were missing copying IR flags to the new instruction though, which should just be kept the same as the original instruction. Differential Revision: https://reviews.llvm.org/D104255	2021-06-17 09:53:33 +01:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Hongtao Yu	cef9b96b01	[CSSPGO] Report zero-count probe in profile instead of dangling probes. Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming a large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits: 1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode. 2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader. 3. Reducing compiler work to track dangling probes. However, for probes that are really dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes. 4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples. 5. Better readability. 6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler. Note that the current patch does not include any work for #3. There will be follow-up changes. 
For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of LBRProfileSection is reduced by 35%. For #4, I have seen general counts quality for SPEC2017 is improved by 10%. Reviewed By: wenlei, wlei, wmi Differential Revision: https://reviews.llvm.org/D104129	2021-06-16 11:45:29 -07:00
Sanjay Patel	ce95200b79	[InstSimplify] propagate poison through FP ops We already have this fold: fadd float poison, 1.0 --> poison ...via ConstantFolding, so this makes the behavior consistent if the other operand(s) are non-constant. The fold for undef was added before poison existed as a value/type in IR. This came up in D102673 / D103169 because we're trying to sort out the more complicated handling for constrained math ops. We should have the handling for the regular instructions done first, so we can build on that (or diverge as needed). Differential Revision: https://reviews.llvm.org/D104383	2021-06-16 11:31:58 -04:00
Sjoerd Meijer	08c75fc5e3	[FuncSpec] Fixed prefix typo in test function-specialization-noexec.ll. NFC.	2021-06-16 16:25:26 +01:00
Sjoerd Meijer	c8a3fce776	[FuncSpec] Remove other passes in a test RUN line. NFC.	2021-06-16 10:36:22 +01:00
Sjoerd Meijer	29843cbc88	[FuncSpec] Add test for a call site that will never be executed. NFC.	2021-06-16 10:10:52 +01:00
Sjoerd Meijer	49ab3b1735	[FuncSpec] Statistics Adds some bookkeeping for collecting the number of specialised functions and a test for that. Differential Revision: https://reviews.llvm.org/D104102	2021-06-16 09:11:51 +01:00
Evgeniy Brevnov	96cded5b79	[SLP] Incorrect handling of external scalar values Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D103954	2021-06-16 13:27:36 +07:00
Chuanqi Xu	86906304d8	[FuncSpec] Use std::pow instead of operator^ The original implementation calculating UserBonus uses operator ^, which means XOR in the C++ language. At first glance during review, I thought it was exponentiation, my bad. It doesn't make sense to use XOR here, so I believe it was a careless mistake on my part. Test Plan: check-all Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104282	2021-06-16 10:13:21 +08:00
Arthur Eubanks	9aa1428174	[InstSimplify] Treat invariant group insts as bitcasts for load operands We can look through invariant group intrinsics for the purposes of simplifying the result of a load. Since intrinsics can't be constants, but we also don't want to completely rewrite load constant folding, we convert the load operand to a constant. For GEPs and bitcasts we just treat them as constants. For invariant group intrinsics, we treat them as a bitcast. Relanding with a check for self-referential values. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101103	2021-06-15 12:59:43 -07:00
Roman Lebedev	e52364532a	[NewPM] Remove SpeculateAroundPHIs pass Addition of this pass has been botched. There is no particular reason why it had to be sold as an inseparable part of new-pm transition. It was added when old-pm was still the default, and very very few users were actually tracking new-pm, so its effects weren't measured. Which means, some of the turmoil of the new-pm transition are actually likely regressions due to this pass. Likewise, there has been a number of post-commit feedback (post new-pm switch), namely * https://reviews.llvm.org/D37467#2787157 (regresses HW-loops) * https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before) * https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata) and in the half year past, the pass authors (google) still haven't found time to respond to any of that. Hereby it is proposed to backout the pass from the pipeline, until someone who cares about it can address the issues reported, and properly start the process of adding a new pass into the pipeline, with proper performance evaluation. Furthermore, neither google nor facebook reports any perf changes from this change, so i'm dropping the pass completely. It can always be re-reverted should/if anyone want to pick it up again. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D104099	2021-06-15 20:35:55 +03:00
Andrew Litteken	2c21278e74	[IROutliner] Adding DebugInfo handling for IR Outlined Functions This adds support for functions outlined by the IR Outliner to be recognized by the debugger. The expected behavior is that it will skip over the instructions included in that section. This is due to the fact that we can not say which of the original locations the instructions originated from. These functions will show up in the call stack, but you cannot step through them. Reviewers: paquette, vsk, djtodoro Differential Revision: https://reviews.llvm.org/D87302	2021-06-15 10:57:08 -05:00
Florian Hahn	f7fc8927c0	[LoopDeletion] Check for irreducible cycles when deleting loops. Loops with irreducible cycles may loop infinitely. Those cannot be removed, unless the loop/function is marked as mustprogress. Also discussed in D103382. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104238	2021-06-15 12:56:12 +01:00
Sanjay Patel	8591640379	[InstCombine] add DeMorgan folds for logical ops in select form We canonicalized to these select patterns (poison-safe logic) with D101191, so we need to reduce 'not' ops when possible as we would with 'and'/'or' instructions. This is shown in a secondary example in: https://llvm.org/PR50389 https://alive2.llvm.org/ce/z/BvsESh	2021-06-14 12:54:35 -04:00
Sanjay Patel	56ae4f23b2	[InstCombine] add tests for logical and/or with not ops; NFC	2021-06-14 12:54:35 -04:00
Florian Hahn	ee9bb258bb	[LoopDeletion] Add test with irreducible control flow in loop. Currently the irreducible cycles in the loops are ignored. The irreducible cycle may loop infinitely in irreducible_subloop_no_mustprogress, which is allowed and the loop should not be removed. Discussed in D103382.	2021-06-14 17:42:32 +01:00
Florian Hahn	96ca03493a	[VectorCombine] Limit scalarization to non-poison indices for now. As Eli mentioned post-commit in D103378, the result of the freeze may still be out-of-range according to Alive2. So for now, just limit the transform to indices that are non-poison.	2021-06-14 16:40:14 +01:00
Florian Hahn	9e77526d46	[VPlan] Add additional tests for region merging. Add additional tests suggested in D100260. Also drop the unneeded `indvars.` prefix from induction phi name.	2021-06-14 11:25:06 +01:00
Mindong Chen	8449af41e5	[LoopVectorize] precommit pr50686.ll for D104148	2021-06-14 13:58:25 +08:00
David Green	562593ff82	[DSE] Extra multiblock loop tests, NFC. Some of these can be DSE'd, some of which cannot. Useful in D100464.	2021-06-13 22:30:42 +01:00
Nikita Popov	6ecc99210c	[LoopUnroll] Test multi-exit runtime unrolling with predictable exit (NFC) The (prior to prologue insertion) predictable exit shouldn't get folded here. Make sure it isn't...	2021-06-13 18:48:38 +02:00
Sanjay Patel	afd44bb6f2	[InstCombine] fold ctlz/cttz of bool types https://alive2.llvm.org/ce/z/tX4pUT	2021-06-13 08:26:40 -04:00
Xun Li	fae7debadc	[CHR] Don't run ControlHeightReduction if any BB has address taken This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610. In computed goto pattern, there are usually a list of basic blocks that are all targets of indirectbr instruction, and each basic block also has address taken and stored in a variable. CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and cloned version of all basic blocks in the list. However these basic blocks will not have their addresses taken and stored anywhere. So a later SimplifyCFG pass will simply remove all these cloned basic blocks, resulting in incorrect code. To fix this, when searching for scopes, we skip scopes that contain BBs with addresses taken. Added a few test cases. Reviewed By: aeubanks, wenlei, hoy Differential Revision: https://reviews.llvm.org/D103867	2021-06-12 10:29:53 -07:00
Florian Hahn	0d9e8f5f4b	[VPlan] Add more sinking/merging tests with predicated loads/stores.	2021-06-12 15:36:51 +01:00
spupyrev	0a0800c4d1	A post-processing for BFI inference The current implementation for computing relative block frequencies does not handle correctly control-flow graphs containing irreducible loops. This results in suboptimally generated binaries, whose perf can be up to 5% worse than optimal. To resolve the problem, we apply a post-processing step, which iteratively updates block frequencies based on the frequencies of their predecessors. This corresponds to finding the stationary point of the Markov chain by an iterative method aka "PageRank computation". The algorithm takes at most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster. It is turned on by passing option `use-iterative-bfi-inference` and applied only for functions containing profile data and irreducible loops. Tested on SPEC06/17, where it is helping to get correct profile counts for one of the binaries (403.gcc). In prod binaries, we've seen a speedup of up to 2%-5% for binaries containing functions with hot irreducible loops. Reviewed By: hoy, wenlei, davidxl Differential Revision: https://reviews.llvm.org/D103289	2021-06-11 21:46:04 -07:00
Sanjay Patel	1c51bf3b78	[InstCombine] add tests for bit manipulation intrinsics with bool values; NFC	2021-06-11 18:20:14 -04:00
Sanjay Patel	ad1d60bf53	[InstCombine] update test checks; NFC	2021-06-11 18:20:14 -04:00
Philip Reames	ac81cb7e6d	Allow ptrtoint/inttoptr of non-integral pointer types in IR I don't like landing this change, but it's an acknowledgement of a practical reality. Despite not having well specified semantics for inttoptr and ptrtoint involving non-integral pointer types, they are used in practice. Here's a quick summary of the current pragmatic reality: * I happen to know that the main external user of non-integral pointers has effectively disabled the verifier rules. * RS4GC (the lowering pass for abstract GC machine model which is the key motivation for non-integral pointers), even supports them. We just have all the tests using an integral pointer space to let the verifier run. * Certain idioms (such as alignment checks for alignment N, where any relocation is guaranteed to be N byte aligned) are fine in practice. * As implemented, inttoptr/ptrtoint are CSEd and are not control dependent. This means that any code which is intending to check a particular bit pattern at site of use must be wrapped in an intrinsic or external function call. This change allows them in the Verifier, and updates the LangRef to specify them as implementation dependent. This allows us to acknowledge current reality while still leaving ourselves room to punt on figuring out "good" semantics until the future.	2021-06-11 13:38:32 -07:00
Adam Nemet	e0efebb8eb	[Matrix] In transpose opts, handle a^t * a^t Without the fix the testcase crashes because we remove the same instruction twice. Differential Revision: https://reviews.llvm.org/D104127	2021-06-11 09:29:43 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Alexey Bataev	cd2bb16d56	[SLP][NFC]Add a test for unordered stores, NFC.	2021-06-11 08:02:24 -07:00
Sanjay Patel	602ab24833	[SimplifyCFG] avoid crash on degenerate loop The problematic code pattern in the test is based on: https://llvm.org/PR50638 If the IfCond is itself the phi that we are trying to remove, then the loop around line 2835 can end up with something like: %cmp = select i1 %cmp, i1 false, i1 true That can then lead to a use-after-free and assert (although I'm still not seeing that locally in my release + asserts build). I think this can only happen with unreachable code. Differential Revision: https://reviews.llvm.org/D104063	2021-06-11 09:37:06 -04:00
Max Kazantsev	8840c94a33	[Test] One more elaborate test with selects for loop deletion	2021-06-11 18:41:34 +07:00
Max Kazantsev	8dc2c1a0ab	[Test] Add loop deletion test with switch	2021-06-11 18:05:07 +07:00
Roman Lebedev	20542b47d6	[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper This results in slightly more optimistic alignments in some cases	2021-06-11 12:47:10 +03:00
Sjoerd Meijer	c4a0969b9c	Function Specialization Pass This adds a function specialization pass to LLVM. Constant parameters like function pointers and constant globals are propagated to the callee by specializing the function. This is a first version with a number of limitations: - The pass is off by default, so needs to be enabled on the command line, - It does not handle specialization of recursive functions, - It does not yet handle constants and constant ranges, - Only 1 argument per function is specialised, - The cost-model could be further looked into, and perhaps related, - We are not yet caching analysis results. This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan. More recently this was also discussed on the list, see: https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html. The motivation for this work is that function specialisation often comes up as a reason for performance differences of generated code between LLVM and GCC, which has this enabled by default from optimisation level -O3 and up. And while this certainly helps a few cpu benchmark cases, this also triggers in real world codes and is thus a generally useful transformation to have in LLVM. Function specialisation has great potential to increase compile-times and code-size. The summary from some investigations with this patch is: - Compile-time increases for short compile jobs are relatively high, but the increase in absolute numbers is still low. - For longer compile-jobs, the extra compile time is around 1%, and very much in line with GCC. - It is difficult to blame one thing for compile-time increases: it looks like everywhere a little bit more time is spent processing more functions and instructions. - But the function specialisation pass itself is not very expensive; it doesn't show up very high in the profile of the optimisation passes. 
The goal of this work is to reach parity with GCC which means that eventually we would like to get this enabled by default. But first we would like to address some of the limitations before that. Differential Revision: https://reviews.llvm.org/D93838	2021-06-11 09:11:29 +01:00
Qiu Chaofan	2670c7dd5b	[VectorCombine] Fix alignment in single element store This fixes the concern in single element store scalarization that the alignment of new store may be larger than it should be. It calculates the largest alignment if index is constant, and a safe one if not. Reviewed By: lebedev.ri, spatel Differential Revision: https://reviews.llvm.org/D103419	2021-06-11 10:28:15 +08:00
Slava Nikolaev	119965865c	LoadStoreVectorizer: support different operand orders in the add sequence match First we refactor the code which does no wrapping add sequences match: we need to allow different operand orders for the key add instructions involved in the match. Then we use the refactored code trying 4 variants of matching operands. Originally the code relied on the fact that the matching operands of the two last add instructions of memory index calculations had the same LHS argument. But which operand is the same in the two instructions is actually not essential, so now we allow that to be any of LHS or RHS of each of the two instructions. This increases the chances of vectorization to happen. Reviewed By: volkan Differential Revision: https://reviews.llvm.org/D103912	2021-06-10 16:31:35 -07:00
Andy Kaylor	41555eaf65	Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store. Added a LIT test to catch one case. Patch by Mark Mendell Differential Revision: https://reviews.llvm.org/D103254	2021-06-10 15:47:03 -07:00
Joachim Meyer	4f01122c3f	[LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads. This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL. The question remains why this was initially added and what the implications of removing this optimization would be. Do we need an alternative mechanism to propagate the information about legality of if-conversion? Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general? I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous. Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103907	2021-06-10 23:37:57 +02:00
Sanjay Patel	7b969ef8b4	[SimplifyCFG] avoid 'tmp' variables in test file; NFC	2021-06-10 17:04:23 -04:00
Alexey Bataev	a893b44187	[SLP]Disable scheduling of insertelements. There is no need to schedule insertelement instructions. The compiler did not schedule them before it started supporting their vectorization and it should not do it after. We pre-schedule them manually when finding a build vector sequence. Disabling scheduling of insertelement instructions improves compile time and vectorization of very large basic blocks by saving scheduling budget for other instructions. Differential Revision: https://reviews.llvm.org/D104026	2021-06-10 10:25:26 -07:00
Caroline Concatto	1ad52105eb	[InstCombine] Add fold for extracting known elements from a stepvector This patch allows folding stepvector + extract to the lane when the lane is lower than the minimum size of the scalable vector. This fold is possible because lane X of a stepvector is also X! For instance, extracting element 3 of a <vscale x 4 x i64>stepvector is 3. Differential Revision: https://reviews.llvm.org/D103153	2021-06-10 13:36:57 +01:00
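The stepvector fold in the entry above relies on lane X of a stepvector holding the value X, provided X is below the minimum element count. A minimal Python sketch of that reasoning follows, modelling a `<vscale x 4 x i64>` stepvector as a plain list. The names `stepvector` and `fold_extract` are hypothetical helpers for illustration, not LLVM APIs.

```python
from typing import List, Optional

MIN_ELTS = 4  # minimum element count of <vscale x 4 x i64>


def stepvector(vscale: int, min_elts: int = MIN_ELTS) -> List[int]:
    """Model a scalable stepvector for one concrete runtime vscale."""
    return list(range(vscale * min_elts))


def fold_extract(lane: int, min_elts: int = MIN_ELTS) -> Optional[int]:
    """Return the folded constant, or None when the lane is not
    guaranteed to exist for every vscale >= 1 (no fold in that case)."""
    if 0 <= lane < min_elts:
        return lane
    return None
```

Whatever the runtime `vscale`, lanes 0 to 3 exist and hold their own index, so the extract can be replaced by a constant. Lane 4 only exists when `vscale` is at least 2, so the sketch declines to fold it.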
Caroline Concatto	3c1f0e9ef8	[InstSimplify] Add constant fold for extractelement + splat for scalable vectors This patch allows that scalable vector can fold extractelement and constant splat only when the lane index is lower than the minimum number of elements of the vector. Differential Revision: https://reviews.llvm.org/D103180	2021-06-10 12:41:40 +01:00
Qiu Chaofan	a115c5247f	[NFC] Pre-commit tests for VectorCombine scalarize	2021-06-10 14:29:18 +08:00
Serge Pavlov	8ff36aab69	[ConstantFolding] Enable folding of min/max/copysign for all floats Previously such folding was enabled for half, float and double values only. With this change it is allowed for other floating point values also. Differential Revision: https://reviews.llvm.org/D103956	2021-06-10 11:57:51 +07:00
Joseph Huber	4c9471581f	[Attributor] Set floating point loads and stores as nofree in AANoFreeFloating Summary: The current implementation of AANoFreeFloating will incorrectly list floating point loads and stores as may-free. This prevents other attributor instances like HeapToStack from pushing some allocations to the stack. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103975	2021-06-09 16:16:37 -04:00
Joseph Huber	c70c30d6da	[OpenMP][NFC] Precommit change to hide_mem_transfer_latency test flags	2021-06-09 16:16:37 -04:00
Arthur Eubanks	222cce3828	Revert "[InstSimplify] Treat invariant group insts as bitcasts for load operands" This reverts commit `26044c6a54`. Breaks on invalid IR (see D101103).	2021-06-09 11:46:10 -07:00
Sanjay Patel	9eef6e3981	[InstCombine] add tests for casts-around-ctlz; NFC Baseline for D103788	2021-06-09 11:22:44 -04:00
LemonBoy	d3faef6eef	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 16:36:58 +02:00
Alexey Bataev	a0086add2e	[SLP]Improve gathering of scalar elements. 1. Better sorting of scalars to be gathered. Trying to insert constants/arguments/instructions-out-of-loop at first and only then the instructions which are inside the loop. It improves hoisting of invariant insertelements instructions. 2. Better detection of shuffle candidates in gathering function. 3. The cost of insertelement for constants is 0. Part of D57059. Differential Revision: https://reviews.llvm.org/D103458	2021-06-09 05:23:21 -07:00
Max Kazantsev	0120e6c295	[Test] Add more elaborate case of symbolic execution of 1-iteration loop	2021-06-09 19:08:54 +07:00
Nico Weber	205cde63c7	Revert "[SROA] Avoid splitting loads/stores with irregular type" This reverts commit `905f4eb537`. Breaks check-llvm on most (all?) bots, see https://reviews.llvm.org/D99435	2021-06-09 06:32:58 -04:00
LemonBoy	905f4eb537	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 11:48:20 +02:00
Jingu Kang	8eee02020b	[LoopBoundSplit] Ignore phi node which is not scevable There was a bug in LoopBoundSplit. The pass should ignore phi node which is not scevable. Differential Revision: https://reviews.llvm.org/D103913	2021-06-09 09:44:36 +01:00
Sanjay Patel	d2012d965d	[InstCombine] fix nsz (fast-math) propagation from fneg-of-select As discussed in the post-commit comments for: `3cdd05e519` It seems to be safe to propagate all flags from the final fneg except for 'nsz' to the new select: https://alive2.llvm.org/ce/z/J_APDc nsz has unique FMF semantics: it is not poison, it is only "insignificant" in the calculation according to the LangRef.	2021-06-08 17:04:30 -04:00
Sanjay Patel	c52ed5c4f1	[InstCombine] add FMF tests for fneg-of-select; NFC As noted in the post-commit comments for `3cdd05e519`, we need to be more careful about FMF propagation.	2021-06-08 17:04:29 -04:00
David Green	0178ae734c	[DSE] Add another multiblock loop DSE test. NFC As reported in D100464, the stores in these loops should not be removed.	2021-06-08 21:54:59 +01:00
David Green	297088d1ad	Revert "[DSE] Remove stores in the same loop iteration" Apparently non-dead stores are being removed, as noted in D100464. This reverts commit `222aeb4d51`.	2021-06-08 21:23:08 +01:00
Hans Wennborg	386b66b2fc	Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" > This reapplies `c0f3dfb9`, which was reverted following the discovery of > crashes on linux kernel and chromium builds - these issues have since > been fixed, allowing this patch to re-land. This reverts commit `36ec97f76a`. The change caused non-determinism in the compiler, see comments on the code review at https://reviews.llvm.org/D91722. Reverting to unbreak people's builds until that can be addressed. This also reverts the follow-up "[DebugInfo] Limit the number of values that may be referenced by a dbg.value" in `a0bd6105d8`.	2021-06-08 14:54:08 +02:00
maekawatoshiki	09e92c607c	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Also, a crash problem on legacy pass manager is fixed. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-06-08 20:30:02 +09:00
Kerry McLaughlin	5db52751a5	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
Caroline Concatto	6fd1604d14	[InstCombine] Add instcombine fold for extractelement + splat for scalable vectors This patch allows that scalable vector can also use the fold that already exists for fixed vector, only when the lane index is lower than the minimum number of elements of the vector. Differential Revision: https://reviews.llvm.org/D102404	2021-06-08 10:43:38 +01:00
Kerry McLaughlin	14eeccfe9a	[LoopVectorize] Don't use strict reductions when reordering is allowed If the `-enable-strict-reductions` flag is set to true, then currently we will always choose to vectorize the loop with strict in-order reductions. This is not necessary where we allow the reordering of FP operations, such as when loop hints are passed via metadata. This patch moves useOrderedReductions so that we can also check whether loop hints allow reordering, in which case we should use the default behaviour of vectorizing with unordered reductions. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103814	2021-06-08 10:39:29 +01:00
Arthur Eubanks	47211fa889	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" Needs to be discussed more. This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46 This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1 This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23 This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd	2021-06-07 16:07:44 -07:00
Philip Reames	3c6e419198	[SCEV] Properly guard reasoning about infinite loops being UB on mustprogress Noticed via code inspection. We changed the semantics of the IR when we added mustprogress, and we appear to have not updated this location. Differential Revision: https://reviews.llvm.org/D103834	2021-06-07 14:47:36 -07:00
Daniil Suchkov	d32cc150fe	[BasicAA] Handle PHIs without incoming values gracefully Fix a bug introduced by `f6f6f6375d`. Now for empty PHIs, instead of crashing on assert(hasVal()) in Optional's internals, we'll return NoAlias, as we did before that patch. Differential Revision: https://reviews.llvm.org/D103831	2021-06-07 21:39:01 +00:00
Daniil Suchkov	e72f16b7e6	[Test] Add a JumpThreading test exposing a bug in BasicAA.	2021-06-07 21:39:00 +00:00
Nikita Popov	8fdd7c2ff1	[LoopUnroll] Clamp unroll count to MaxTripCount Unrolling with more iterations than MaxTripCount is pointless, as those iterations can never be executed. As such, we clamp ULO.Count to MaxTripCount if it is known. This means we no longer need to consider iterations after MaxTripCount for exit folding, and the CompletelyUnroll flag becomes independent of ULO.TripCount. Differential Revision: https://reviews.llvm.org/D103748	2021-06-07 21:08:42 +02:00
Philip Reames	c880d5e583	[RS4GC] Treat inttoptr as base pointer This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message. inttoptr for a non-integral address space is currently ill defined in the LangRef. Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled. Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons. First, as a simple consistency argument. We apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them. Having a lack of constant folding trigger a crash during lowering is non-ideal. Second, and more fundamentally, the optimizer is allowed to insert undefined constructs in unreachable code. At the same time, we can't assume that dynamically dead code is always pruned before lowering. As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths. We need the lowering to not crash. The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash. Differential Revision: https://reviews.llvm.org/D103492	2021-06-07 10:27:23 -07:00
Sanjay Patel	4675beaa21	[InstCombine] intersect nsz and ninf fast-math-flags (FMF) for fneg(fdiv) fold https://alive2.llvm.org/ce/z/3KPvih https://llvm.org/PR49654	2021-06-07 13:22:49 -04:00
Sanjay Patel	dc173254e7	[InstCombine] add tests for FMF propagation via -(C/X); NFC There are bugs here as discussed in: https://llvm.org/PR49654	2021-06-07 13:22:49 -04:00
Florian Hahn	1465e7770b	[VPlan] Print successors of VPRegionBlocks. The non-DOT printing does not include the successors of VPRegionBlocks. This patch uses the same style for printing successors as for VPBasicBlock. I think the printing of successors could be a bit improved further, as at the moment it is hard to ensure a check line matches all successors. But that can be done as follow-up. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D103515	2021-06-07 17:57:21 +01:00
Fraser Cormack	ae3f6de3a8	[InstCombine] Support negation of scalable-vector splats This patch is an extension of D103421. It allows the InstCombiner to generate the negated form of integer scalable-vector splats. It can technically handle fixed-length vectors too but those are completely covered by the preceding logic. This enables extra combining opportunities for scalable vector types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103801	2021-06-07 15:14:00 +01:00
Fraser Cormack	fd3b556958	[Constants] Extend support for scalable-vector splats This patch extends the various "isXXX" functions of the `Constant` class to include scalable-vector splats. In several "isXXX" functions, code that was separately inspecting `ConstantVector` and `ConstantDataVector` was unified to use `getSplatValue`, which already includes support for said splats. In the various "isNotXXX" functions, code was added to check whether the scalar splat value -- if any -- satisfies the predicate. An extra fix for `isNotMinSignedValue` was included, as it previously crashed when passed a scalable-vector type because it unconditionally cast to `FixedVectorType`. These changes address numerous missed optimizations, a compiler crash mentioned above and -- perhaps most egregiously -- an infinite loop in InstCombine due to the compiler breaking canonical form when it failed to pick up on a splat in a select instruction. Test cases have been added to cover as many of these functions as possible, though existing coverage is slim; it doesn't appear that there are any in-tree uses of `Constant::isNegativeZeroValue`, for example. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103421	2021-06-07 14:37:56 +01:00
Daniil Seredkin	7736c1936a	[InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math If FP reassociation (fast-math) is allowed, then LLVM is free to do the following transformation pow(x, y) * pow(x, z) -> pow(x, y + z). This patch adds this transformation and tests for it. See more https://bugs.llvm.org/show_bug.cgi?id=47205 It handles two cases 1. When operands of fmul are different instructions %4 = call reassoc float @llvm.pow.f32(float %0, float %1) %5 = call reassoc float @llvm.pow.f32(float %0, float %2) %6 = fmul reassoc float %5, %4 --> %3 = fadd reassoc float %1, %2 %4 = call reassoc float @llvm.pow.f32(float %0, float %3) 2. When operands of fmul are the same instruction %4 = call reassoc float @llvm.pow.f32(float %0, float %1) %5 = fmul reassoc float %4, %4 --> %3 = fadd reassoc float %1, %1 %4 = call reassoc float @llvm.pow.f32(float %0, float %3) Differential Revision: https://reviews.llvm.org/D102574	2021-06-07 08:08:05 -04:00
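The `pow(x, y) * pow(x, z) -> pow(x, y + z)` rewrite in the entry above can be checked numerically. The sketch below is a floating-point illustration in Python, not the InstCombine implementation; `unfused` and `fused` are hypothetical names. The two forms can differ in the last bits, which is why the fold is only legal under the `reassoc` fast-math flag.

```python
import math


def unfused(x: float, y: float, z: float) -> float:
    """Original form: two pow calls and a multiply."""
    return math.pow(x, y) * math.pow(x, z)


def fused(x: float, y: float, z: float) -> float:
    """Rewritten form: one add and one pow call."""
    return math.pow(x, y + z)


def fused_same_operand(x: float, y: float) -> float:
    """Second case from the commit: both fmul operands are the same pow."""
    return math.pow(x, y + y)
```

The comparison in any test of these helpers should use a tolerance such as `math.isclose` rather than exact equality, since the rewrite is only valid up to rounding.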
Bradley Smith	60c9b5f35c	[AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics Use llvm.experimental.vector.insert instead of storing into an alloca when generating code for these intrinsics. This defers the codegen of the generated vector to instruction selection, allowing existing shufflevector style optimizations to apply. Additionally, introduce a new target transform that can recognise fixed predicate patterns in the svbool variants of these intrinsics. Differential Revision: https://reviews.llvm.org/D103082	2021-06-07 12:21:38 +01:00
Florian Hahn	8344e215ec	[LV] Update more target-specific tests after `23c2f2e6b2`.	2021-06-07 12:13:21 +01:00
Florian Hahn	87c99d2b97	[Matrix] Add -matrix-allow-contract=false to tests. Explicitly specify contract behavior, so the tests are independent of the current default of the flag.	2021-06-07 12:13:20 +01:00
Florian Hahn	131343d35b	[PhaseOrdering] Update tests after `23c2f2e6b2`.	2021-06-07 10:59:30 +01:00
Jingu Kang	a2a0ac42ab	[SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV This pass transforms loops that contain a conditional branch with induction variable. For example, it transforms left code to right code: newbound = min(n, c) while (iv < n) { while(iv < newbound) { A A if (iv < c) B B C C } } if (iv != n) { while (iv < n) { A C } } Differential Revision: https://reviews.llvm.org/D102234	2021-06-07 10:55:25 +01:00
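The side-by-side listing in the entry above is flattened, so here is a hedged Python sketch of the same transformation. It assumes an induction variable that starts at 0 and increments by 1, which the commit message does not state; `original_loop` and `split_loop` are hypothetical names, and each function records a trace of the executed blocks A, B and C so the two versions can be compared.

```python
from typing import List, Tuple

Trace = List[Tuple[str, int]]


def original_loop(n: int, c: int) -> Trace:
    """Loop with a branch on the induction variable inside the body."""
    trace: Trace = []
    iv = 0
    while iv < n:
        trace.append(("A", iv))
        if iv < c:
            trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    return trace


def split_loop(n: int, c: int) -> Trace:
    """Same work after splitting the bound at min(n, c)."""
    trace: Trace = []
    iv = 0
    new_bound = min(n, c)
    while iv < new_bound:      # the branch is always taken here
        trace.append(("A", iv))
        trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    if iv != n:
        while iv < n:          # the branch is never taken here
            trace.append(("A", iv))
            trace.append(("C", iv))
            iv += 1
    return trace
```

Neither resulting loop needs the `iv < c` test in its body, which is the point of the split.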
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
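The NUW argument in the entry above can be brute-forced on a small bit width. The following Python sketch assumes an 8-bit induction variable and checks the stated preconditions (`End - Start >= Step`, `Start % Step == 0`, `End % Step == 0`); the function names are hypothetical and this is an illustration of the reasoning, not the Alive2 model linked in the commit.

```python
BITS = 8
LIMIT = 1 << BITS  # values are unsigned and live in [0, LIMIT)


def increments_never_wrap(start: int, end: int, step: int) -> bool:
    """Run the vector loop and report whether every IV + Step stays
    below 2**BITS. The loop exits when IV + Step == End."""
    iv = start
    while True:
        nxt = iv + step          # computed without wrapping on purpose
        if nxt >= LIMIT:
            return False         # an unsigned overflow would occur
        if nxt == end:
            return True          # exit condition of the vector loop
        iv = nxt


def check_all(steps=(2, 4, 8, 16)) -> bool:
    """Exhaustively check all (start, end) pairs meeting the preconditions."""
    for step in steps:
        for start in range(0, LIMIT, step):
            for end in range(start + step, LIMIT, step):
                if not increments_never_wrap(start, end, step):
                    return False
    return True
```

Because `End` itself is representable and the loop exits as soon as `IV + Step` reaches it, no increment can exceed `End`, so none can wrap.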
Simon Pilgrim	ae973380c5	[CostModel][X86] Improve AVX512 FDIV costs Add missing v16f32/v8f64 costs and adjust other costs as well based off the SkylakeServer model	2021-06-06 21:41:05 +01:00
maekawatoshiki	0a9d079931	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `2165360003`. To fix the crash problem in legacy pass manager	2021-06-07 01:26:47 +09:00
Nikita Popov	92ce29ee45	[LoopUnroll] Regenerate test checks (NFC)	2021-06-05 10:52:02 +02:00
Nikita Popov	db45746821	[LoopUnroll] Separate peeling from unrolling Loop peeling is currently performed as part of UnrollLoop(). Outside test scenarios, it is always performed with an unroll count of 1. This means that unrolling doesn't actually do anything apart from performing post-unroll simplification. When testing, it's currently possible to specify both an explicit peel count and an explicit unroll count. This doesn't perform any sensible operation and may result in miscompiles, see https://bugs.llvm.org/show_bug.cgi?id=45939. This patch moves peeling from UnrollLoop() into tryToUnrollLoop(), so that peeling does not also perform a subsequent unroll. We only run the post-unroll simplifications. Specifying both an explicit peel count and unroll count is forbidden. In the future, we may want to support both (non-PGO) peeling a loop and unrolling it, but this needs to be done by first performing the peel and then recalculating unrolling heuristics on a now possibly analyzable loop. Differential Revision: https://reviews.llvm.org/D103362	2021-06-05 10:32:00 +02:00
Eli Friedman	925cd6b467	Regenerate a few tests related to SCEV. In preparation for https://reviews.llvm.org/D103656	2021-06-04 13:35:00 -07:00
Fangrui Song	9e51d1f348	[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat `__profd_` variables are referenced by code only when value profiling is enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just waste space on ELF/Mach-O. We change the comdat symbol from `__profd_` to `__profc_` because an internal symbol does not provide deduplication features on COFF. The choice doesn't matter on ELF. (In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_` symbols.) On Windows this enables further optimization. We are no longer affected by the link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can cause duplicate definition error. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585). This avoids many `/INCLUDE:` directives in `.drectve`. Here is rnk's measurement for Chrome: ``` This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%: #BEFORE $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 1047758867 $ du -cksh base_unittests.exe 82M base_unittests.exe 82M total # AFTER $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 937886499 $ du -cksh base_unittests.exe 78M base_unittests.exe 78M total ``` The change is NFC for Mach-O. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103372	2021-06-04 13:27:56 -07:00
Adam Nemet	ffde966cd9	[Matrix] Fix transpose-multiply folding if transpose has multiple uses Don't add it to FusedInsts in this case. Differential Revision: https://reviews.llvm.org/D103627	2021-06-04 10:55:03 -07:00
Sanjay Patel	8a4d05ddb3	[ConstantFolding] add copysign tests for more FP types; NFC D102673 proposes to ease the current type check, but there doesn't appear to be any test coverage for that.	2021-06-04 11:42:53 -04:00
Sanjay Patel	f03f4944cf	[InstCombine] add tests for pow() reassociation; NFC Baseline tests for D102574	2021-06-04 10:16:07 -04:00
Nico Weber	e9a9c85098	Revert "[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat" This reverts commit `a14fc749aa`. Breaks check-profile on macOS. See https://reviews.llvm.org/D103372 for details.	2021-06-04 10:00:12 -04:00
Sanjay Patel	1fc6027406	[InstCombine] add/adjust test comments; NFC Follow-up to post-commit comment: https://reviews.llvm.org/rG23a116c8c446	2021-06-04 09:04:53 -04:00
Sanjay Patel	23a116c8c4	[InstCombine] convert lshr to ashr to eliminate cast op This is similar to `b865eead76` ( D103617 ) and fixes: https://llvm.org/PR50575 `41b71f718b` did this and more (noted with TODO comments in the tests), but it didn't handle the case where the destination is narrower than the source, so it got reverted. This is a simple match-and-replace. If there's evidence that the TODO cases are useful, we can revisit/extend.	2021-06-04 07:04:37 -04:00
Sanjay Patel	8937450e85	[InstCombine] add tests for sext-of-trunc-of-lshr; NFC	2021-06-04 07:04:37 -04:00
Arthur Eubanks	edf2056ff3	[BuildLibCalls] Properly set ABI attributes on arguments Some floating point lib calls have ABI attributes that need to be set on the caller. Found via D103412. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103415	2021-06-03 15:45:07 -07:00
Philip Reames	5c0d1b2f90	[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed. The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile. Differential Revision: https://reviews.llvm.org/D103620	2021-06-03 14:09:16 -07:00
Fangrui Song	a14fc749aa	[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat `__profd_` variables are referenced by code only when value profiling is enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just waste space on ELF/Mach-O. We change the comdat symbol from `__profd_` to `__profc_` because an internal symbol does not provide deduplication features on COFF. The choice doesn't matter on ELF. (In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_` symbols.) On Windows this enables further optimization. We are no longer affected by the link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can cause duplicate definition error. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585). This avoids many `/INCLUDE:` directives in `.drectve`. Here is rnk's measurement for Chrome: ``` This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%: #BEFORE $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 1047758867 $ du -cksh base_unittests.exe 82M base_unittests.exe 82M total # AFTER $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 937886499 $ du -cksh base_unittests.exe 78M base_unittests.exe 78M total ``` Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103372	2021-06-03 13:16:13 -07:00
Nikita Popov	33e41eaecd	[LoopUnroll] Add additional test with one unpredictable exit (NFC) One exit is unpredictable, the other has a known trip count. For one function the predictable exit is the latch exit, for the other the non-latch exit. Currently they are treated differently.	2021-06-03 21:58:51 +02:00
Sanjay Patel	b865eead76	[InstCombine] eliminate sext and/or trunc if value has enough signbits If we have enough signbits in a source value, we can skip an intermediate cast for a trunc+sext pair: https://alive2.llvm.org/ce/z/A_mQt- This is the original problem shown in: https://llvm.org/PR49543 There's a test that shows we transformed what used to be a pair of shifts, so that suggests we could add another ComputeNumSignBits fold starting from a shift. There does not appear to be any change in compile-time from the extra analysis: https://llvm-compile-time-tracker.com/compare.php?from=3d2c9069dcafd0cbb641841aa3dd6e851fb7d760&to=b9513cdf2419704c7bb0c3a02a9ca06aae13d902&stat=instructions Differential Revision: https://reviews.llvm.org/D103617	2021-06-03 13:58:19 -04:00
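The sign-bit argument in the entry above can be illustrated on small integers. This Python sketch covers only the case where the source and destination of the `trunc` + `sext` pair have the same width `W`, with narrow width `N`; the condition `num_sign_bits(x) > W - N` is derived here for that case and is an assumption, not necessarily the exact check in the patch. All helper names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def num_sign_bits(value: int, bits: int) -> int:
    """Count the leading bits that equal the sign bit (sign bit included)."""
    value &= (1 << bits) - 1
    sign = value >> (bits - 1)
    count = 0
    for i in range(bits - 1, -1, -1):
        if (value >> i) & 1 != sign:
            break
        count += 1
    return count


def sext_of_trunc(value: int, wide: int, narrow: int) -> int:
    """Compute sext(trunc(value to narrow) to wide) as an unsigned bit pattern."""
    truncated = value & ((1 << narrow) - 1)
    return to_signed(truncated, narrow) & ((1 << wide) - 1)


def casts_are_redundant(value: int, wide: int, narrow: int) -> bool:
    """True when the cast pair can be skipped for this value."""
    return num_sign_bits(value, wide) > wide - narrow
```

For `wide=8` and `narrow=4`, the values with more than four sign bits are exactly those in the range -8 to 7, which survive the round trip unchanged.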
Alexey Bataev	8c48d77cdf	[SLP]Improve cost estimation/emission of externally used extractelements. No need to recalculate the cost of extractelements, just no need to compensate the cost of all extractelements, need to check before if this is actually going to be removed at the vectorization. Also, no need to generate new extractelement instruction, we may just regenerate the original one. It may improve the final vectorization. Differential Revision: https://reviews.llvm.org/D102933	2021-06-03 10:26:59 -07:00
Alexey Bataev	89f3bc7698	[SLP]Allow to reorder nodes with >2 scalar values. tryToVectorizeList function allows to reorder only 2 scalars. Patch allows to reorder >2 scalars. Also, to avoid possible regressions, it allows extra vectorization of the remaining parts of the scalars elements if possible. Part of D57059. Differential Revision: https://reviews.llvm.org/D103247	2021-06-03 10:01:36 -07:00
Harald van Dijk	5d2b3de284	[SLP] Avoid std::stable_sort(properlyDominates()). As noticed by NAKAMURA Takumi back in 2017, we cannot use properlyDominates for std::stable_sort as properlyDominates only partially orders blocks. That is, for blocks A, B, C, D, where A dominates B and C dominates D, we have A == C, B == C, but A < B. This is not a valid comparison function for std::stable_sort and causes different results between libstdc++ and libc++. This change uses DFS numbering to give deterministic results for all reachable blocks. Unreachable blocks are ignored already, so do not need special consideration. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103441	2021-06-03 17:51:52 +01:00
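The ordering problem in the entry above can be reproduced on a tiny dominator tree. The Python sketch below uses a hypothetical tree in which `A` dominates `B` and `C` dominates `D`, shows that "neither properly dominates the other" is not a transitive equivalence, and then derives a total order from a pre-order DFS numbering. The exact numbering scheme used by the patch is not stated in the message, so the pre-order choice here is an assumption.

```python
from typing import Dict, Optional

# Immediate dominators: entry -> A -> B and entry -> C -> D.
IDOM: Dict[str, Optional[str]] = {
    "entry": None, "A": "entry", "B": "A", "C": "entry", "D": "C",
}


def properly_dominates(a: str, b: str) -> bool:
    """True when a is a strict ancestor of b in the dominator tree."""
    node = IDOM[b]
    while node is not None:
        if node == a:
            return True
        node = IDOM[node]
    return False


def equivalent(a: str, b: str) -> bool:
    """What a sort sees as 'equal': neither compares less than the other."""
    return not properly_dominates(a, b) and not properly_dominates(b, a)


def dfs_in_numbers(root: str = "entry") -> Dict[str, int]:
    """Assign pre-order DFS numbers over the dominator tree."""
    children: Dict[str, list] = {}
    for node, parent in IDOM.items():
        if parent is not None:
            children.setdefault(parent, []).append(node)
    numbers: Dict[str, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        numbers[node] = len(numbers)
        # Push children in reverse so they are visited in sorted order.
        stack.extend(reversed(sorted(children.get(node, []))))
    return numbers
```

With this tree, `A` is equivalent to `C` and `C` is equivalent to `B`, yet `A` compares less than `B`, so the comparator is not a strict weak ordering. Sorting by the DFS number is a total order that still places every dominator before the blocks it dominates.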
Hamza Mahfooz	83235b07e3	[Matrix] Preserve existing fast-math flags during lowering This patch makes it so, floating-point instructions created in LowerMatrixIntrinsics retain fast-math flags from instructions that are higher up the chain. Fixes https://bugs.llvm.org/show_bug.cgi?id=49738 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103233	2021-06-03 15:29:31 +01:00
Dave Lee	60ce8babf7	[coro] Preserve scope line for compiler generated functions Coro-split functions with an active suspend point have their scope line set to the line of the suspend point. However for compiler generated functions, this results in debug info with unconventional results: a file named `<compiler-generated>` with a non-zero line number. The convention for `<compiler-generated>` is that the line number is zero. This change propagates the scope line only for non-compiler generated functions. Differential Revision: https://reviews.llvm.org/D102412	2021-06-02 15:57:12 -07:00
Rong Xu	6745ffe4fa	[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators. This patch also simplified the bit API used by FS discriminators. Differential Revision: https://reviews.llvm.org/D103041	2021-06-02 10:32:52 -07:00
Stephen Tozer	4316b0e59c	[LoopStrengthReduce] Ensure that debug intrinsics do not affect LSR's output During Loop Strength Reduce, if the terminating condition for the loop is not immediately adjacent to the terminating branch and it has more than one use, a clone of the condition will be created just before the terminating branch and will be used as the branch condition. Currently, whether the instructions are "immediately adjacent" is determined by checking whether the next instruction after the condition is the terminating branch; this is incorrect however, as the presence of a debug intrinsic between the two will result in a change to the output. This is fixed by using getNextNonDebugInstruction() instead. Differential Revision: https://reviews.llvm.org/D103033	2021-06-02 15:56:23 +01:00
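The adjacency check described in the entry above can be sketched with a list of instruction names. This is a simplified Python model, not the LSR code: instructions are strings, anything starting with `dbg.` stands in for a debug intrinsic, and `next_non_debug` is a hypothetical stand-in for the idea behind `getNextNonDebugInstruction()`.

```python
from typing import List, Optional


def next_non_debug(insts: List[str], index: int) -> Optional[str]:
    """Return the first instruction after `index` that is not a debug intrinsic."""
    for inst in insts[index + 1:]:
        if not inst.startswith("dbg."):
            return inst
    return None


def adjacent_naive(insts: List[str], cond: str, term: str) -> bool:
    """Old behaviour: look only at the very next instruction."""
    return insts[insts.index(cond) + 1] == term


def adjacent_fixed(insts: List[str], cond: str, term: str) -> bool:
    """New behaviour: skip debug intrinsics before comparing."""
    return next_non_debug(insts, insts.index(cond)) == term
```

For a block such as `["icmp", "dbg.value", "br"]`, the naive check reports that the condition is not adjacent to the branch, while the same block without the debug intrinsic reports that it is. The fixed check gives the same answer in both cases, so debug info no longer changes the output.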
Arnold Schwaighofer	f1a0c5d67c	[coro async] Add the swiftasync attribute to the resume partial function Transfer the swiftasync attribute to the resume partial function according to suspend.async specification. Its first argument denotes which argument is the async context. rdar://71499498 Differential Revision: https://reviews.llvm.org/D103285	2021-06-02 07:44:33 -07:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Jingu Kang	f3a27511c9	[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch This re-enables commit `107d19eb01` with bug fixes. Differential Revision: https://reviews.llvm.org/D99354	2021-06-02 10:58:22 +01:00
Bjorn Pettersson	fe208a4ef4	[InstCombine][msp430] Pre-commit test case for @llvm.powi and 16-bit ints This is a pre-commit of a test case for D99439, which is a patch that updates @llvm.powi to handle different int sizes for the exponent. Problem is that @llvm.powi is used as an IR construct that maps to RT libcalls to __powi* functions, and those lib functions depend on sizeof(int) to use correct type for the exponent. The test cases show that we use i32 for the powi exponent, which later would result in wrong type being used in libcalls (miscompile). But there are also a couple of the negative test cases that show that we rewrite into using powi when having a uitofp conversion from i16, which would be wrong when doing the libcall as an "unsigned int" isn't guaranteed to fit inside the "int" argument in the called libcall function. Differential Revision: https://reviews.llvm.org/D102919	2021-06-02 11:40:34 +02:00
Bjorn Pettersson	9c54ee4378	[SimplifyLibCalls] Take size of int into consideration when emitting ldexp/ldexpf When rewriting powf(2.0, itofp(x)) -> ldexpf(1.0, x) exp2(sitofp(x)) -> ldexp(1.0, sext(x)) exp2(uitofp(x)) -> ldexp(1.0, zext(x)) the wrong type was used for the second argument in the ldexp/ldexpf libc call, for target architectures with 16 bit "int" type. The transform incorrectly used a bitcasted function pointer with a 32-bit argument when emitting the ldexp/ldexpf call for such targets. The fault is solved by using the correct function prototype in the call, by asking TargetLibraryInfo about the size of "int". TargetLibraryInfo by default derives the size of the int type by assuming that it is 16 bits for 16-bit architectures, and 32 bits otherwise. If this isn't true for a target it should be possible to override that default in the TargetLibraryInfo initializer. Differential Revision: https://reviews.llvm.org/D99438	2021-06-02 11:40:34 +02:00
Arthur Eubanks	26044c6a54	[InstSimplify] Treat invariant group insts as bitcasts for load operands We can look through invariant group intrinsics for the purposes of simplifying the result of a load. Since intrinsics can't be constants, but we also don't want to completely rewrite load constant folding, we convert the load operand to a constant. For GEPs and bitcasts we just treat them as constants. For invariant group intrinsics, we treat them as a bitcast. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101103	2021-06-01 16:33:06 -07:00
Arthur Eubanks	3aa943070c	[test] Precommit test for D101103	2021-06-01 16:31:02 -07:00
Sanjay Patel	3378542700	[InstCombine] add tests for cast folding; NFC https://llvm.org/PR49543	2021-06-01 16:03:24 -04:00
Harald van Dijk	f126e8ec28	[SLPVectorizer] Ignore unreachable blocks As the existing test unreachable.ll shows, we should be doing more work to avoid entering unreachable blocks: we should not stop vectorization just because a PHI incoming value from an unreachable block cannot be vectorized. We know that particular value will never be used so we can just replace it with poison.	2021-06-01 20:21:04 +01:00
Xun Li	41d08541e8	Simplify coro-zero-alloca.ll D101841 added this test. It appears to generate different outcomes on different platforms. Make it call only -coro-split instead of the entire O2 pipeline to simplify the test flow. Hopefully this will make the test more robust. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D103418	2021-06-01 08:12:35 -07:00
Alexey Bataev	36911971a5	[SLP]Better detection of perfect/shuffled matches for gather nodes. Implemented a better scheme for perfect/shuffled matches of the gather nodes, which allows fixing the performance regressions introduced by earlier patches. Start detecting matches for broadcast nodes and extractelement gathering. Differential Revision: https://reviews.llvm.org/D102920	2021-06-01 07:08:07 -07:00
Daniil Seredkin	13140120dc	[InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y) InstCombine didn't perform the transformations when fmul's operands were the same instruction because it required each of them to have one use, which is false in that case. This patch fixes this, adds tests for these cases, and introduces a new function isOnlyUserOfAnyOperand to check them in a single place. This patch is a result of discussion in D102574. Differential Revision: https://reviews.llvm.org/D102698	2021-06-01 08:33:23 -04:00
Florian Hahn	1b84acb23a	[LoopDeletion] Consider infinite loops alive, unless mustprogress. The current loop or any of its sub-loops may be infinite. Unless the function or the loops are marked as mustprogress, this in itself makes the loop not dead. This patch moves the logic to check whether the current loop is finite or mustprogress to `isLoopDead` and also extends it to check the sub-loops. This should fix PR50511. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D103382	2021-06-01 13:07:36 +01:00
Florian Hahn	d4c070d801	[VectorCombine] Freeze index unless it is known to be non-poison. If the index itself is already poison, the poison propagates through instructions clamping the index to a valid range. This still introduces a load of poison, as flagged by Alive2 and pointed out at `575e2aff55`. This patch updates the code to freeze the index, unless it is proven to not be poison. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D103378	2021-06-01 10:40:57 +01:00
Florian Hahn	f000c4cfb6	[VectorCombine] Add tests with multiple noundef indices for scalarization.	2021-06-01 10:17:50 +01:00
Douglas Yung	18225d4576	Mark test as requiring asserts.	2021-06-01 02:01:01 -07:00
Max Kazantsev	4ef47eaed9	[Test] Add one more loop deletion irreducible CFG test	2021-06-01 11:11:15 +07:00
Nathan Chancellor	e6b086bef2	Revert "[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)" This reverts commit `4f2fd3818b`. The Linux kernel fails to build after this commit. See https://reviews.llvm.org/D99481 for a reproducer. Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2021-05-31 20:21:26 -07:00
Congzhe Cao	bfefde22b6	[LoopInterchange] Handle movement of reduction phis appropriately This patch fixes pr43326 and pr48212. Currently when we move reduction phis to the right place, loop interchange assumes the first phi in loop headers is an induction phi, skips the first phi and assumes the rest of the phis are candidate reduction phis to move. However, that may not always be the case. This patch loops over all phis in loop headers and considers a phi node as a candidate reduction phi to move only when it is indeed a reduction phi across outer and inner loop. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102743	2021-05-31 16:27:38 -04:00
Florian Hahn	5c9fe816e3	[LoopDeletion] Add additional test cases with more nested loops. Also remove mustprogress function attribute from one of the tests Extends test coverage for D103382.	2021-05-31 20:27:07 +01:00
Florian Hahn	aa00b1d763	[LV] Try to sink users recursively for first-order recurrences. Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction. Fixes PR44286 (and another PR or two). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84951	2021-05-31 19:55:33 +01:00
Arthur Eubanks	d350dd8ba2	[test] Properly match parameter/argument ABI attributes These were found with D103412.	2021-05-31 09:12:18 -07:00
Juneyoung Lee	7161bb87c9	[InstCombine] Fix a few remaining vec transforms to use poison instead of undef This is a patch that replaces shufflevector and insertelement's placeholder value with poison. The underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead (D93818). Consensus was reached in late 2020 via the mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 . This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.	2021-05-31 18:47:09 +09:00
David Green	222aeb4d51	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-05-31 10:22:37 +01:00
Hyeongyu Kim	4f2fd3818b	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-05-31 14:08:20 +09:00
Florian Hahn	268e24a46a	[LoopDeletion] Add more tests with infinite sub-loops & mustprogress. A couple of additional tests inspired by PR50511.	2021-05-30 16:41:57 +01:00
Florian Hahn	829978744d	[VectorCombine] Add tests with noundef index for load scalarization.	2021-05-30 12:15:41 +01:00
Sanjay Patel	7bb8bfa062	[InstCombine] fix miscompile from vector select substitution This is similar to the fix in `c590a9880d` ( PR49832 ), but we missed handling the pattern for select of bools (no compare inst). We can't substitute a vector value because the equality condition replacement that we are attempting requires that the condition is true/false for the entire value. Vector select can be partly true/false. I added an assert for vector types, so we shouldn't hit this again. Fixed formatting while auditing the callers. https://llvm.org/PR50500	2021-05-30 07:11:58 -04:00
Pengxuan Zheng	056733d019	[SafeStack] Use proper API to get stack guard Using the proper API automatically sets `__stack_chk_guard` to `dso_local` if `Reloc::Static`. This wasn't strictly necessary until recently when dso_local was no longer implied by `TargetMachine::shouldAssumeDSOLocal` for `__stack_chk_guard`. By using the proper API, we can avoid generating unnecessary GOT relocations. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102646	2021-05-30 00:52:48 -07:00
Sanjay Patel	c7da0c383a	[InstCombine] fold zext of masked bit set/clear This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1 Note: this is a re-post of a patch that I committed at: rGa041c4ec6f7a The commit was reverted because it exposed another bug: rGb212eb7159b40 But that has since been corrected with: rG8a156d1c2795189 ( D101191 ) Differential Revision: https://reviews.llvm.org/D72396	2021-05-29 08:52:26 -04:00
LemonBoy	b577ec4956	[AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer ones Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types. This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D103232	2021-05-29 08:57:27 +02:00
Luke	c4c3869554	[RISCV] Enable interleaved vectorization for RVV Enable interleaved vectorization for RVV. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101469	2021-05-29 11:03:27 +08:00
Fangrui Song	38dbdde792	[Internalize] Simplify comdat renaming with noduplicates after D103043 I realized that we can use `comdat noduplicates` which is available on ELF. Add a special case for wasm which doesn't support the feature.	2021-05-28 16:58:38 -07:00
Nikita Popov	4af2730ac3	[LoopUnroll] Add store to unreachable latch test (NFC) This is to show that we currently only convert the terminator to unreachable, but don't clean up instructions before it (unless trivial DCE removes them). Also clean up excessive whitespace in this test.	2021-05-28 22:49:23 +02:00
Florian Hahn	007f268c35	[VectorCombine] Check indices for all extracts we scalarize. We need to make sure that the indices of all extracts we scalarize are valid.	2021-05-28 18:35:29 +01:00
Florian Hahn	f01df9805c	[VectorCombine] Add variants of multi-extract tests with assumes.	2021-05-28 18:35:24 +01:00
Stefan Pintilie	0159652058	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)" This reverts commit `be1a23203b`.	2021-05-28 12:21:22 -05:00
Sanjay Patel	403cfe5d70	[PassManager] unify late simplifycfg options between regular and LTO pipelines This is split off from D102002, and I think it is clear that the difference in behavior was not intended. Options were added to SimplifyCFG over time, but different chunks of the pass pipelines were not kept in sync.	2021-05-28 13:06:49 -04:00
Sanjay Patel	a279550cde	[PhaseOrdering] add test for late simplifycfg with LTO; NFC Part of D102002	2021-05-28 13:06:48 -04:00
Florian Hahn	2ee59f75fe	[LoopDeletion] Add test with potentially infinite sub-loop. Tests for PR50511.	2021-05-28 17:45:44 +01:00
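
The two rewrites listed in the message of commit `c7da0c383a` ("masked bit set" and "masked bit clear") can be spot-checked outside of Alive. The following is a minimal sketch, not the InstCombine implementation: it models `i32` values as Python integers masked to 32 bits, and all function names are hypothetical. Shift amounts of 32 or more are excluded because they yield poison in LLVM IR.

```python
# Sketch: model the i32 patterns from the commit message with Python ints.
# All names here are illustrative and do not exist in LLVM.
MASK32 = 0xFFFFFFFF


def bit_set_before(x, y):
    """zext(icmp ne (and (shl 1, y), x), 0) on i32."""
    sh1 = (1 << y) & MASK32
    return int((sh1 & x) != 0)


def bit_set_after(x, y):
    """and (lshr x, y), 1 on i32."""
    return (x >> y) & 1


def bit_clear_before(x, y):
    """zext(icmp eq (and (shl 1, y), x), 0) on i32."""
    sh1 = (1 << y) & MASK32
    return int((sh1 & x) == 0)


def bit_clear_after(x, y):
    """and (lshr (xor x, -1), y), 1 on i32."""
    xn = x ^ MASK32  # xor with -1 flips all 32 bits
    return (xn >> y) & 1


def folds_agree(samples):
    """Compare both forms for every in-range shift amount (0..31)."""
    for x in samples:
        for y in range(32):
            if bit_set_before(x, y) != bit_set_after(x, y):
                return False
            if bit_clear_before(x, y) != bit_clear_after(x, y):
                return False
    return True


if __name__ == "__main__":
    print(folds_agree([0, 1, 0x80000000, 0xDEADBEEF, MASK32]))
```

This only samples a handful of inputs; the Alive proof linked in the commit message is the authoritative check.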

19024 Commits
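
The wraparound arithmetic described in the message of commit `4f2fd3818b` (later reverted by `e6b086bef2`) can be illustrated with a scaled-down model. This is a sketch under stated assumptions: the index is 8 bits wide instead of the pointer width so every value can be enumerated, the element size is 4 bytes as in the commit message, and all names are hypothetical rather than taken from `foldCmpLoadFromIndexedGlobal`.

```python
# Sketch: 8-bit model of "Idx * ElementSize" wrapping around.
# BITS = 8 is an assumption made so the whole index space can be enumerated.
BITS = 8
ELEMENT_SIZE = 4
WRAP = 1 << BITS


def byte_offset(idx):
    """Offset computed by the GEP, wrapping at the index width."""
    return (idx * ELEMENT_SIZE) % WRAP


def trailing_zeros(n):
    """Number of trailing zero bits of a positive integer."""
    return (n & -n).bit_length() - 1


def index_mask():
    """Clear the top trailing_zeros(ELEMENT_SIZE) bits of the index."""
    return (1 << (BITS - trailing_zeros(ELEMENT_SIZE))) - 1


def indices_hitting(offset):
    """All index values whose wrapped byte offset equals `offset`."""
    return [i for i in range(WRAP) if byte_offset(i) == offset]


def naive_match(idx, new_idx):
    """The fold before the fix: icmp eq Idx, NewIdx."""
    return idx == new_idx


def masked_match(idx, new_idx):
    """The fold after the fix: icmp eq (and Idx, mask), NewIdx."""
    return (idx & index_mask()) == new_idx


if __name__ == "__main__":
    hits = indices_hitting(0)
    print([hex(i) for i in hits])
    print([naive_match(i, 0) for i in hits])
    print([masked_match(i, 0) for i in hits])
```

In this model the four indices that reach offset 0 are the 8-bit analogues of `0x00..00`, `0x40..00`, `0x80..00` and `0xC0..00`; the unmasked comparison accepts only the first, while the masked comparison accepts all four.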