Commit Graph

914 Commits

Author SHA1 Message Date
Augie Fackler ca051a46fb InstCombineCalls: infer return alignment from allocalign attributes
This exposes a couple of lingering bugs, which will be fixed in
the next two commits.

Differential Revision: https://reviews.llvm.org/D123052
2022-04-07 12:38:44 -04:00
Nikita Popov 682ef39b1a [InstCombine] Remove call to getPointerElementType()
This was erroneously re-introduced as part of
bb0b23174e.
2022-03-29 16:52:29 +02:00
Johannes Doerfert bb0b23174e [InstCombineCalls] Optimize call of bitcast even w/ parameter attributes
Previously, we gave up if a call through a bitcast had parameter attributes.
Interestingly, we allowed attributes for the return value already. We
now handle both the same way, namely, we drop the ones that are
incompatible with the new type and keep the rest. This cannot cause
"more UB" than initially present.

Differential Revision: https://reviews.llvm.org/D119967
2022-03-28 20:57:52 -05:00
chenglin.bi 52f323d0f1 [InstCombine] Fold abs of known negative operand when source is sub
When the abs source comes from (x - y), check whether a dominating
"x > y" condition exists.
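
To illustrate, one plausible shape of the pattern (a minimal IR sketch, not from the patch; function name hypothetical). On the path where "x > y" is known false, x - y is non-positive:

```
declare i32 @llvm.abs.i32(i32, i1)

define i32 @src(i32 %x, i32 %y) {
entry:
  %cond = icmp sgt i32 %x, %y
  br i1 %cond, label %pos, label %nonpos

pos:                                        ; here x > y, so x - y is positive
  ret i32 0

nonpos:                                     ; here x <= y, so x - y is non-positive
  %sub = sub nsw i32 %x, %y
  %abs = call i32 @llvm.abs.i32(i32 %sub, i1 false)
  ; given the dominating condition, %abs can be folded to (y - x)
  ret i32 %abs
}
```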

Fixes #54132

Differential Revision: https://reviews.llvm.org/D122013
2022-03-23 15:21:33 -04:00
Philip Reames 7abefc4222 [instcombine] Fold away memset/memmove from otherwise unused alloca
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers.  As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed.  This is a bit undesirable for such an "obviously" useless bit of code.

As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write.  Doing that would subsume this code in a more general way, but is also a more involved change.  For the moment, I went with the easiest fix.
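
For reference, a minimal sketch of the pattern (hypothetical function, current opaque-pointer syntax):

```
declare void @llvm.memset.p0.i64(ptr, i8, i64, i1)

define void @f() {
  %buf = alloca [128 x i8]
  ; %buf is never read and never escapes, so the memset is dead and can be
  ; folded away here without relying on memcpyopt/DSE clobber walks
  call void @llvm.memset.p0.i64(ptr %buf, i8 0, i64 128, i1 false)
  ret void
}
```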
2022-03-22 13:48:48 -07:00
Sanjay Patel 60820e53ec [InstCombine] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.
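
A minimal IR sketch of the first fold (hypothetical functions, shift by one byte):

```
declare i32 @llvm.bswap.i32(i32)

define i32 @src(i32 %x) {
  %shl = shl i32 %x, 8            ; shift by a whole byte
  %b = call i32 @llvm.bswap.i32(i32 %shl)
  ret i32 %b
}

; --> canonicalized so the bswap happens first
define i32 @tgt(i32 %x) {
  %b = call i32 @llvm.bswap.i32(i32 %x)
  %r = lshr i32 %b, 8
  ret i32 %r
}
```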

Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRf
https://alive2.llvm.org/ce/z/ZnaMLf

Differential Revision: https://reviews.llvm.org/D122010
2022-03-22 09:10:55 -04:00
Nikita Popov c1b9667148 [InstCombine] Support opaque pointers in callee bitcast fold
To make this actually trigger, we also need to check whether the
function types differ, which is a hidden cast under opaque pointers.
The transform is somewhat less relevant there because it is
primarily about pointer bitcasts, but it can also happen with other
bit- or pointer-castable types.

Byval handling is easier with opaque pointers because there is no
need to adjust the byval type; we only need to make sure that it's
still a pointer.
2022-03-03 11:07:39 +01:00
Nikita Popov 6c8adc5054 [InstCombine] Remove unnecessary byval check in callee cast fold
The logic for handling this was fixed in
8d7f118ab2, but the check for byval
on the callee was retained. This resulted in a weird situation
where the transform would work depending on whether the byval
was only on the call or on both the call and the function.
2022-03-03 10:55:14 +01:00
serge-sans-paille 59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output lines:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Nikita Popov 9353ed6a53 [InstCombine] Don't call matchSAddSubSat() for SPF (NFC)
Only call it for intrinsic min/max. The moved implementation is
unchanged apart from the one-use check: It is now hardcoded to
one-use, without the two-use special case for SPF.
2022-02-28 10:41:56 +01:00
Simon Pilgrim be1ffda0a5 [InstCombine] visitCallInst - pull out repeated bswap scalar type bitwidth. NFC. 2022-02-18 17:33:11 +00:00
Sanjay Patel 58df2da054 [InstCombine] push constant operand down/outside in sequence of min/max intrinsics
A generalization like this was suggested in D119754.
This is the inverse direction of D119851,
and we get all of the folds there plus the one that was missed.

There is precedent for this kind of transform in instcombine
with "or" instructions (but strangely only with that one opcode AFAICT).

Similar justification as in the other patch:
The line between instcombine and reassociate for these kinds of folds
is blurry. This doesn't appear to have much cost and gives us the
expected wins from repeated folds as seen in the last set of test diffs.

Differential Revision: https://reviews.llvm.org/D119955
2022-02-17 10:36:37 -05:00
Sanjay Patel 6357ccf57f [InstCombine] reassociate min/max intrinsics with constant operands
Integer min/max operations are associative:
  max (max X, C0), C1 --> max X, (max C0, C1) --> max X, NewC

https://alive2.llvm.org/ce/z/wW5HVM
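
For example (a sketch with arbitrary constants, not from the patch's tests):

```
declare i32 @llvm.smax.i32(i32, i32)

; smax (smax %x, 13), 42 reassociates and constant-folds to smax %x, 42
define i32 @src(i32 %x) {
  %m1 = call i32 @llvm.smax.i32(i32 %x, i32 13)
  %m2 = call i32 @llvm.smax.i32(i32 %m1, i32 42)
  ret i32 %m2
}
```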

This would avoid a regression when we canonicalize to min/max intrinsics
(see D98152 ).

Differential Revision: https://reviews.llvm.org/D119754
2022-02-15 08:31:23 -05:00
Roman Lebedev cd9e6a9c10 [NFC][InstCombine] `visitCallInst()`: make comment more understandable 2022-02-05 02:15:07 +03:00
Anna Thomas 4fc52db116 [InstCombine] Remove weaker fence adjacent to a stronger fence
We have an InstCombine rule to remove identical consecutive fences.
We can extend this to remove a weaker fence when it is adjacent to a
stronger one.

As stated in the LangRef, a fence with a stronger ordering also implies
ordering weaker than itself: "A fence which has seq_cst ordering, in addition to
having both acquire and release semantics specified above, participates in the
global program order of other seq_cst operations and/or fences."
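
A minimal sketch of the new fold (hypothetical function):

```
define void @f() {
  fence release   ; implied by the following seq_cst fence, so removable
  fence seq_cst
  ret void
}
```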

Reviewed-By: reames

Differential Revision: https://reviews.llvm.org/D118607
2022-02-01 11:05:34 -08:00
Nikita Popov 8d992862a0 [InstCombine] Remove some pointer element type accesses
One of these is guarded against opaque pointers, and the others
were accessing the call function type in a rather convoluted way.
2022-01-27 10:15:35 +01:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Sanjay Patel 2e26633af0 [IR] document and update ctlz/cttz intrinsics to optionally return poison rather than undef
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.

Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.

Differential Revision: https://reviews.llvm.org/D117912
2022-01-23 11:22:48 -05:00
Caroline Concatto ad43217a04 [InstCombine] Fold for masked gather when loading the same value each time.
This patch handles the masked gather when the first operand value is a
splat and the mask is all ones, because the masked gather is then reloading
the same value each time. This patch replaces this pattern of masked gather
with a scalar load of the value, which is then splatted into a vector.
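
A sketch of the fold in current opaque-pointer syntax (hypothetical function names):

```
declare <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr>, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @src(ptr %p) {
  %ins = insertelement <4 x ptr> poison, ptr %p, i64 0
  %addrs = shufflevector <4 x ptr> %ins, <4 x ptr> poison, <4 x i32> zeroinitializer
  %g = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %addrs, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> poison)
  ret <4 x i32> %g
}

; --> load the value once and splat it
define <4 x i32> @tgt(ptr %p) {
  %v = load i32, ptr %p, align 4
  %ins = insertelement <4 x i32> poison, i32 %v, i64 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> zeroinitializer
  ret <4 x i32> %splat
}
```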

Differential Revision: https://reviews.llvm.org/D115726
2022-01-21 14:19:51 +00:00
Paweł Bylica 1d7604fdce [InstCombine] Simplify bswap -> shift
Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one
"active byte", i.e. all active bits are contained in boundaries
of a single byte of x.

https://alive2.llvm.org/ce/z/nvbbU5
https://alive2.llvm.org/ce/z/KiiL3J
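
For example (a sketch, not from the patch's tests):

```
declare i32 @llvm.bswap.i32(i32)

define i32 @src(i32 %x) {
  %m = and i32 %x, 255              ; all active bits are in the low byte
  %b = call i32 @llvm.bswap.i32(i32 %m)
  ret i32 %b
}

; --> the swap just moves that one byte, so it is a plain shift
define i32 @tgt(i32 %x) {
  %m = and i32 %x, 255
  %b = shl i32 %m, 24
  ret i32 %b
}
```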

Reviewed By: spatel, craig.topper, lebedev.ri

Differential Revision: https://reviews.llvm.org/D117680
2022-01-21 01:25:30 +01:00
Nikita Popov c63a3175c2 [AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should extract the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
2022-01-15 22:39:31 +01:00
Caroline Concatto 8e5a5b619d [InstCombine] Fold for masked scatters to a uniform address
When the masked scatter intrinsic does a uniform store from a source vector
to a single destination address and the mask is all ones, this patch
replaces the masked scatter with an extract of the last lane of the source
vector and a scalar store to the destination pointer.
This patch also folds the case where the value in the masked scatter is a
splat. In this case, as long as the mask is not all zero, it folds to a
scalar store of the value through the destination pointer.

Differential Revision: https://reviews.llvm.org/D115724
2022-01-14 09:44:34 +00:00
Philip Reames 5265ac72c6 [MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC]
Not all allocation functions are removable if unused.  An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++.  An example of a removable one - at least according to historical practice - would be malloc.
2022-01-10 15:43:39 -08:00
Bryce Wilson fb936595fa [MemoryBuiltins] Add field for alignment argument [NFC]
There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically.

This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack.  The former will follow shortly - I split Bryce's patch for the purpose of having the large change be NFC.  The latter will be reviewed separately.

Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)
2022-01-10 09:15:20 -08:00
Philip Reames f4c54683d6 [instcombine] Infer alignment for aligned_alloc with potentially zero size
This change removes a previous restriction where we had to prove the allocation performed by aligned_alloc was non-zero in size before using the align parameter to annotate the result.  I believe this was conservatism around the C11 specification of this routine which allowed UB when size was not a multiple of alignment, but if so, it was a partial one at best.  (ex: align 32, size 16  was equally UB, but not restricted)  The spec has since been clarified to require nullptr return, not UB.

A nullptr - the documented return for this function on failure in all cases once the UB mentioned above was removed - is trivially aligned for any power of two.  This isn't totally new behavior even for this transform; we'd previously annotate potentially failing allocs (e.g. huge sizes), meaning we were putting align on potentially null pointers anyway.  This change simply does the same for all failure modes.
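
A sketch of the resulting annotation (hypothetical function, opaque-pointer syntax):

```
declare ptr @aligned_alloc(i64, i64)

define ptr @f(i64 %size) {
  ; even if %size may be zero or the call may otherwise fail, the result is
  ; either null or 32-byte aligned, so `align 32` can be inferred on it
  %p = call align 32 ptr @aligned_alloc(i64 32, i64 %size)
  ret ptr %p
}
```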
2022-01-10 08:48:49 -08:00
Serge Guelton d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using a std::map<SmallString, SmallString> for target-dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies an extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Philip Reames 2cafbcb560 [instcombine] Key deref vs deref_or_null annotation of allocation sites off nonnull attribute
Goal is to remove use of isOpNewLike.  I looked at a couple approaches to this, and this turned out to be the cheapest one.  Just letting deref_or_null be generated causes a bunch of test diffs, and I couldn't convince myself there wasn't a real regression somewhere.  A generic instcombine to convert deref_or_null + nonnull to deref is annoyingly complicated since you have to mix facts from callsite and declaration while manipulating only existing call site attributes.  It just wasn't worth the code complexity.

Note that the change in new-delete-itanium.ll is a real regression.  If you have a callsite which overrides the builtin status of a nobuiltin declaration, *and* you don't put the appropriate attributes on that callsite, you may lose the deref fact.  I decided this didn't matter; if anyone disagrees, you can add this case to the generic non-null inference.
2022-01-08 10:33:54 -08:00
Philip Reames dcbc91f40c [instcombine] Delete duplicate object size logic
InstCombine appears to duplicate the allocation size logic used inside getObjectSize when figuring out which attributes are safe to place on the callsite. We can use the existing utility function instead.

The test change is correct. With aligned_alloc, a zero alignment is required to return nullptr. As such, deref_or_null is a correct attribute to use.

Differential Revision: https://reviews.llvm.org/D116816
2022-01-07 10:32:26 -08:00
Nick Desaulniers 95ba0e4563 [SimplifyLibCalls] propagate tail flags on CallInsts
I noticed we weren't propagating tail flags on calls when
FortifiedLibCallSimplifier.optimizeCall() was replacing runtime-checked
calls with calls to the non-checked routines (when safe to do so). Make
sure to check this before replacing the original calls!

Also, avoid any libcall transforms when notail/musttail is present.

PR46734
Fixes: https://github.com/llvm/llvm-project/issues/46079

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D107872
2021-12-13 11:18:30 -08:00
Zarko Todorovski 0d3add216f [llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts
Reworded some comments and asserts to avoid usage of `sanity check/test`

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114372
2021-11-23 13:22:55 -05:00
Itay Bookstein f9059efa0d [InstCombine] Extend stacksave/restore elimination
Previously, InstCombine detected a pair of llvm.stacksave/stackrestore
instructions that are adjacent modulo debug instructions in order to
eliminate the llvm.stackrestore. This misses situations where
intervening instructions (e.g. loads) prevent the llvm.stacksave and
llvm.stackrestore from becoming adjacent. This commit extends the logic
and allows for eliminating the llvm.stackrestore when the range of
instructions between them does not include any alloca or side-effect
causing instructions.
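
A minimal sketch of a newly handled case (hypothetical function, typed-pointer syntax of that time):

```
declare i8* @llvm.stacksave()
declare void @llvm.stackrestore(i8*)

define void @f(i32* %p) {
  %ss = call i8* @llvm.stacksave()
  %v = load i32, i32* %p                 ; intervening instruction: not an alloca, no side effects
  call void @llvm.stackrestore(i8* %ss)  ; can now be eliminated
  ret void
}
```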

Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D113105
2021-11-10 10:41:58 +02:00
Itay Bookstein fe7491d32f [InstCombine][NFC] Refactor llvm.stackrestore handling
Hoist the instruction classification logic outside the loop
in preparation for reuse in a future commit.

Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D113464
2021-11-10 10:41:56 +02:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Unlike DbgIntrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped the default param of the getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probes clobbering memory SSA
- Unblocked induction variable simplification
- Allowed empty loop deletion by treating the probe intrinsic as droppable (isDroppable)
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Kazu Hirata 4f0225f6d2 [Transforms] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-01 09:57:40 -07:00
Sanjay Patel 6063e6b499 [InstCombine] move add after min/max intrinsic
This is another regression noted with the proposal to canonicalize
to the min/max intrinsics in D98152.

Here are Alive2 attempts to show correctness without specifying
exact constants:
https://alive2.llvm.org/ce/z/bvfCwh (smax)
https://alive2.llvm.org/ce/z/of7eqy (smin)
https://alive2.llvm.org/ce/z/2Xtxoh (umax)
https://alive2.llvm.org/ce/z/Rm4Ad8 (umin)
(if you comment out the assume and/or no-wrap, you should see failures)

The different output for the umin test is due to a fold added with
c4fc2cb5b2 :

// umin(x, 1) == zext(x != 0)

We probably want to adjust that, so it applies more generally
(umax --> sext or patterns where we can fold to select-of-constants).
Some folds that were ok when starting with cmp+select may increase
instruction count for the equivalent intrinsic, so we have to decide
if it's worth altering a min/max.

Differential Revision: https://reviews.llvm.org/D110038
2021-09-26 09:49:10 -04:00
Florian Hahn e08a5dc86f [InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC).
InstCombine's worklist can be re-used by other passes like
VectorCombine. Move it to llvm/Transform/Utils and rename it to
InstructionWorklist.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110181
2021-09-22 08:47:21 +01:00
Usman Nadeem f417d9d821 [InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses
Differential Revision: https://reviews.llvm.org/D109808

Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e
2021-09-20 18:32:24 -07:00
Dávid Bolvanský a4a426c9e0 [InstCombine] Added llvm.powi optimizations
If power is even:
powi(-x, p) -> powi(x, p)
powi(fabs(x), p) -> powi(x, p)
powi(copysign(x, y), p) -> powi(x, p)
2021-09-16 19:42:21 +02:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints; it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Arthur Eubanks b81fc14f2d [NFC][InstCombine] Make check for sret in a vararg function clearer
We're trying to get the parameter index of sret and see if it's part of
a function's varargs.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D109335
2021-09-07 11:19:27 -07:00
Roman Lebedev 3f1f08f0ed Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in a broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore I take it upon myself
to revert the tree back to the last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Sanjay Patel 8c7a7e1f67 [InstCombine] allow more min/max with 'not' folds for intrinsics
isFreeToInvert allows min/max with 'not' on both operands,
so easing the argument restriction catches the case where
that operand has one use.

We already handle the sub-patterns when there are fewer uses:
https://alive2.llvm.org/ce/z/8Jatm_

...but this is another step towards parity with the
equivalent icmp+select idioms ( D98152 ).

Differential Revision: https://reviews.llvm.org/D109059
2021-09-01 14:40:00 -04:00
Sanjay Patel 8a10f4a0f6 [InstCombine] use isFreeToInvert to generalize min/max with 'not'
This mimics the code for the corresponding cmp-select idiom.

This also prevents an infinite loop because isFreeToInvert
does not match constant expressions.

So this patch solves the same problem as D108814 and obsoletes
it, but my main motivation is to enhance the pattern matching
to allow more invertible ops. That change will be a follow-up
patch on top of this one.
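
A sketch of the kind of pattern this enables (arbitrary type, not from the patch's tests):

```
declare i8 @llvm.smax.i8(i8, i8)
declare i8 @llvm.smin.i8(i8, i8)

define i8 @src(i8 %a, i8 %b) {
  %na = xor i8 %a, -1
  %nb = xor i8 %b, -1
  %m = call i8 @llvm.smax.i8(i8 %na, i8 %nb)
  ret i8 %m
}

; --> smax(~a, ~b) == ~smin(a, b)
define i8 @tgt(i8 %a, i8 %b) {
  %m = call i8 @llvm.smin.i8(i8 %a, i8 %b)
  %r = xor i8 %m, -1
  ret i8 %r
}
```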

Differential Revision: https://reviews.llvm.org/D109058
2021-09-01 14:34:22 -04:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Sanjay Patel 50c1138796 [InstCombine] add TODO about another min/max fold; NFC
Suggested in post-commit for d0975b7cb0
2021-08-17 14:14:25 -04:00
Sanjay Patel e73f4e1123 [InstCombine] remove unused function argument; NFC 2021-08-17 08:10:42 -04:00
Sanjay Patel d0975b7cb0 [InstCombine] fold signed min/max intrinsics with negated operands
If both operands are negated, we can invert the min/max and do
the negation after:
smax (neg nsw X), (neg nsw Y) --> neg nsw (smin X, Y)
smin (neg nsw X), (neg nsw Y) --> neg nsw (smax X, Y)

This is visible as a remaining regression in D98152. I don't see
a way to generalize this for 'unsigned' or adapt Negator to
handle it. This only appears to be safe with 'nsw':
https://alive2.llvm.org/ce/z/GUy1zJ

Differential Revision: https://reviews.llvm.org/D108165
2021-08-17 08:10:42 -04:00
David Green c6b7db015f [InstCombine] Add call to matchSAddSubSat from min/max
This adds a call to matchSAddSubSat from the smin/smax intrinsics, allowing
the same patterns to match if the canonical form of a min/max is an
intrinsic, not an icmp/select.

Differential Revision: https://reviews.llvm.org/D108077
2021-08-15 17:25:16 +01:00
Arthur Eubanks 80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Sanjay Patel 14eefa57f2 [InstCombine] factorize min/max intrinsic ops with common operand (2nd try)
This is a re-try of 6de1dbbd09 which was reverted because
it missed a null check. Extra test for that failure added.

Original commit message:
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.

See D98152 for status.
2021-08-12 16:32:07 -04:00
Amy Huang 427520a8fa Revert "[InstCombine] factorize min/max intrinsic ops with common operand"
This reverts commit 6de1dbbd09 because it causes a
compiler crash.
2021-08-12 12:36:25 -07:00
Sanjay Patel cd44cc86e3 [InstCombine] remove unused function argument; NFC
This was just added with 6de1dbbd09 , and I missed
pulling the extra arg from the final revision.
2021-08-12 11:47:25 -04:00
Sanjay Patel 6de1dbbd09 [InstCombine] factorize min/max intrinsic ops with common operand
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.

See D98152 for status.
2021-08-12 11:19:09 -04:00
Roman Lebedev 0a241e90d4 [NFC][InstCombine] `vector_reduce_xor(?ext(<n x i1>))` --> `?ext(vector_reduce_add(<n x i1>))`
Instead of expanding it ourselves,
we can just forward to `?ext(vector_reduce_add(<n x i1>))`, as per alive2:
https://alive2.llvm.org/ce/z/ymz7zE (self)
https://alive2.llvm.org/ce/z/eKu2v2 (skipped zext)
https://alive2.llvm.org/ce/z/c3BXgc (skipped sext)
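
A sketch of the forwarding for the zext case (hypothetical functions):

```
declare i8 @llvm.vector.reduce.xor.v4i8(<4 x i8>)
declare i1 @llvm.vector.reduce.add.v4i1(<4 x i1>)

define i8 @src(<4 x i1> %b) {
  %x = zext <4 x i1> %b to <4 x i8>
  %r = call i8 @llvm.vector.reduce.xor.v4i8(<4 x i8> %x)
  ret i8 %r
}

; --> an i1 add reduction computes the same parity
define i8 @tgt(<4 x i1> %b) {
  %p = call i1 @llvm.vector.reduce.add.v4i1(<4 x i1> %b)
  %r = zext i1 %p to i8
  ret i8 %r
}
```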
2021-08-07 17:31:33 +03:00
Roman Lebedev c6ff867f92 [NFC][InstCombine] Simplify emitted IR for `vector_reduce_xor(?ext(<n x i1>))`
Now that we canonicalize low bit splatting to the form we were emitting
here ourselves, emit simpler IR that will be canonicalized later.

See 1e801439be for proofs:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)
2021-08-07 17:31:24 +03:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is a recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in a call of getFastMathFlags (the base type should be
FPMathOperator, not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function is now implemented entirely in
    clang codegen, which expands the function into a set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually do not have this
      property; they raise an 'invalid' exception in such cases. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular, it could result in unexpected trapping if the argument is a SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, although some could do the check more efficiently.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow 'isnan' to be implemented with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang; in this case, treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such an option is
    specified, 'fcmp uno' may be optimized to 'false'. It is a valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such a function returns the expected result instead of actually making
    the check, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance-critical code, as it can speed up execution
    at the expense of manual treatment of corner cases. If 'isnan' returns
    the assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in a target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where it is lowered in a target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported, mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function is now implemented entirely in
clang codegen, which expands the function into a set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually do not have this
  property; they raise an 'invalid' exception in such cases. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular, it could result in unexpected trapping if the argument is a SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, although some could do the check more efficiently.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow 'isnan' to be implemented with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang; in this case, treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such an option is
specified, 'fcmp uno' may be optimized to 'false'. It is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such a function returns the expected result instead of actually making
the check, it becomes useless in many cases. The option '-ffast-math' is
often used for performance-critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
the assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in a target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where it is lowered in a target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Roman Lebedev 4ba3326f17 [InstCombine] `vector_reduce_{or,and}(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
This allows the expansion logic to actually trigger if the argument
was extended from i1 element type, like the rest of the reductions expect.

Alive2 agrees:
https://alive2.llvm.org/ce/z/wcfews (or zext)
https://alive2.llvm.org/ce/z/FCXNFx (or sext)
https://alive2.llvm.org/ce/z/f26zUY (and zext)
https://alive2.llvm.org/ce/z/jprViN (and sext)
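
A sketch of the or/zext case (hypothetical functions):

```
declare i8 @llvm.vector.reduce.or.v4i8(<4 x i8>)
declare i1 @llvm.vector.reduce.or.v4i1(<4 x i1>)

define i8 @src(<4 x i1> %b) {
  %x = zext <4 x i1> %b to <4 x i8>
  %r = call i8 @llvm.vector.reduce.or.v4i8(<4 x i8> %x)
  ret i8 %r
}

; --> reduce on the i1 vector, then extend the single result bit
define i8 @tgt(<4 x i1> %b) {
  %o = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %b)
  %r = zext i1 %o to i8
  ret i8 %r
}
```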
2021-08-03 00:54:35 +03:00
Roman Lebedev 554fc9ad0a [InstCombine] `vector_reduce_smax(?ext(<n x i1>))` --> `?ext(vector_reduce_{and,or}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/3oqir9 (self)
https://alive2.llvm.org/ce/z/6cuI5m (zext)
https://alive2.llvm.org/ce/z/4FL8rD (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev f47b7b6d10 [InstCombine] `vector_reduce_smin(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/noXtZ8 (self)
https://alive2.llvm.org/ce/z/JNrN6C (zext)
https://alive2.llvm.org/ce/z/58snuN (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev b9b7162b8b [InstCombine] `vector_reduce_umax(?ext(<n x i1>))` --> `?ext(vector_reduce_or(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/NbBaeT (self)
https://alive2.llvm.org/ce/z/iEaig4 (zext)
https://alive2.llvm.org/ce/z/meGb3y (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:23 +03:00
Roman Lebedev 0c13798056 [InstCombine] `vector_reduce_umin(?ext(<n x i1>))` --> `?ext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/XxUScW (self)
https://alive2.llvm.org/ce/z/3usTF- (zext)
https://alive2.llvm.org/ce/z/GVxwQz (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:22 +03:00
Roman Lebedev 469793efa7 [InstCombine] `vector_reduce_mul(?ext(<n x i1>))` --> `zext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/PDansB (self)
https://alive2.llvm.org/ce/z/55D-Xc (zext)
https://alive2.llvm.org/ce/z/LxG3-r (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 21:57:51 +03:00
Roman Lebedev 1e801439be [InstCombine] `xor` reduction w/ i1 elt type is a parity check
For i1 element type, `xor` and `add` are interchangeable
(https://alive2.llvm.org/ce/z/e77hhQ), so we should treat it just like
an `add` reduction and consistently transform them both:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)

Though, let's emit the IR that is similar to the one we produce for
`vector_reduce_add(<n x i1>)`.

See https://bugs.llvm.org/show_bug.cgi?id=51259
2021-08-02 20:21:37 +03:00
Jun Ma 958dddf7df [NFC][InstCombine] Fix typo 2021-07-27 11:33:10 +08:00
Alexey Bataev 8af69975af [InstCombine][NFC]Use only `replaceInstUsesWith`, NFC. 2021-07-08 13:58:30 -07:00
Alexey Bataev b5113bff46 [InstCombine] Transform reduction+(sext/zext(<n x i1>) to <n x im>) to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Some of the SPEC tests end up with reduction+(sext/zext(<n x i1>) to <n x im>) pattern, which can be transformed to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Also, reduction+(<n x i1>) can be transformed to ctpop(bitcast <n x i1> to in) & 1 != 0.
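
A sketch of the sext/add-reduction case (hypothetical functions, arbitrary width):

```
declare i32 @llvm.vector.reduce.add.v8i32(<8 x i32>)
declare i8 @llvm.ctpop.i8(i8)

define i32 @src(<8 x i1> %b) {
  %s = sext <8 x i1> %b to <8 x i32>     ; each lane is 0 or -1
  %r = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %s)
  ret i32 %r
}

; --> count the set mask bits and negate
define i32 @tgt(<8 x i1> %b) {
  %bc = bitcast <8 x i1> %b to i8
  %cnt = call i8 @llvm.ctpop.i8(i8 %bc)
  %z = zext i8 %cnt to i32
  %r = sub i32 0, %z
  ret i32 %r
}
```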

Differential Revision: https://reviews.llvm.org/D105587
2021-07-08 07:56:41 -07:00
Philip Reames c4fc2cb5b2 [instcombine] umin(x, 1) == zext(x != 0)
We already implemented this for the select form, but the intrinsic form was missing.  Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.
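
A minimal sketch (hypothetical functions):

```
declare i8 @llvm.umin.i8(i8, i8)

define i8 @src(i8 %x) {
  %m = call i8 @llvm.umin.i8(i8 %x, i8 1)
  ret i8 %m
}

; --> zext(x != 0)
define i8 @tgt(i8 %x) {
  %c = icmp ne i8 %x, 0
  %r = zext i1 %c to i8
  ret i8 %r
}
```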
2021-06-30 10:20:01 -07:00
Alexey Bataev 129ae515fb [INSTCOMBINE] Transform reduction(shuffle V, poison, unique_mask) to reduction(V).
After SLP + LTO we may have reduction(shuffle V, poison,
mask). This can be simplified to just reduction(V) if the mask only
permutes the elements of a single vector, without reusing elements,
replacing them with undefs and/or other values, etc.

Differential Revision: https://reviews.llvm.org/D105053
2021-06-29 10:02:38 -07:00
Sanjay Patel 153da08a6c [InstCombine] hoist min/max intrinsics above select with constant op
This is an extension of the handling for unary intrinsics and
follows the logic that we use for binary ops.

We don't canonicalize to min/max intrinsics yet, but this might
help unlock other folds seen in D98152.
2021-06-27 10:02:23 -04:00
Nikita Popov 8e0ff44bf8 [InstCombine] Make varargs cast transform compatible with opaque ptrs
The whole transform can be dropped once we have fully transitioned
to opaque pointers (as its purpose is to remove no-op pointer
casts). For now, make sure that it handles opaque pointers correctly.
2021-06-24 21:57:05 +02:00
Nikita Popov 8321335fd8 [InstCombine] Use getFunctionType()
Avoid fetching pointer element type...
2021-06-23 20:28:34 +02:00
Sanjay Patel 1e9b6b89a7 [InstCombine] convert FP min/max with negated op to fabs
This is part of improving floating-point patterns seen in:
https://llvm.org/PR39480

We don't require any FMF because the 2 potential corner cases
(-0.0 and NaN) are correctly handled without FMF:
1. -0.0 is treated as strictly less than +0.0 with
   maximum/minimum, so fabs/fneg work as expected.
2. +/- 0.0 with maxnum/minnum is indeterminate, so
   transforming to fabs/fneg is more defined.
3. The sign of a NaN may be altered by this transform,
   but that is allowed in the default FP environment.

If there are FMF, they are propagated from the min/max call to
one or both new operands which seems to agree with Alive2:
https://alive2.llvm.org/ce/z/bem_xC
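
A sketch of one such fold, assuming the maxnum form (hypothetical functions):

```
declare float @llvm.maxnum.f32(float, float)
declare float @llvm.fabs.f32(float)

define float @src(float %x) {
  %neg = fneg float %x
  %m = call float @llvm.maxnum.f32(float %neg, float %x)
  ret float %m
}

; --> max(-x, x) is |x|
define float @tgt(float %x) {
  %r = call float @llvm.fabs.f32(float %x)
  ret float %r
}
```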
2021-06-23 10:41:39 -04:00
Joe Ellis 3c4dbf6ea9 [Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics
With regards to overrunning, the langref (llvm/docs/LangRef.rst)
specifies:

   (llvm.experimental.vector.insert)
   Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1)
   must be valid ``vec`` indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

   (llvm.experimental.vector.extract)
   Elements ``idx`` through (``idx`` + num_elements(result_type) - 1)
   must be valid vector indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

For the non-mixed cases (e.g. inserting/extracting a scalable into/from
another scalable, or inserting/extracting a fixed into/from another
fixed), it is possible to statically check whether or not the above
conditions are met. This was previously missing from the verifier, and
if the conditions were found to be false, the result of the
insertion/extraction would be replaced with an undef.

With regards to invalid indices, the langref (llvm/docs/LangRef.rst)
specifies:

    (llvm.experimental.vector.insert)
    ``idx`` represents the starting element number at which ``subvec``
    will be inserted. ``idx`` must be a constant multiple of
    ``subvec``'s known minimum vector length.

    (llvm.experimental.vector.extract)
    The ``idx`` specifies the starting element number within ``vec``
    from which a subvector is extracted. ``idx`` must be a constant
    multiple of the known-minimum vector length of the result type.

Similarly, these conditions were not previously enforced in the
verifier. In some circumstances, invalid indices were permitted
silently, and in other circumstances, an undef was spawned where a
verifier error would have been preferred.

This commit adds verifier checks to enforce the constraints above.

Differential Revision: https://reviews.llvm.org/D104468
2021-06-23 10:33:22 +00:00
Sanjay Patel b1f6ef92ec [InstCombine] reduce code duplication for FP min/max with casts fold; NFC 2021-06-22 14:15:04 -04:00
Sanjay Patel 198b79caae [InstCombine] move bitmanipulation-of-select folds
No outwardly visible difference is intended,
but it is obviously better to have all transforms
for an intrinsic housed together since we already
have helper functions in place.

It is also potentially more efficient to zap a
simple pattern match before trying to do expensive
computeKnownBits() calls.
2021-06-21 11:32:16 -04:00
Sanjay Patel 64b2676ca8 [InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms
Building on:
4c44b02d87
...and adding handling for the extra operand in these intrinsics.

This pattern is discussed in:
https://llvm.org/PR50140
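
For example (a sketch with an arbitrary constant arm):

```
declare i32 @llvm.cttz.i32(i32, i1)

define i32 @src(i1 %c, i32 %x) {
  %s = select i1 %c, i32 8, i32 %x
  %z = call i32 @llvm.cttz.i32(i32 %s, i1 false)
  ret i32 %z
}

; --> the constant arm folds (cttz(8) == 3); the other arm keeps the call
define i32 @tgt(i1 %c, i32 %x) {
  %zx = call i32 @llvm.cttz.i32(i32 %x, i1 false)
  %z = select i1 %c, i32 3, i32 %zx
  ret i32 %z
}
```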
2021-06-21 11:04:12 -04:00
Juneyoung Lee ce192ced2b [InstCombine] Use poison constant to represent the result of unreachable instrs
This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.

This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104602
2021-06-21 09:58:44 +09:00
Sanjay Patel 4c44b02d87 [InstCombine] fold ctpop-of-select with 1 or more constant arms
The general pattern is mentioned in:
https://llvm.org/PR50140
...but we need to do a bit more to handle intrinsics with extra operands
like ctlz/cttz.
2021-06-20 11:28:45 -04:00
Sanjay Patel afd44bb6f2 [InstCombine] fold ctlz/cttz of bool types
https://alive2.llvm.org/ce/z/tX4pUT
2021-06-13 08:26:40 -04:00
Juneyoung Lee 7161bb87c9 [InstCombine] Fix a few remaining vec transforms to use poison instead of undef
This is a patch that replaces shufflevector and insertelement's placeholder value with poison.

Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead
(D93818)
Consensus was reached in late 2020 via the mailing list, as well as in the thread at https://bugs.llvm.org/show_bug.cgi?id=44185.

This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.
2021-05-31 18:47:09 +09:00
cynecx 8ec9fd4839 Support unwinding from inline assembly
I've taken the following steps to add unwinding support from inline assembly:

1.) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:

```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
    to label %exit unwind label %uexit
```

2.) Add Bitcode writing/reading support + LLVM-IR parsing.

3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.

4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.

5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.

6.) Don't allow unwinding callbr.

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D95745
2021-05-13 19:13:03 +01:00
Fangrui Song d8aba75a76 Internalize some cl::opt global variables or move them under namespace llvm 2021-05-07 11:15:43 -07:00
Coplin, Jared 6251b2f7f6 Attach metadata to simplified masked loads and stores 2021-05-05 18:01:49 -05:00
Dávid Bolvanský 08c08577f9 [InstCombine] cttz(sext(x)) -> cttz(zext(x))
```

----------------------------------------
define i32 @src(i16 %x, i1 %b) {
%0:
  %z = sext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
=>
define i32 @tgt(i16 %x, i1 %b) {
%0:
  %z = zext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/evomeg

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101764
2021-05-03 23:59:30 +02:00
Dávid Bolvanský 27b651ca47 [InstCombine] cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true' (PR50172)
Zext doesn't change the number of trailing zeros, so narrow cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true'.

Proofs:
https://alive2.llvm.org/ce/z/o2dnjY
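
A sketch of the narrowing (hypothetical functions):

```
declare i32 @llvm.cttz.i32(i32, i1)
declare i16 @llvm.cttz.i16(i16, i1)

define i32 @src(i16 %x) {
  %z = zext i16 %x to i32
  %c = call i32 @llvm.cttz.i32(i32 %z, i1 true)
  ret i32 %c
}

; --> narrow: zext adds no trailing zeros, and x == 0 is poison anyway
define i32 @tgt(i16 %x) {
  %c = call i16 @llvm.cttz.i16(i16 %x, i1 true)
  %r = zext i16 %c to i32
  ret i32 %r
}
```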

Solves https://bugs.llvm.org/show_bug.cgi?id=50172

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101582
2021-05-03 17:05:12 +02:00
Sanjay Patel 0f8b6686ac [InstCombine] narrow popcount with zext operand
https://llvm.org/PR50141
2021-04-29 15:07:16 -04:00
Sanjay Patel 025bb52903 [InstCombine] fold clamp to 2 values from min/max intrinsics
The "select" versions of these folds is also missing and can
cause infinite loops as shown in:
https://llvm.org/PR48900
...but it seems easier to match these as max/min as a first fix.

https://alive2.llvm.org/ce/z/wv-_dT
2021-04-27 15:35:49 -04:00
Dávid Bolvanský 137568e579 [InstCombine] Fixed UB in foldCtpop 2021-04-24 19:44:16 +02:00
Dávid Bolvanský de3fa35cdb [InstCombine] ctpop(rot(X)) -> ctpop(X)
Proof:
https://alive2.llvm.org/ce/z/ss2zyt - rotl
https://alive2.llvm.org/ce/z/ZM7Aue - rotr
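
A sketch using the fshl form of rotl (hypothetical functions):

```
declare i32 @llvm.fshl.i32(i32, i32, i32)
declare i32 @llvm.ctpop.i32(i32)

define i32 @src(i32 %x, i32 %s) {
  %rot = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %s)  ; rotl(x, s)
  %p = call i32 @llvm.ctpop.i32(i32 %rot)
  ret i32 %p
}

; --> rotation does not change the number of set bits
define i32 @tgt(i32 %x, i32 %s) {
  %p = call i32 @llvm.ctpop.i32(i32 %x)
  ret i32 %p
}
```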

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101235
2021-04-24 18:25:03 +02:00
Dávid Bolvanský 5f77e7708a [InstCombine] Fixed crash when setting align attr for memalign 2021-04-23 14:04:08 +02:00
Dávid Bolvanský 324d641b75 [InstCombine] Enhance deduction of alignment for aligned_alloc
This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements "TODO" item mentioned in the review of that patch.

> The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.

Currently, we simply bail out if we see a non-constant size - change that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100785
2021-04-20 02:04:18 +02:00
Yuanfang Chen c5fda0e662 Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom""
This reverts commit a3fabc79ae (relands
f4d682d6ce with a fix for the compile-time
regression issue).
2021-04-12 14:50:54 -07:00
Nikita Popov a3fabc79ae Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"
This reverts commit f4d682d6ce.

This caused a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions

Possibly this is due to overeager parsing of target triples.
2021-04-12 22:55:59 +02:00
Yuanfang Chen f4d682d6ce [InstCombine] when calling conventions are compatible, don't convert the call to undef idiom
D24453 enabled libcall simplification for ARM PCS. This may cause
caller/callee calling convention mismatches in some situations such as
LTO. This patch makes instcombine aware that compatible calling
convention differences are benign (not emitting the undef idiom).

Differential Revision: https://reviews.llvm.org/D99773
2021-04-12 09:32:23 -07:00
Sanjay Patel 84cdccc9dc [InstCombine] try to eliminate an instruction in min/max -> abs fold
As suggested in the review thread for 5094e12 and seen in the
motivating example from https://llvm.org/PR49885, it's not
clear if we have a way to create the optimal code without
this heuristic.
2021-04-09 10:34:03 -04:00
Sanjay Patel 5094e1279e [InstCombine] fold min/max intrinsic with negated operand to abs
The smax case shows up in https://llvm.org/PR49885 .
The others seem unlikely, but we might as well try
for uniformity (although that could mean an extra
instruction to create "nabs").

smax -- https://alive2.llvm.org/ce/z/8yYaGy
smin -- https://alive2.llvm.org/ce/z/0_7zc_
umax -- https://alive2.llvm.org/ce/z/EcsZWs
umin -- https://alive2.llvm.org/ce/z/Xw6WvB
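
A sketch of the smax case (hypothetical functions):

```
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.abs.i32(i32, i1)

define i32 @src(i32 %x) {
  %neg = sub i32 0, %x
  %m = call i32 @llvm.smax.i32(i32 %x, i32 %neg)
  ret i32 %m
}

; --> smax(x, -x) is abs(x); INT_MIN maps to itself on both sides
define i32 @tgt(i32 %x) {
  %r = call i32 @llvm.abs.i32(i32 %x, i1 false)
  ret i32 %r
}
```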
2021-04-08 14:37:39 -04:00