llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	6d24dd7ed1	[InstSimplify] Regenerate compares tests to fix issue reported on D77354	2020-04-03 17:34:56 +01:00
Simon Pilgrim	43d2fc7ed7	[LoopRotate] Cleanup test checks to fix issue reported on D77354	2020-04-03 17:21:37 +01:00
Matt Arsenault	57a55313c3	InstCombine: Reduce minnum/maxnum if inputs are casted	2020-04-03 11:57:25 -04:00
laith sakka	a0983ed3d2	Handle exp2 with proper vectorization and lowering to SVML calls Summary: Add mapping from exp2 math functions to corresponding SVML calls. This is a follow up and extension for llvm diff https://reviews.llvm.org/D19544 Test Plan: - update test case and run ninja check. - run tests locally Reviewers: wenlei, hoyFB, mmasten, mzolotukhin, spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77114	2020-04-02 21:11:13 -07:00
Hongtao Yu	88da019977	Fix a bug in the inliner that causes subsequent double inlining Summary: A recent change in the instruction simplifier enables a call to a function that just returns one of its parameters to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining. To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges. ``` void top() { int t = first(); second(t); } void second(int t) { t = third(t); fourth(t); } int third(int t) { return t; } ``` The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up.
We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e., which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of inlined callsites is a bit different, but in theory the issue could happen there too. Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification. Reviewers: wenlei, davidxl, tejohnson Reviewed By: wenlei, davidxl Subscribers: eraman, nikic, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76248	2020-04-02 21:08:05 -07:00
Adrian Prantl	93fe58c9cf	Teach the stripNonLineTableDebugInfo pass about the llvm.dbg.label intrinsic. Debug info for labels is not generated at -gline-tables-only, so this pass should remove them. Differential Revision: https://reviews.llvm.org/D77345	2020-04-02 17:39:33 -07:00
Adrian Prantl	c024f3ebdc	Teach the stripNonLineTableDebugInfo pass about the llvm.dbg.addr intrinsic. This patch also strips llvm.dbg.addr intrinsics when downgrading debug info to linetables-only. Differential Revision: https://reviews.llvm.org/D77343	2020-04-02 17:39:33 -07:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Anna Thomas	bf7a16a768	[InlineFunction] Update valid return attributes at callsite within callee body Consider a callee function that has a call (C) within it which feeds into the return. When we inline that callee into a callsite that has return attributes, we can backward propagate valid attributes to the call (C) within that inlined callee body. This is safe to do so only if we can guarantee transfer of execution to successor in the window of instructions between return value (i.e. the call C) and the return instruction. Also, this is valid only for attributes which are a property of a callsite and not those that are not dependent on the ABI, or a property of the call itself. Reviewed-By: reames, jdoerfert Differential Revision: https://reviews.llvm.org/D76140	2020-04-02 14:13:12 -04:00
Sanjay Patel	f4448063cc	[InstCombine] try to reduce shuffle with bitcasted operand shuf (bitcast X), undef, Mask --> bitcast X' The 'inverse shuffles' test (shuf_bitcast_operand) is a pattern in the motivating examples from PR35454: https://bugs.llvm.org/show_bug.cgi?id=35454 (see also D76727) We can deal with this class of patterns in generic instcombine because we are not creating any new shuffles, just a bitcast. Alive2 proof: http://volta.cs.utah.edu:8080/z/mwDUZf Differential Revision: https://reviews.llvm.org/D76844	2020-04-02 13:44:50 -04:00
Sanjay Patel	b6050ca181	[VectorCombine] transform bitcasted shuffle to narrower elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' We do not attempt this in InstCombine because we do not want to change types and create new shuffle ops that are potentially not lowered as well as the original code. Here, we can check the cost model to see if it is worthwhile. I've aggressively enabled this transform even if the types are the same size and/or equal cost because moving the bitcast allows InstCombine to make further simplifications. In the motivating cases from PR35454: https://bugs.llvm.org/show_bug.cgi?id=35454 ...this is enough to let instcombine and the backend eliminate the redundant shuffles, but we probably want to extend VectorCombine to handle the inverse pattern (shuffle-of-bitcast) to get that simplification directly in IR. Differential Revision: https://reviews.llvm.org/D76727	2020-04-02 13:30:22 -04:00
Sanjay Patel	12fcbcecff	[InstCombine] add tests for cmyk benchmark; NFC These are versions of a function that regressed with: rGf2fbdf76d8d0 That particular problem occurs with an instcombine-simplifycfg-instcombine sequence, but we can show that it exists within instcombine only with other variations of the pattern.	2020-04-02 13:00:46 -04:00
Sanjay Patel	1008435f3d	Revert "[InstCombine] do not exclude min/max from icmp with casted operand fold" This reverts commit `f2fbdf76d8`. As noted in the post-commit thread: https://reviews.llvm.org/rGf2fbdf76d8d0 ...this can obscure a min/max pattern where the components have extra uses. We can show that the problem is independent of this change with a slightly modified source example, so this revert just delays/reduces the need to fix the real problem. We need to improve our analysis of negation or -- more generally -- subtraction using patches like D77230 or D68408.	2020-04-02 09:15:23 -04:00
Tyker	c00cb76274	[NFC] Split Knowledge retention and place it more appropriately Summary: Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils allows Queries and Transform/Utils to use Analysis. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77171	2020-04-02 15:01:41 +02:00
Sanjay Patel	a19b27b90e	[PhaseOrdering] add test for vector trunc; NFC See discussion in D76983.	2020-04-02 08:13:19 -04:00
Sanjay Patel	ecb048c7ac	[InstCombine] add tests for disguised vector trunc; NFC	2020-04-02 08:13:19 -04:00
Clement Courbet	fb4aa30f27	[ExpandMemCmp] Allow overlapping loads in the zero-relational case. Summary: This allows doing `memcmp(p, q, 7)` with 2 loads instead of a call to memcmp. This fixes part of PR45147. Reviewers: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76133	2020-04-02 11:20:47 +02:00
Florian Hahn	a63b5c9e53	[CallSiteSplitting] Simplify isPredicateOnPHI & continue checking PHIs. As pointed out by @thakis, currently CallSiteSplitting bails out after checking the first PHI node. We should check all PHI nodes, until we find one where call site splitting is beneficial. This patch also slightly simplifies the code using BasicBlock::phis(). Reviewers: davidxl, junbuml, thakis Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D77089	2020-04-02 10:11:27 +01:00
Johannes Doerfert	bcd8009369	[Attributor] Use the proper context instruction in genericValueTraversal There was a TODO in genericValueTraversal to provide the context instruction and due to the lack of it users that wanted one just used something available. Unfortunately, using a fixed instruction is wrong in the presence of PHIs so we need to update the context instruction properly. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D76870	2020-04-01 22:20:47 -05:00
Johannes Doerfert	9e19693994	[Attributor] Derive better alignment for accessed pointers Use DL & ABI information for better alignment deduction, e.g., if a type is accessed and the ABI specifies an alignment requirement for such an access we can use it. This is based on a patch by @lebedev.ri and inspired by getBaseAlign in Loads.cpp. Depends on D76673. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D76674	2020-04-01 21:49:57 -05:00
Johannes Doerfert	b1c788d051	[Attributor][FIX] Prevent alignment breakage wrt. must-tail calls If we have a must-tail call the callee and caller need to have matching ABIs. Part of that is alignment which we might modify when we deduce alignment of arguments of either. Since we would need to keep them in sync, which is not as simple, we simply avoid deducing alignment for arguments of the must-tail caller or callee. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D76673	2020-04-01 21:40:07 -05:00
Johannes Doerfert	f7f9322843	[Attributor][NFC] Cleanup leftover check lines	2020-04-01 21:37:33 -05:00
Sanjay Patel	3d90048791	[InstCombine] enhance freelyNegateValue() by handling xor Negation is equivalent to bitwise-not + 1, so try to convert more subtracts into adds using this relationship: 0 - (A ^ C) => ((A ^ C) ^ -1) + 1 => A ^ ~C + 1 I doubt this will recover the regression noted in rGf2fbdf76d8d0, but seems like we're going to need to improve here and/or revive D68408? Alive2 proofs: http://volta.cs.utah.edu:8080/z/Re5tMU http://volta.cs.utah.edu:8080/z/An-uns Differential Revision: https://reviews.llvm.org/D77230	2020-04-01 15:05:13 -04:00
Sanjay Patel	8431dbacd4	[InstCombine] add tests for negate with xor operand; NFC	2020-04-01 15:05:13 -04:00
Jonathan Roelofs	1148f004fa	Fix PR45371: SeparateConstOffsetFromGEP clean up bookkeeping find() was altering the UserChain, even in cases where it subsequently discovered that the resulting constant was a 0. This confuses rebuildWithoutConstOffset() when it attempts to walk the chain later, since it is expected that the chain itself be a path down the use-def edges of an expression.	2020-04-01 12:38:15 -06:00
Uday Bondhugula	6ee11c3b0f	[NewGVN] Make NewGVN aware of aligned_alloc Make the New GVN pass aware of aligned_alloc. Depends on D76975. Differential Revision: https://reviews.llvm.org/D76976	2020-04-01 23:26:51 +05:30
Uday Bondhugula	4cf70af94f	[GVN] Make GVN aware of aligned_alloc Make the GVN pass aware of aligned_alloc. Depends on D76974. Differential Revision: https://reviews.llvm.org/D76975	2020-04-01 23:26:50 +05:30
Uday Bondhugula	c4499e3333	[Attributor] Make attributor aware of aligned_alloc for heap to stack conversion Make the attributor pass aware of aligned_alloc for converting heap allocations to stack ones. Depends on D76971. Differential Revision: https://reviews.llvm.org/D76974	2020-04-01 23:26:50 +05:30
Matt Arsenault	3f465d0d36	AMDGPU: Fix broken check lines	2020-04-01 10:52:22 -07:00
shchenz	e344f8b9db	Revert "[LSR] re-add testcase for wrongly phi node elimination - NFC" This reverts commit `f25a1b4f58`. ARM and hexagon fail at the newly added case.	2020-04-01 12:58:06 +00:00
shchenz	f25a1b4f58	[LSR] re-add testcase for wrongly phi node elimination - NFC Retest the case on X86/SystemZ/AArch64/PowerPC	2020-04-01 11:11:17 +00:00
Cullen Rhodes	84aa6cf1a9	[Transforms][SROA] Promote allocas with mem2reg for scalable types Summary: Aggregate types containing scalable vectors aren't supported and as far as I can tell this pass is mostly concerned with optimisations on aggregate types, so the majority of this pass isn't very useful for scalable vectors. This patch modifies SROA such that mem2reg is run on allocas with scalable types that are promotable, but nothing else such as slicing is done. The use of TypeSize in this pass has also been updated to be explicitly fixed size. When invoking the following methods in DataLayout: * getTypeSizeInBits * getTypeStoreSize * getTypeStoreSizeInBits * getTypeAllocSize we now call getFixedSize on the resultant TypeSize. This is quite an extensive change with around 50 calls to these functions, and also the first change of this kind (being explicit about fixed vs scalable size) as far as I'm aware, so feedback welcome. A test is included containing IR with scalable vectors that this pass is able to optimise. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D76720	2020-04-01 10:34:11 +00:00
shchenz	8b8cd150a4	Revert "[LSR] add testcase for wrongly phi node elimination - NFC" This reverts commit `dbf5e4f6c7`. The testcase has different behaviour on PowerPC and X86.	2020-04-01 10:28:43 +00:00
shchenz	dbf5e4f6c7	[LSR] add testcase for wrongly phi node elimination - NFC	2020-04-01 09:58:58 +00:00
Florian Hahn	d307174e1d	[ConstantRange] Use APInt::or/APInt::and for single elements. Currently ConstantRange::binaryAnd/binaryOr results are too pessimistic for single element constant ranges. If both operands are single element ranges, we can use APInt's AND and OR implementations directly. Note that some other binary operations on constant ranges can cover the single element cases naturally, but for OR and AND this unfortunately is not the case. Reviewers: nikic, spatel, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D76446	2020-04-01 09:50:24 +01:00
Florian Hahn	e20cac3650	[Matrix] Add new test case with getelementptr constant exprs. The new test mostly ensures we keep doing the right thing for constant expressions while lowering matrix instructions.	2020-04-01 09:32:13 +01:00
Evgenii Stepanov	f9471b0010	Fix MSan false positive due to select folding. Summary: Select folding in JumpThreading can create a conditional branch on a code path that did not have one in the original program. This is not a valid transformation in sanitize_memory functions. Note that JumpThreading does select folding in 3 different places. Two of them seem safe - they apply to a select instruction in a BB that ends with an unconditional branch to another BB, which (in turn) ends with a conditional branch or a switch with the same condition. Fixes PR45220. Reviewers: glider, dvyukov, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76332	2020-03-31 15:25:42 -07:00
Anna Thomas	58a05675da	Revert "[InlineFunction] Handle return attributes on call within inlined body" This reverts commit `28518d9ae3`. There is a failure in MsgPackReader.cpp when built with clang. It complains that "signext and zeroext" are incompatible. Investigating offline whether it is in fact UB in the MsgPackReader code.	2020-03-31 16:16:34 -04:00
Guozhi Wei	6d20937c29	[CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2. Two problems blocked the duplication: 1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts. 2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret. The solutions: 1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume. 2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it. Differential Revision: https://reviews.llvm.org/D76539	2020-03-31 11:55:51 -07:00
Anna Thomas	28518d9ae3	[InlineFunction] Handle return attributes on call within inlined body Consider a callee function that has a call (C) within it which feeds into the return. When we inline that callee into a callsite that has return attributes, we can backward propagate those attributes to the call (C) within that inlined callee body. This is safe to do so only if we can guarantee transfer of execution to successor in the window of instructions between return value (i.e. the call C) and the return instruction. See added test cases. Reviewed-By: reames, jdoerfert Differential Revision: https://reviews.llvm.org/D76140	2020-03-31 14:35:40 -04:00
Uday Bondhugula	dc817b2dea	[InstCombine] Deduce attributes for aligned_alloc in InstCombine Make InstCombine aware of the aligned_alloc library function. Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Depends on D76970. Differential Revision: https://reviews.llvm.org/D76971	2020-03-31 23:17:28 +05:30
Florian Hahn	b0cd7b2799	[SCCP] Limit use of range info for binops to integers for now. This fixes a crash when building the test suite.	2020-03-31 17:08:09 +01:00
Tyker	4aeb7e1ef4	[AssumeBundles] Preserve information in EarlyCSE Summary: This patch preserves information from various places in EarlyCSE into assume bundles. Reviewers: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76769	2020-03-31 17:47:04 +02:00
Sanjay Patel	fa61b5059a	[InstCombine] remove stray auto-generated test comment; NFC The script now includes extra info about command-line options used when generating its advertisement heading, but we don't want that here. This is a special-case because we have enhanced the check lines (as noted in the 2nd comment line).	2020-03-31 09:19:12 -04:00
Florian Hahn	b37543750c	[ValueLattice] Distinguish between constant ranges with/without undef. This patch updates ValueLattice to distinguish between ranges that are guaranteed to not include undef and ranges that may include undef. A constant range guaranteed to not contain undef can be used to simplify instructions to arbitrary values. A constant range that may contain undef can only be used to simplify to a constant. If the value can be undef, it might take a value outside the range. For example, consider the snippet below define i32 @f(i32 %a, i1 %c) { br i1 %c, label %true, label %false true: %a.255 = and i32 %a, 255 br label %exit false: br label %exit exit: %p = phi i32 [ %a.255, %true ], [ undef, %false ] %f.1 = icmp eq i32 %p, 300 call void @use(i1 %f.1) %res = and i32 %p, 255 ret i32 %res } In the exit block, %p would be a constant range [0, 256) including undef as %p could be undef. We can use the range information to replace %f.1 with false because we remove the compare, effectively forcing the use of the constant to be != 300. We cannot replace %res with %p however, because if %a would be undef %cond may be true but the second use might not be < 256. Currently LazyValueInfo uses the new behavior just when simplifying AND instructions and does not distinguish between constant ranges with and without undef otherwise. I think we should address the remaining issues in LVI incrementally. Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D76931	2020-03-31 12:50:20 +01:00
Daan Sprenkels	464b9aeafe	[InstCombine] Transform extelt-trunc -> bitcast-extelt Canonicalize the case when a scalar extracted from a vector is truncated. Transform such cases to bitcast-then-extractelement. This will enable erasing the truncate operation. This commit fixes PR45314. reviewers: spatel Differential revision: https://reviews.llvm.org/D76983	2020-03-31 11:53:41 +02:00
Sebastian Neubauer	5d3a69feca	[AMDGPU] New llvm.amdgcn.ballot intrinsic Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088	2020-03-31 10:35:39 +02:00
Florian Hahn	0c9c58ada0	[SCCP] Use constant ranges for casts. For casts with constant range operands, we can use ConstantRange::castOp. Reviewers: davide, efriedma, mssimpso Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D71938	2020-03-31 09:22:04 +01:00
Wei Mi	ebad678857	[SampleFDO] Port MD5 name table support to extbinary format. Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it. Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support. Differential Revision: https://reviews.llvm.org/D76255	2020-03-30 22:07:08 -07:00
Daan Sprenkels	5227fa0c72	Recommit "[InstCombine] Update assertions in InstCombine test; NFC"	2020-03-31 00:00:41 +02:00
Daan Sprenkels	273b0d7766	Revert "[InstCombine] Update assertions in InstCombine test; NFC" This reverts commit `4243bd494d`.	2020-03-30 22:41:33 +02:00
Daan Sprenkels	4243bd494d	[InstCombine] Update assertions in InstCombine test; NFC	2020-03-30 22:15:50 +02:00
Sanjay Patel	f2fbdf76d8	[InstCombine] do not exclude min/max from icmp with casted operand fold InstCombine has a mess of logic that tries to preserve min/max patterns, but AFAICT, this one is not necessary because we can always narrow the corresponding select in this sequence to match the narrow compare. The biggest danger for this patch is inducing infinite looping or assert from exceeding max iterations. If any bots hit that in the vicinity of this commit, this is the likely patch to blame.	2020-03-30 16:10:51 -04:00
Bill Wendling	fa496ce3c6	[Intrinsic] Give "is.constant" the "convergent" attribute Summary: Code frequently relies upon the results of "is.constant" intrinsics to DCE invalid code paths. We don't want the intrinsic to be made control-dependent on any additional values. For instance, we can't split a PHI into a "constant" and "non-constant" part via jump threading in order to "optimize" the constant part, because the "is.constant" intrinsic is meant to return "false". Reviewers: wmi, kazu, MaskRay Reviewed By: kazu Subscribers: jdoerfert, efriedma, joerg, lebedev.ri, nikic, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75799	2020-03-30 11:47:12 -07:00
Sameer Sahasrabuddhe	3cbbded68c	Introduce unify-loop-exits pass. For each natural loop with multiple exit blocks, this pass creates a new block N such that all exiting blocks now branch to N, and then control flow is redistributed to all the original exit blocks. The bulk of the transformation is a new function introduced in BasicBlockUtils that can redirect control flow from a set of incoming blocks to a set of outgoing blocks via a common "hub". This is a useful workaround for a limitation in the structurizer which incorrectly orders blocks when processing a nest of loops. This pass bypasses that issue by ensuring that each natural loop is recognized as a separate region. Since the structurizer is a region pass, it no longer sees a nest of loops in a single region, and instead processes each "level" in the nesting as a separate region. The AMDGPU backend provides a new option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewers: madhur13490, arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D75865	2020-03-30 13:23:56 -04:00
Vedant Kumar	dcc410b5cf	[LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259) In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken count is a SCEV add expression, its type is defined by the type of the last operand of the add expression. In the test case from PR45259, this last operand happens to be a pointer, which (according to llvm::Type) does not have a primitive size in bits. In this case, LoopVectorize fails to truncate the SCEV and crashes as a result. Using ScalarEvolution::getTypeSizeInBits makes the truncation work as expected. https://bugs.llvm.org/show_bug.cgi?id=45259 Differential Revision: https://reviews.llvm.org/D76669	2020-03-30 10:14:14 -07:00
Sanjay Patel	bc60cdcc3f	[InstCombine] add test for trunc-extelt; NFC Goes with D76983	2020-03-30 09:43:03 -04:00
Florian Hahn	84c1fbab5d	[CVP] Add additional icmp for ranges with undef to test.	2020-03-30 10:59:25 +01:00
Jun Ma	31a1d85c53	[Coroutines 2/2] Improve symmetric control transfer feature Differential Revision: https://reviews.llvm.org/D76913	2020-03-30 09:53:09 +08:00
Jun Ma	a94fa2c049	[Coroutines 1/2] Improve symmetric control transfer feature Differential Revision: https://reviews.llvm.org/D76911	2020-03-30 09:53:09 +08:00
Daan Sprenkels	24562c6588	[InstCombine] Add tests for trunc (extelt x); (NFC) Baseline tests for D76983 (PR45314) Differential Revision: https://reviews.llvm.org/D77024	2020-03-29 17:30:54 -04:00
Uday Bondhugula	c0955edfd6	Introduce support for lib function aligned_alloc in TLI / memory builtins Aligned_alloc is a standard lib function and has been in glibc since 2.16 and in the C11 standard. It has semantics similar to malloc/calloc for several analyses/transforms. This patch introduces aligned_alloc in target library info and memory builtins. Subsequent ones will make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062 This change will also be useful to LLVM generators that need to allocate buffers of vector elements larger than 16 bytes (for eg. 256-bit ones), element boundary alignment for which is not typically provided by glibc malloc. Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D76970	2020-03-29 23:36:24 +05:30
Sanjay Patel	febcb24f14	[InstCombine] make test independent of branch undef/UB; NFC	2020-03-29 13:32:47 -04:00
Richard Diamond	4bf015c035	[AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences. Summary: On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue. In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64. This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type. Reviewers: hfinkel, jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75471	2020-03-29 01:26:31 -05:00
Nikita Popov	30d712103f	[InstCombine] Use replaceOperand() API in GEP transforms To make sure that replaced operands get DCEd. This drops one iteration from gepphigep.ll, which is still not optimal. This was the last test case performing more than 3 iterations. NFC-ish, only worklist order should change.	2020-03-28 19:07:25 +01:00
Nikita Popov	672e8bfbfc	[InstCombine] Fix worklist management in foldXorOfICmps() Because this code does not use the IC-aware replaceInstUsesWith() helper, we need to manually push users to the worklist. This is NFC-ish, in that it may only change worklist order.	2020-03-28 18:25:21 +01:00
Nikita Popov	337b671b0d	[InstCombine] Change limit-max-iterations test case; NFC This particular case will stop needing multiple iterations in a followup change.	2020-03-28 18:25:20 +01:00
Serge Pavlov	f398739152	[FEnv] Constfold some unary constrained operations This change implements constant folding to constrained versions of intrinsics, implementing rounding: floor, ceil, trunc, round, rint and nearbyint. Differential Revision: https://reviews.llvm.org/D72930	2020-03-28 12:28:33 +07:00
Sanjay Patel	0f56bbc1a5	[InstCombine] reduce FP-casted and bitcasted signbit check PR45305: https://bugs.llvm.org/show_bug.cgi?id=45305 Alive2 proofs: http://volta.cs.utah.edu:8080/z/bVyrko http://volta.cs.utah.edu:8080/z/Vxpz9q	2020-03-27 17:33:59 -04:00
Sanjay Patel	e72730ee3a	[InstCombine] add tests for FP cast+bitcast signbit checks; NFC PR45305: https://bugs.llvm.org/show_bug.cgi?id=45305	2020-03-27 17:25:25 -04:00
Simon Pilgrim	f4f4a8bfef	[InstCombine][X86] Add repeated ops demanded elts tests for SSE intrinsics (PR24523)	2020-03-27 14:51:09 +00:00
Simon Pilgrim	ec3bb6c3e7	[InstCombine][X86] Regenerate SSE2 tests	2020-03-27 14:51:09 +00:00
Sanjay Patel	5237262feb	[InstCombine] add shuffle-with-bitcast-operand tests; NFC	2020-03-26 14:28:47 -04:00
Jonathan Roelofs	7a89a5d81b	[InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors Fixes https://bugs.llvm.org/show_bug.cgi?id=43665	2020-03-26 12:09:36 -06:00
Sam Parker	db8a3c4206	[NFC] Create X86 subdirectory for indvar tests Many IndVarSimplify tests target an x86 triple, so move them into a target specific folder.	2020-03-26 12:24:45 +00:00
John McCall	9514c048d8	Use optimal layout and preserve alloca alignment in coroutine frames. Previously, we would ignore alloca alignment when building the frame and just use the natural alignment of the allocated type. If an alloca is over-aligned for its IR type, this could lead to a frame entry with inadequate alignment for the downstream uses of the alloca. Since highly-aligned fields also tend to produce poor layouts under a naive layout algorithm, I've also switched coroutine frames to use the new optimal struct layout algorithm. In order to communicate the frame size and alignment to later passes, I needed to set align+dereferenceable attributes on the frame-pointer parameter of the resume function. This is clearly the right thing to do, but the align attribute currently seems to result in assumptions being added during inlining that the optimizer cannot easily remove.	2020-03-26 00:51:09 -04:00
Florian Hahn	081efa7dd0	[SCCP] Add a few constantexpr,undef tests for cond propagation	2020-03-25 21:28:35 +00:00
Tyker	f1a9efabcb	Ignore/Drop droppable uses for code-sinking in InstCombine Summary: This patch allows code-sinking in InstCombine to be performed when an instruction has uses in llvm.assume. Uses are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use. For now, uses are considered droppable if they are in an llvm.assume. Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73832	2020-03-25 20:42:52 +01:00
Alexandre Ganea	934d4feab1	[ThinLTO] Don't rely on debug output for thinlto_samplepgo_icp3 test Because using -print-imports is not thread-safe, make the test rely on llvm-dis instead. Also cover the ICALL-PROM part as intended originally. Differential Revision: https://reviews.llvm.org/D76775	2020-03-25 14:38:20 -04:00
Sanjay Patel	f631b9dc36	[VectorCombine] add shuffle tests; NFC Goes with D76727.	2020-03-25 10:35:03 -04:00
sstefan1	72b51d6f93	[OpenMP] Adding InaccessibleMemOnly and InaccessibleMemOrArgMemOnly for runtime calls. Summary: Attempt to add more attributes for runtime calls. Reviewers: jdoerfert Subscribers: guansong, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75010	2020-03-25 14:08:50 +00:00
Juneyoung Lee	d82c1e8c56	Rename test name, add more tests for codegenprepare	2020-03-25 20:31:12 +09:00
Juneyoung Lee	e951a48996	Add freeze(and x, const) case to codegenprepare's freeze-cmp.ll	2020-03-25 17:29:01 +09:00
Johannes Doerfert	5699d08b79	[Attributor] Use knowledge retained in llvm.assume (operand bundles) This patch integrates operand bundle llvm.assumes [0] with the Attributor. Most IRAttributes will now look at uses of the associated value and if there are llvm.assume operand bundle uses with the right tag we will check if they are in the must-be-executed-context (around the context instruction). Droppable users, which is currently only llvm::assume, are handled special in some places now as well. [0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D74888	2020-03-24 15:33:40 -05:00
Sanjay Patel	c84446f4e9	[VectorCombine] add tests for bitcast (shuffle); NFC	2020-03-24 15:18:32 -04:00
Juneyoung Lee	49f75132bc	[DivRemPairs] Freeze operands if they can be undef values Summary: DivRemPairs is unsound with respect to undef values. ``` // bb1: // %rem = srem %x, %y // bb2: // %div = sdiv %x, %y // --> // bb1: // %div = sdiv %x, %y // %mul = mul %div, %y // %rem = sub %x, %mul ``` If X can be undef, X should be frozen first. For example, let's assume that Y = 1 & X = undef: ``` %div = sdiv undef, 1 // %div = undef %rem = srem undef, 1 // %rem = 0 => %div = sdiv undef, 1 // %div = undef %mul = mul %div, 1 // %mul = undef %rem = sub %x, %mul // %rem = undef - undef = undef ``` http://volta.cs.utah.edu:8080/z/m7Xrx5 Same for Y. If X = 1 and Y = (undef \| 1), %rem in src is either 1 or 0, but %rem in tgt can be one of many integer values. This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 . This miscompilation disappears if undef value is removed, but it may take a while. DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations. Reviewers: spatel, lebedev.ri, george.burgess.iv Reviewed By: spatel Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76483	2020-03-25 03:46:14 +09:00
Sanjay Patel	88b493a838	[ValueTracking] improve undef/poison analysis for constant vectors Differential Revision: https://reviews.llvm.org/D76702	2020-03-24 13:35:47 -04:00
Sanjay Patel	6c3c7a0dd6	[InstSimplify] add tests for freeze(constexpr); NFC	2020-03-24 11:39:19 -04:00
Sanjay Patel	58ec867a3b	[InstSimplify] add more tests for freeze(constant); NFC These should really be moved over to a ConstantFolding test file, but since this may overlap with the in-progress D76010 and similar tests already exist here, we can do that as a later cleanup.	2020-03-24 09:53:49 -04:00
Douglas Yung	18e1a59eed	Fix another instance where a variable was renamed in the generated LLVM IR. [NFC]	2020-03-23 22:53:29 -07:00
Jun Ma	a44de12ab2	[Coroutines] Also check lifetime intrinsic for local variable when build coroutine frame Currently we move all allocas into the frame when building the coroutine frame in the CoroSplit pass. However, this can be relaxed. Since the CoroSplit pass runs after the Inline pass, we can use lifetime intrinsics to do such analysis: if the scope of a lifetime intrinsic does not cross any suspend point, then rather than moving the allocas to the frame, we can just move them to the entry bb of the corresponding function. This reduces the frame size. More importantly, this also avoids a data race in a multithreaded environment. Consider a function inlined into a coroutine: it starts a thread which accesses local variables, while after inlining the movement of allocas to the frame also accesses them, causing a data race. Differential Revision: https://reviews.llvm.org/D75664	2020-03-24 13:41:55 +08:00
Vedant Kumar	b7cd291c15	[GlobalOpt] Treat null-check of loaded value as use of global (PR35760) PR35760 shows an example program which, when compiled with `clang -O0` or gcc at any optimization level, prints '0'. However, llvm transforms the program in a way that causes it to print '1'. Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when analyzing a load from a global which is used by an `icmp`. This special case was untested [0] so this is just deleting dead code. An alternative fix might be to change the GlobalStatus analysis for the global to report "Stored" instead of "StoredOnce". However, "StoredOnce" is appropriate when only one value other than the initializer is stored to the global. [0] http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662 Differential Revision: https://reviews.llvm.org/D76645	2020-03-23 22:36:09 -07:00
Douglas Yung	e79b1ab65b	Make test more flexible for when the variable is renamed in the generated LLVM IR. [NFC]	2020-03-23 22:03:21 -07:00
Matt Arsenault	66073953a5	AMDGPU: Allow vectorization of round intrinsic There seems to be a small benefit to the legalized sequence for v2f16 round with packed instructions, so allow vectorizing it by reducing the cost. An unintended side effect is vectorization of f32 round also happens. The current FMA logic seems off to me, and isn't checking for packed instructions.	2020-03-23 17:00:41 -04:00
Matt Arsenault	b20a1d840f	GVNSink: Allow handling addrspacecast	2020-03-23 16:50:58 -04:00
Matt Arsenault	43d98a0ecf	Allow replacing intrinsic operands with variables Since intrinsics can now specify when an argument is required to be constant, it is now OK to replace arguments with variables if they aren't. This means intrinsics must now be accurately marked with immarg.	2020-03-23 15:51:57 -04:00
Sanjay Patel	a1fe6beb1e	[InstCombine] remove one-use check for ctpop -> cttz Two one-use checks were added with rGfdcb27105537, but only the first one is necessary to limit an increase in instruction count. The second transform only creates one instruction, so it is always a reasonable canonicalization/optimization.	2020-03-23 13:59:57 -04:00
Johannes Doerfert	9d38f98dc3	[OpenMPOpt] Validate declaration types against the expected types Validation of the found runtime library functions declarations types (return and argument types) with the expected types. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D76058	2020-03-23 11:43:36 -05:00
Johannes Doerfert	68fed27067	[Attributor] Handle calls in AAValueConstantRange properly We did handle calls that were operands of certain instructions but not standalone calls we visit via indirection, e.g., selects.	2020-03-23 10:45:24 -05:00
Johannes Doerfert	54ec9b54f6	[Attributor] Unify handling of must-tail calls We special cased must-tail calls all over the place because they cannot be modified as other calls can be. However, we already centralized the modification API so we can centralize the handling as well. This simplifies the code and allows to remove must-tail calls completely.	2020-03-23 10:45:24 -05:00

14670 Commits