Commit Graph

14670 Commits

Author SHA1 Message Date
Simon Pilgrim 6d24dd7ed1 [InstSimplify] Regenerate compares tests to fix issue reported on D77354 2020-04-03 17:34:56 +01:00
Simon Pilgrim 43d2fc7ed7 [LoopRotate] Cleanup test checks to fix issue reported on D77354 2020-04-03 17:21:37 +01:00
Matt Arsenault 57a55313c3 InstCombine: Reduce minnum/maxnum if inputs are casted 2020-04-03 11:57:25 -04:00
laith sakka a0983ed3d2 Handle exp2 with proper vectorization and lowering to SVML calls
Summary:
Add mapping from exp2 math functions
to corresponding SVML calls.

This is a follow up and extension for llvm diff
https://reviews.llvm.org/D19544

Test Plan:
- update test case and run ninja check.
- run tests locally

Reviewers: wenlei, hoyFB, mmasten, mzolotukhin, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77114
2020-04-02 21:11:13 -07:00
Hongtao Yu 88da019977 Fix a bug in the inliner that causes subsequent double inlining
Summary:
A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.

```
void top()
{
    int t = first();
    second(t);
}

void second(int t)
{
   t = third(t);
   fourth(t);
}

void third(int t)
{
   return t;
}
```
The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up.  We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.

Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.

Reviewers: wenlei, davidxl, tejohnson

Reviewed By: wenlei, davidxl

Subscribers: eraman, nikic, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76248
2020-04-02 21:08:05 -07:00
Adrian Prantl 93fe58c9cf Teach the stripNonLineTableDebugInfo pass about the llvm.dbg.label intrinsic.
Debug info for labels is not generated at -gline-tables-only, so this
pass should remove them.

Differential Revision: https://reviews.llvm.org/D77345
2020-04-02 17:39:33 -07:00
Adrian Prantl c024f3ebdc Teach the stripNonLineTableDebugInfo pass about the llvm.dbg.addr intrinsic.
This patch also strips llvm.dbg.addr intrinsics when downgrading debug
info to linetables-only.

Differential Revision: https://reviews.llvm.org/D77343
2020-04-02 17:39:33 -07:00
Matt Arsenault 5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Anna Thomas bf7a16a768 [InlineFunction] Update valid return attributes at callsite within callee body
Consider a callee function that has a call (C) within it which feeds
into the return. When we inline that callee into a callsite that has
return attributes, we can backward propagate valid attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

Also, this is valid only for attributes which are a property of a
callsite and not those that are not dependent on the ABI, or a property
of the call itself.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-04-02 14:13:12 -04:00
Sanjay Patel f4448063cc [InstCombine] try to reduce shuffle with bitcasted operand
shuf (bitcast X), undef, Mask --> bitcast X'

The 'inverse shuffles' test (shuf_bitcast_operand) is a pattern
in the motivating examples from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
(see also D76727)

We can deal with this class of patterns in generic instcombine
because we are not creating any new shuffles, just a bitcast.

Alive2 proof:
http://volta.cs.utah.edu:8080/z/mwDUZf

Differential Revision: https://reviews.llvm.org/D76844
2020-04-02 13:44:50 -04:00
Sanjay Patel b6050ca181 [VectorCombine] transform bitcasted shuffle to narrower elements
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'

We do not attempt this in InstCombine because we do not want to change
types and create new shuffle ops that are potentially not lowered as
well as the original code. Here, we can check the cost model to see if
it is worthwhile.

I've aggressively enabled this transform even if the types are the same
size and/or equal cost because moving the bitcast allows InstCombine to
make further simplifications.

In the motivating cases from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
...this is enough to let instcombine and the backend eliminate the
redundant shuffles, but we probably want to extend VectorCombine to
handle the inverse pattern (shuffle-of-bitcast) to get that
simplification directly in IR.

Differential Revision: https://reviews.llvm.org/D76727
2020-04-02 13:30:22 -04:00
Sanjay Patel 12fcbcecff [InstCombine] add tests for cmyk benchmark; NFC
These are versions of a function that regressed with:
rGf2fbdf76d8d0

That particular problem occurs with an instcombine-simplifycfg-instcombine
sequence, but we can show that it exists within instcombine only with
other variations of the pattern.
2020-04-02 13:00:46 -04:00
Sanjay Patel 1008435f3d Revert "[InstCombine] do not exclude min/max from icmp with casted operand fold"
This reverts commit f2fbdf76d8.
As noted in the post-commit thread:
https://reviews.llvm.org/rGf2fbdf76d8d0
...this can obscure a min/max pattern where the components
have extra uses. We can show that the problem is independent
of this change with a slightly modified source example, so
this revert just delays/reduces the need to fix the real
problem.

We need to improve our analysis of negation or -- more
generally -- subtraction using patches like D77230 or D68408.
2020-04-02 09:15:23 -04:00
Tyker c00cb76274 [NFC] Split Knowledge retention and place it more appropriatly
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77171
2020-04-02 15:01:41 +02:00
Sanjay Patel a19b27b90e [PhaseOrdering] add test for vector trunc; NFC
See discussion in D76983.
2020-04-02 08:13:19 -04:00
Sanjay Patel ecb048c7ac [InstCombine] add tests for disguised vector trunc; NFC 2020-04-02 08:13:19 -04:00
Clement Courbet fb4aa30f27 [ExpandMemCmp] Allow overlaping loads in the zero-relational case.
Summary:
This allows doing `memcmp(p, q, 7)` with 2 loads instead of a call to
memcmp.
This fixes part of PR45147.

Reviewers: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76133
2020-04-02 11:20:47 +02:00
Florian Hahn a63b5c9e53 [CallSiteSplitting] Simplify isPredicateOnPHI & continue checking PHIs.
As pointed out by @thakis, currently CallSiteSplitting bails out after
checking the first PHI node. We should check all PHI nodes, until we
find one where call site splitting is beneficial.

This patch also slightly simplifies the code using BasicBlock::phis().

Reviewers: davidxl, junbuml, thakis

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D77089
2020-04-02 10:11:27 +01:00
Johannes Doerfert bcd8009369 [Attributor] Use the proper context instruction in genericValueTraversal
There was a TODO in genericValueTraversal to provide the context
instruction and due to the lack of it users that wanted one just used
something available. Unfortunately, using a fixed instruction is wrong
in the presence of PHIs so we need to update the context instruction
properly.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D76870
2020-04-01 22:20:47 -05:00
Johannes Doerfert 9e19693994 [Attributor] Derive better alignment for accessed pointers
Use DL & ABI information for better alignment deduction, e.g., if a type
is accessed and the ABI specifies an alignment requirement for such an
access we can use it. This is based on a patch by @lebedev.ri and
inspired by getBaseAlign in Loads.cpp.

Depends on D76673.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D76674
2020-04-01 21:49:57 -05:00
Johannes Doerfert b1c788d051 [Attributor][FIX] Prevent alignment breakage wrt. must-tail calls
If we have a must-tail call the callee and caller need to have matching
ABIs. Part of that is alignment which we might modify when we deduce
alignment of arguments of either. Since we would need to keep them in
sync, which is not as simple, we simply avoid deducing alignment for
arguments of the must-tail caller or callee.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D76673
2020-04-01 21:40:07 -05:00
Johannes Doerfert f7f9322843 [Attributor][NFC] Cleanup leftover check lines 2020-04-01 21:37:33 -05:00
Sanjay Patel 3d90048791 [InstCombine] enhance freelyNegateValue() by handling xor
Negation is equivalent to bitwise-not + 1, so try to convert more
subtracts into adds using this relationship:
0 - (A ^ C) => ((A ^ C) ^ -1) + 1 => A ^ ~C + 1

I doubt this will recover the regression noted in rGf2fbdf76d8d0,
but seems like we're going to need to improve here and/or revive D68408?

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/Re5tMU
http://volta.cs.utah.edu:8080/z/An-uns

Differential Revision: https://reviews.llvm.org/D77230
2020-04-01 15:05:13 -04:00
Sanjay Patel 8431dbacd4 [InstCombine] add tests for negate with xor operand; NFC 2020-04-01 15:05:13 -04:00
Jonathan Roelofs 1148f004fa Fix PR45371: SeparateConstOffsetFromGEP clean up bookkeeping
find() was altering the UserChain, even in cases where it subsequently
discovered that the resulting constant was a 0. This confuses
rebuildWithoutConstOffset() when it attempts to walk the chain later, since it
is expected that the chain itself be a path down the use-def edges of an
expression.
2020-04-01 12:38:15 -06:00
Uday Bondhugula 6ee11c3b0f [NewGVN] Make NewGVN aware of aligned_alloc
Make the New GVN pass aware of aligned_alloc.

Depends on D76975.

Differential Revision: https://reviews.llvm.org/D76976
2020-04-01 23:26:51 +05:30
Uday Bondhugula 4cf70af94f [GVN] Make GVN aware of aligned_alloc
Make the GVN pass aware of aligned_alloc.

Depends on D76974.

Differential Revision: https://reviews.llvm.org/D76975
2020-04-01 23:26:50 +05:30
Uday Bondhugula c4499e3333 [Attributor] Make attributor aware of aligned_alloc for heap to stack conversion
Make the attributor pass aware of aligned_alloc for converting heap
allocations to stack ones.

Depends on D76971.

Differential Revision: https://reviews.llvm.org/D76974
2020-04-01 23:26:50 +05:30
Matt Arsenault 3f465d0d36 AMDGPU: Fix broken check lines 2020-04-01 10:52:22 -07:00
shchenz e344f8b9db Revert "[LSR] re-add testcase for wrongly phi node elimination - NFC"
This reverts commit f25a1b4f58.
ARM and hexagon fail at the new added case.
2020-04-01 12:58:06 +00:00
shchenz f25a1b4f58 [LSR] re-add testcase for wrongly phi node elimination - NFC
Retest the case on X86/SystemZ/AArch64/PowerPC
2020-04-01 11:11:17 +00:00
Cullen Rhodes 84aa6cf1a9 [Transforms][SROA] Promote allocas with mem2reg for scalable types
Summary:
Aggregate types containing scalable vectors aren't supported and as far
as I can tell this pass is mostly concerned with optimisations on
aggregate types, so the majority of this pass isn't very useful for
scalable vectors.

This patch modifies SROA such that mem2reg is run on allocas with
scalable types that are promotable, but nothing else such as slicing is
done.

The use of TypeSize in this pass has also been updated to be explicitly
fixed size. When invoking the following methods in DataLayout:

    * getTypeSizeInBits
    * getTypeStoreSize
    * getTypeStoreSizeInBits
    * getTypeAllocSize

we now called getFixedSize on the resultant TypeSize. This is quite an
extensive change with around 50 calls to these functions, and also the
first change of this kind (being explicit about fixed vs scalable
size) as far as I'm aware, so feedback welcome.

A test is included containing IR with scalable vectors that this pass is
able to optimise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76720
2020-04-01 10:34:11 +00:00
shchenz 8b8cd150a4 Revert "[LSR] add testcase for wrongly phi node elimination - NFC"
This reverts commit dbf5e4f6c7.
The testcase has different behaviour on PowerPC and X86.
2020-04-01 10:28:43 +00:00
shchenz dbf5e4f6c7 [LSR] add testcase for wrongly phi node elimination - NFC 2020-04-01 09:58:58 +00:00
Florian Hahn d307174e1d [ConstantRange] Use APInt::or/APInt::and for single elements.
Currently ConstantRange::binaryAnd/binaryOr results are too pessimistic
for single element constant ranges.

If both operands are single element ranges, we can use APInt's AND and
OR implementations directly.

Note that some other binary operations on constant ranges can cover the
single element cases naturally, but for OR and AND this unfortunately is
not the case.

Reviewers: nikic, spatel, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76446
2020-04-01 09:50:24 +01:00
Florian Hahn e20cac3650 [Matrix] Add new test case with getelementptr constant exprs.
The new test mostly ensures we keep doing the right thing for constant
expressions while lowering matrix instructions.
2020-04-01 09:32:13 +01:00
Evgenii Stepanov f9471b0010 Fix MSan false positive due to select folding.
Summary:
Select folding in JumpThreading can create a conditional branch on a
code patch that did not have one in the original program. This is not a
valid transformation in sanitize_memory functions.

Note that JumpThreading does select folding in 3 different places. Two
of them seem safe - they apply to a select instruction in a BB that ends
with an unconditional branch to another BB, which (in turn) ends with a
conditional branch or a switch with the same condition.

Fixes PR45220.

Reviewers: glider, dvyukov, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76332
2020-03-31 15:25:42 -07:00
Anna Thomas 58a05675da Revert "[InlineFunction] Handle return attributes on call within inlined body"
This reverts commit 28518d9ae3.
There is a failure in MsgPackReader.cpp when built with clang. It
complains about "signext and zeroext" are incompatible. Investigating
offline if it is infact a UB in the MsgPackReader code.
2020-03-31 16:16:34 -04:00
Guozhi Wei 6d20937c29 [CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.

There 2 problems blocked the duplication:

  1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
  2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.

The solutions:

  1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
  2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.

Differential Revision: https://reviews.llvm.org/D76539
2020-03-31 11:55:51 -07:00
Anna Thomas 28518d9ae3 [InlineFunction] Handle return attributes on call within inlined body
Consider a callee function that has a call (C) within it which feeds
into the return.  When we inline that callee into a callsite that has
return attributes, we can backward propagate those attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

See added test cases.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-03-31 14:35:40 -04:00
Uday Bondhugula dc817b2dea [InstCombine] Deduce attributes for aligned_alloc in InstCombine
Make InstCombine aware of the aligned_alloc library function.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Depends on D76970.

Differential Revision: https://reviews.llvm.org/D76971
2020-03-31 23:17:28 +05:30
Florian Hahn b0cd7b2799 [SCCP] Limit use of range info for binops to integers for now.
This fixes a crash when building the test suite.
2020-03-31 17:08:09 +01:00
Tyker 4aeb7e1ef4 [AssumeBundles] Preserve information in EarlyCSE
Summary: this patch preserve information from various places in EarlyCSE into assume bundles.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76769
2020-03-31 17:47:04 +02:00
Sanjay Patel fa61b5059a [InstCombine] remove stray auto-generated test comment; NFC
The script now includes extra info about command-line options used
when generating its advertisement heading, but we don't want that
here. This is a special-case because we have enhanced the check
lines (as noted in the 2nd comment line).
2020-03-31 09:19:12 -04:00
Florian Hahn b37543750c [ValueLattice] Distinguish between constant ranges with/without undef.
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.

A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below

define i32 @f(i32 %a, i1 %c) {
  br i1 %c, label %true, label %false
true:
  %a.255 = and i32 %a, 255
  br label %exit
false:
  br label %exit
exit:
  %p = phi i32 [ %a.255, %true ], [ undef, %false ]
  %f.1 = icmp eq i32 %p, 300
  call void @use(i1 %f.1)
  %res = and i32 %p, 255
  ret i32 %res
}

In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the  second use might not be
< 256.

Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.

Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76931
2020-03-31 12:50:20 +01:00
Daan Sprenkels 464b9aeafe [InstCombine] Transform extelt-trunc -> bitcast-extelt
Canonicalize the case when a scalar extracted from a vector is
truncated.  Transform such cases to bitcast-then-extractelement.
This will enable erasing the truncate operation.

This commit fixes PR45314.

reviewers: spatel

Differential revision: https://reviews.llvm.org/D76983
2020-03-31 11:53:41 +02:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Florian Hahn 0c9c58ada0 [SCCP] Use constant ranges for casts.
For casts with constant range operands, we can use
ConstantRange::castOp.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71938
2020-03-31 09:22:04 +01:00
Wei Mi ebad678857 [SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.

Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.

Differential Revision: https://reviews.llvm.org/D76255
2020-03-30 22:07:08 -07:00
Daan Sprenkels 5227fa0c72 Recommit "[InstCombine] Update assertions in InstCombine test; NFC" 2020-03-31 00:00:41 +02:00
Daan Sprenkels 273b0d7766 Revert "[InstCombine] Update assertions in InstCombine test; NFC"
This reverts commit 4243bd494d.
2020-03-30 22:41:33 +02:00
Daan Sprenkels 4243bd494d [InstCombine] Update assertions in InstCombine test; NFC 2020-03-30 22:15:50 +02:00
Sanjay Patel f2fbdf76d8 [InstCombine] do not exclude min/max from icmp with casted operand fold
InstCombine has a mess of logic that tries to preserve min/max patterns,
but AFAICT, this one is not necessary because we can always narrow the
corresponding select in this sequence to match the narrow compare.

The biggest danger for this patch is inducing infinite looping or
assert from exceeding max iterations. If any bots hit that in the
vicinity of this commit, this is the likely patch to blame.
2020-03-30 16:10:51 -04:00
Bill Wendling fa496ce3c6 [Intrinsic] Give "is.constant" the "convergent" attribute
Summary:
Code frequently relies upon the results of "is.constant" intrinsics to
DCE invalid code paths. We don't want the intrinsic to be made control-
dependent on any additional values. For instance, we can't split a PHI
into a "constant" and "non-constant" part via jump threading in order
to "optimize" the constant part, because the "is.constant" intrinsic is
meant to return "false".

Reviewers: wmi, kazu, MaskRay

Reviewed By: kazu

Subscribers: jdoerfert, efriedma, joerg, lebedev.ri, nikic, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75799
2020-03-30 11:47:12 -07:00
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Vedant Kumar dcc410b5cf [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.

In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.

Uing ScalarEvolution::getTypeSizeInBits makes the truncation work as expected.

https://bugs.llvm.org/show_bug.cgi?id=45259

Differential Revision: https://reviews.llvm.org/D76669
2020-03-30 10:14:14 -07:00
Sanjay Patel bc60cdcc3f [InstCombine] add test for trunc-extelt; NFC
Goes with D76983
2020-03-30 09:43:03 -04:00
Florian Hahn 84c1fbab5d [CVP] Add additional icmp for ranges with undef to test. 2020-03-30 10:59:25 +01:00
Jun Ma 31a1d85c53 [Coroutines 2/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76913
2020-03-30 09:53:09 +08:00
Jun Ma a94fa2c049 [Coroutines 1/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76911
2020-03-30 09:53:09 +08:00
Daan Sprenkels 24562c6588 [InstCombine] Add tests for trunc (extelt x); (NFC)
Baseline tests for D76983 (PR45314)

Differential Revision: https://reviews.llvm.org/D77024
2020-03-29 17:30:54 -04:00
Uday Bondhugula c0955edfd6 Introduce support for lib function aligned_alloc in TLI / memory builtins
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062

This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (for eg. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76970
2020-03-29 23:36:24 +05:30
Sanjay Patel febcb24f14 [InstCombine] make test independent of branch undef/UB; NFC 2020-03-29 13:32:47 -04:00
Richard Diamond 4bf015c035 [AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences.
Summary:
On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue.

In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64.

This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type.

Reviewers: hfinkel, jdoerfert

Reviewed By: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75471
2020-03-29 01:26:31 -05:00
Nikita Popov 30d712103f [InstCombine] Use replaceOperand() API in GEP transforms
To make sure that replaced operands get DCEd. This drops one
iteration from gepphigep.ll, which is still not optimal.

This was the last test case performing more than 3 iterations.

NFC-ish, only worklist order should change.
2020-03-28 19:07:25 +01:00
Nikita Popov 672e8bfbfc [InstCombine] Fix worklist management in foldXorOfICmps()
Because this code does not use the IC-aware replaceInstUsesWith()
helper, we need to manually push users to the worklist.

This is NFC-ish, in that it may only change worklist order.
2020-03-28 18:25:21 +01:00
Nikita Popov 337b671b0d [InstCombine] Change limit-max-iterations test case; NFC
This particular case will stop needing multiple iterations in
a followup change.
2020-03-28 18:25:20 +01:00
Serge Pavlov f398739152 [FEnv] Constfold some unary constrained operations
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.

Differential Revision: https://reviews.llvm.org/D72930
2020-03-28 12:28:33 +07:00
Sanjay Patel 0f56bbc1a5 [InstCombine] reduce FP-casted and bitcasted signbit check
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/bVyrko
http://volta.cs.utah.edu:8080/z/Vxpz9q
2020-03-27 17:33:59 -04:00
Sanjay Patel e72730ee3a [InstCombine] add tests for FP cast+bitcast signbit checks; NFC
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305
2020-03-27 17:25:25 -04:00
Simon Pilgrim f4f4a8bfef [InstCombine][X86] Add repeated ops demanded elts tests for SSE intrinsics (PR24523) 2020-03-27 14:51:09 +00:00
Simon Pilgrim ec3bb6c3e7 [InstCombine][X86] Regenerate SSE2 tests 2020-03-27 14:51:09 +00:00
Sanjay Patel 5237262feb [InstCombine] add shuffle-with-bitcast-operand tests; NFC 2020-03-26 14:28:47 -04:00
Jonathan Roelofs 7a89a5d81b [InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors
Fixes https://bugs.llvm.org/show_bug.cgi?id=43665
2020-03-26 12:09:36 -06:00
Sam Parker db8a3c4206 [NFC] Create X86 subdirectory for indvar tests
Many IndVarSiimplify tests target an x86 triple, so move them into
a target specific folder.
2020-03-26 12:24:45 +00:00
John McCall 9514c048d8 Use optimal layout and preserve alloca alignment in coroutine frames.
Previously, we would ignore alloca alignment when building the frame
and just use the natural alignment of the allocated type.  If an alloca
is over-aligned for its IR type, this could lead to a frame entry with
inadequate alignment for the downstream uses of the alloca.

Since highly-aligned fields also tend to produce poor layouts under a
naive layout algorithm, I've also switched coroutine frames to use the
new optimal struct layout algorithm.

In order to communicate the frame size and alignment to later passes,
I needed to set align+dereferenceable attributes on the frame-pointer
parameter of the resume function.  This is clearly the right thing to
do, but the align attribute currently seems to result in assumptions
being added during inlining that the optimizer cannot easily remove.
2020-03-26 00:51:09 -04:00
Florian Hahn 081efa7dd0 [SCCP] Add a few constantexpr,undef tests for cond propagation 2020-03-25 21:28:35 +00:00
Tyker f1a9efabcb Ignore/Drop droppable uses for code-sinking in InstCombine
Summary:
This patch allows code-sinking in InstCombine to be performed when instruction have uses in llvm.assume.

Use are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use.
for now uses are considered droppable if they are in an llvm.assume.

Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73832
2020-03-25 20:42:52 +01:00
Alexandre Ganea 934d4feab1 [ThinLTO] Don't rely on debug output for thinlto_samplepgo_icp3 test
Because using -print-imports is not thread-safe, make the test rely on llvm-dis instead.
Also cover the ICALL-PROM part as intended originally.

Differential Revision: https://reviews.llvm.org/D76775
2020-03-25 14:38:20 -04:00
Sanjay Patel f631b9dc36 [VectorCombine] add shuffle tests; NFC
Goes with DD76727.
2020-03-25 10:35:03 -04:00
sstefan1 72b51d6f93 OpenMP] Adding InaccessibleMemOnly and InaccessibleMemOrArgMemOnly for runtime calls.
Summary: Attempt to add more attributes for runtime calls.

Reviewers: jdoerfert

Subscribers: guansong, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75010
2020-03-25 14:08:50 +00:00
Juneyoung Lee d82c1e8c56 Rename test name, add more tests for codegenprepare 2020-03-25 20:31:12 +09:00
Juneyoung Lee e951a48996 Add freeze(and x, const) case to codegenprepare's freeze-cmp.ll 2020-03-25 17:29:01 +09:00
Johannes Doerfert 5699d08b79 [Attributor] Use knowledge retained in llvm.assume (operand bundles)
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.

[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D74888
2020-03-24 15:33:40 -05:00
Sanjay Patel c84446f4e9 [VectorCombine] add tests for bitcast (shuffle); NFC 2020-03-24 15:18:32 -04:00
Juneyoung Lee 49f75132bc [DivRemPairs] Freeze operands if they can be undef values
Summary:
DivRemPairs is unsound with respect to undef values.

```
      // bb1:
      //   %rem = srem %x, %y
      // bb2:
      //   %div = sdiv %x, %y
      // -->
      // bb1:
      //   %div = sdiv %x, %y
      //   %mul = mul %div, %y
      //   %rem = sub %x, %mul
```

If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:
```
   %div = sdiv undef, 1 // %div = undef
   %rem = srem undef, 1 // %rem = 0
 =>
   %div = sdiv undef, 1 // %div = undef
   %mul = mul %div, 1   // %mul = undef
   %rem = sub %x, %mul  // %rem = undef - undef = undef
```
http://volta.cs.utah.edu:8080/z/m7Xrx5

Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.

This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .

This miscompilation disappears if undef value is removed, but it may take a while.
DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations.

Reviewers: spatel, lebedev.ri, george.burgess.iv

Reviewed By: spatel

Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76483
2020-03-25 03:46:14 +09:00
Sanjay Patel 88b493a838 [ValueTracking] improve undef/poison analysis for constant vectors
Differential Revision: https://reviews.llvm.org/D76702
2020-03-24 13:35:47 -04:00
Sanjay Patel 6c3c7a0dd6 [InstSimplify] add tests for freeze(constexpr); NFC 2020-03-24 11:39:19 -04:00
Sanjay Patel 58ec867a3b [InstSimplify] add more tests for freeze(constant); NFC
These should really be moved over to a ConstantFolding test file,
but since this may overlap with the in-progress D76010 and similar
tests already exist here, we can do that as a later cleanup.
2020-03-24 09:53:49 -04:00
Douglas Yung 18e1a59eed Fix another instance where a variable was renamed in the generated LLVM IR. [NFC] 2020-03-23 22:53:29 -07:00
Jun Ma a44de12ab2 [Coroutines] Also check lifetime intrinsic for local variable when build
coroutine frame

Currently we move all allocas into the frame when build coroutine frame in
CoroSplit pass. However, this can be relaxed.

Since CoroSplit pass run after Inline pass, we can use lifetime intrinsic to
do such analysis: If the scope of lifetime intrinsic is not across any suspend
point, rather than move the allocas to frame, we can just move them to entry bb
of corresponding function. This reduce the frame size.

More importantly, this also avoid data race in multithread environment.
Consider one inline function by coroutine: it starts a thread which access
local variables, while after inline the movement of allocs to frame also access
them. cause data race.

Differential Revision: https://reviews.llvm.org/D75664
2020-03-24 13:41:55 +08:00
Vedant Kumar b7cd291c15 [GlobalOpt] Treat null-check of loaded value as use of global (PR35760)
PR35760 shows an example program which, when compiled with `clang -O0`
or gcc at any optimization level, prints '0'. However, llvm transforms
the program in a way that causes it to print '1'.

Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when
analyzing a load from a global which is used by an `icmp`. This special
case was untested [0] so this is just deleting dead code.

An alternative fix might be to change the GlobalStatus analysis for the
global to report "Stored" instead of "StoredOnce". However, "StoredOnce"
is appropriate when only one value other than the initializer is stored
to the global.

[0]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662

Differential Revision: https://reviews.llvm.org/D76645
2020-03-23 22:36:09 -07:00
Douglas Yung e79b1ab65b Make test more flexible for when the variable is renamed in the generated LLVM IR. [NFC] 2020-03-23 22:03:21 -07:00
Matt Arsenault 66073953a5 AMDGPU: Allow vectorization of round intrinsic
There seems to be a small benefit to the legalized sequence for v2f16
round with packed instructions, so allow vectorizing it by reducing
the cost.

An unintended side effect is vectorization of f32 round also
happens. The current FMA logic seems off to me, and isn't checking for
packed instructions.
2020-03-23 17:00:41 -04:00
Matt Arsenault b20a1d840f GVNSink: Allow handling addrspacecast 2020-03-23 16:50:58 -04:00
Matt Arsenault 43d98a0ecf Allow replacing intrinsic operands with variables
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
2020-03-23 15:51:57 -04:00
Sanjay Patel a1fe6beb1e [InstCombine] remove one-use check for ctpop -> cttz
Two one-use checks were added with rGfdcb27105537,
but only the first one is necessary to limit an
increase in instruction count. The second transform
only creates one instruction, so it is always a
reasonable canonicalization/optimization.
2020-03-23 13:59:57 -04:00
Johannes Doerfert 9d38f98dc3 [OpenMPOpt] Validate declaration types against the expected types
Validation of the found runtime library functions declarations types
(return and argument types) with the expected types.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76058
2020-03-23 11:43:36 -05:00
Johannes Doerfert 68fed27067 [Attributor] Handle calls in AAValueConstantRange properly
We did handle calls that were operands of certain instructions but not
standalone calls we visit via indirection, e.g., selects.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 54ec9b54f6 [Attributor] Unify handling of must-tail calls
We special cased must-tail calls all over the place because they cannot
be modified as other calls can be. However, we already centralized the
modification API so we can centralize the handling as well. This
simplifies the code and allows to remove must-tail calls completely.
2020-03-23 10:45:24 -05:00