Commit Graph

23852 Commits

Author SHA1 Message Date
Guillaume Chatelet ff858d7781 [Alignment][NFC] Add DebugStr and operator*
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
 - DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
   - This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
 - Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77537
2020-04-06 12:09:45 +00:00
Florian Hahn 39f2d9aa81 [Matrix] Add option to use row-major matrix layout as default.
This patch adds a -matrix-default-layout option which can be used to
set the default matrix layout to row-major or column-major (default).

The initial patch updates codegen for loads, stores, binary operators
and matrix multiply.

Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76325
2020-04-06 10:00:56 +01:00
Florian Hahn d1fed7081d [Matrix] Add initial tiling for load/multiply/store chains.
This patch adds initial fusion for load/multiply/store chains of matrix
operations.

The patch contains roughly two parts:

1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store.
If they do, we create new non-aliasing copies of the operands. Note that this
may introduce new basic block. Finally we process TileSize x TileSize blocks.
That is: load tiles from the input operands, multiply and store them.

2. Identify fusion candidates & matrix instructions.
As a first step, collect all instructions with shape info and fusion candidates
(currently @llvm.matrix.multiply calls). Next, try to fuse candidates and
collect instructions eliminated by fusion. Finally iterate over all matrix
instructions, skip the ones eliminated by fusion and lower the rest as usual.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75566
2020-04-06 09:28:15 +01:00
Guillaume Chatelet 6000478f39 Revert "[Alignment][NFC] Add DebugStr and operator*"
This reverts commit 1e34ab98fc.
2020-04-06 07:55:25 +00:00
Guillaume Chatelet 1e34ab98fc [Alignment][NFC] Add DebugStr and operator*
Summary:
Also updates files to use them.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77394
2020-04-06 07:12:46 +00:00
Tarindu Jayatilaka b43b59fcc0 Expose `attributor-disable` to the new and old pass managers
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76871
2020-04-05 22:29:34 -05:00
Anna Thomas 1d0f757904 [InlineFunction] Update metadata on loads that are return values
This patch builds upon D76140 by updating metadata on pointer typed
loads in inlined functions, when the load is the return value, and the
callsite contains return attributes which can be updated as metadata on
the load.
Added test cases show this for nonnull, dereferenceable,
dereferenceable_or_null

Reviewed-By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76792
2020-04-05 14:50:10 -04:00
Sanjay Patel 538a8f0227 [InstCombine] convert bitcast-shuffle to vector trunc
As discussed in D76983, that patch can turn a chain of insert/extract
with scalar trunc ops into bitcast+extract and existing instcombine
vector transforms end up creating a shuffle out of that (see the
PhaseOrdering test for an example). Currently, that process requires
at least this sequence: -instcombine -early-cse -instcombine.

Before D76983, the sequence of insert/extract would reach the SLP
vectorizer and become a vector trunc there.

Based on a small sampling of public targets/types, converting the
shuffle to a trunc is better for codegen in most cases (and a
regression of that form is the reason this was noticed). The trunc is
clearly better for IR-level analysis as well.

This means that we can induce "spontaneous vectorization" without
invoking any explicit vectorizer passes (at least a vector cast op
may be created out of scalar casts), but that seems to be the right
choice given that we started with a chain of insert/extract, and the
backend would expand back to that chain if a target does not support
the op.

Differential Revision: https://reviews.llvm.org/D77299
2020-04-05 09:48:02 -04:00
Sanjay Patel 4036a0af24 [InstCombine] enhance freelyNegateValue() by handling 'not'
This patch extends D77230. If we have a 'not' instruction inside a
negated expression, we can ignore extra uses of that op because the
negation has a one-to-one replacement: negate becomes increment.

Alive2 examples of the test cases:
http://volta.cs.utah.edu:8080/z/T5-u9P
http://volta.cs.utah.edu:8080/z/eT89L6

Differential Revision: https://reviews.llvm.org/D77459
2020-04-05 09:16:19 -04:00
Stefanos Baziotis f3dd3a66d3 [Attributor] AAUndefinedBehavior: Use AAValueSimplify in memory accessing instructions.
Query AAValueSimplify on pointers in memory accessing instructions to take
advantage of the constant propagation (or any other value simplification) of such values.
2020-04-05 02:46:26 +03:00
Florian Hahn a2b18c5a08 [LV] Simplify tryToWiden as recipes are not re-used (NFC).
After 49d00824bb, VPWidenRecipe only stores a single instruction.
tryToWiden can simply return the widen recipe, like other helpers in
VPRecipeBuilder.
2020-04-04 18:30:50 +01:00
Nikita Popov 4ede730096 [InstCombine] Don't limit uses in eraseInstFromFunction()
eraseInstFromFunction() adds the operands of the erased instructions,
as those might now be dead as well. However, this is limited to
instructions with less than 8 operands.

This check doesn't make a lot of sense to me. As the instruction
gets removed afterwards, I don't see a potential for anything
overly pathological happening here (as we can only add those
operands to the worklist once). The impact on CTMark is in
the noise. We also have the same code in instruction sinking
and don't limit the operand count there.

Differential Revision: https://reviews.llvm.org/D77325
2020-04-04 18:37:30 +02:00
Luofan Chen eec6d87626 [Attributor] Deduce attributes for non-exact functions
This patch is based on D63312 and D63319. For now we create shallow wrappers for all functions that are IPO amendable.
See also [this github issue](https://github.com/llvm/llvm-project/issues/172).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76404
2020-04-04 11:34:58 -05:00
Nikita Popov 6896d559f3 [VNCoercion] Use IRBuilderBase; NFC
And remove include from header.
2020-04-04 12:44:50 +02:00
Nikita Popov ebd5a1b049 [Reassociate] Use IRBuilderBase; NFC
And remove now unnecessary IRBuilder.h include in header.
2020-04-04 12:34:16 +02:00
Nikita Popov 1055e9e3c8 [IVDescriptors] Remove IRBuilder.h include; NFC
IVDescriptors.h itself does not reference IRBuilder at all.
Move the include into transformation passes that do.
2020-04-04 12:07:57 +02:00
Sanjay Patel ce97ce3a5d [VectorCombine] try to form a better extractelement
Extracting to the same index that we are going to insert back into
allows forming select ("blend") shuffles and enables further transforms.

Admittedly, this is a quick-fix for a more general problem that I'm
hoping to solve by adding transforms for patterns that start with an
insertelement.

But this might resolve some regressions known to be caused by the
extract-extract transform (although I have not gotten more details on
those yet).

In the motivating case from PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724

The combination of subsequent instcombine and codegen transforms gets us this improvement:

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm4
  vmovshdup	%xmm1, %xmm3    ## xmm3 = xmm1[1,1,3,3]
  vaddps	%xmm0, %xmm2, %xmm0
  vaddps	%xmm1, %xmm3, %xmm1
  vshufps	$200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3]
  vinsertps	$177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2]

  -->

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm1
  vaddps	%xmm0, %xmm2, %xmm0
  vshufps	$200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3]

Differential Revision: https://reviews.llvm.org/D76623
2020-04-03 13:55:13 -04:00
Roman Lebedev 7d572ef2dd
Revert "[SCEV] rewriteLoopExitValues(): even if have hard uses, still rewrite if cheap (PR44668)"
As discussed in post-commit review in https://reviews.llvm.org/D73501
if the goal of this is to help vectorizer, then we should actually
be teaching vectorizer to do this, because right now this rewrite
is still budget-limited, which isn't what we'd want.

Additionally, while the rest of the patch series was universally profitable,
this particular patch is reportedly (https://reviews.llvm.org/D73501#1905171)
exposing cost-modeling issues on ARM.

So let's just back this particular patch out. Once there's an undo transform,
this could be considered for reintegration.

This reverts commit 44edc6fd2c.
2020-04-03 20:15:04 +03:00
Matt Arsenault 57a55313c3 InstCombine: Reduce minnum/maxnum if inputs are casted 2020-04-03 11:57:25 -04:00
Guillaume Chatelet 1a584a8d50 [Alignment][NFC] Remove unused private functions
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77297
2020-04-03 09:16:20 +00:00
OCHyams 9b56cc9361 [DebugInfo] Salvage debug info when sinking loop invariant instructions
Reviewed By: vsk, aprantl, djtodoro

Differential Revision: https://reviews.llvm.org/D77318
2020-04-03 09:19:26 +01:00
Hongtao Yu 88da019977 Fix a bug in the inliner that causes subsequent double inlining
Summary:
A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.

```
void top()
{
    int t = first();
    second(t);
}

void second(int t)
{
   t = third(t);
   fourth(t);
}

void third(int t)
{
   return t;
}
```
The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up.  We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.

Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.

Reviewers: wenlei, davidxl, tejohnson

Reviewed By: wenlei, davidxl

Subscribers: eraman, nikic, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76248
2020-04-02 21:08:05 -07:00
Jun Ma 9c6f32a0ff [Coroutines] Simplify implementation using removePredecessor
Differential Revision: https://reviews.llvm.org/D77035
2020-04-03 09:20:07 +08:00
Anna Thomas bf7a16a768 [InlineFunction] Update valid return attributes at callsite within callee body
Consider a callee function that has a call (C) within it which feeds
into the return. When we inline that callee into a callsite that has
return attributes, we can backward propagate valid attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

Also, this is valid only for attributes which are a property of a
callsite and not those that are not dependent on the ABI, or a property
of the call itself.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-04-02 14:13:12 -04:00
Sanjay Patel f4448063cc [InstCombine] try to reduce shuffle with bitcasted operand
shuf (bitcast X), undef, Mask --> bitcast X'

The 'inverse shuffles' test (shuf_bitcast_operand) is a pattern
in the motivating examples from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
(see also D76727)

We can deal with this class of patterns in generic instcombine
because we are not creating any new shuffles, just a bitcast.

Alive2 proof:
http://volta.cs.utah.edu:8080/z/mwDUZf

Differential Revision: https://reviews.llvm.org/D76844
2020-04-02 13:44:50 -04:00
Sanjay Patel b6050ca181 [VectorCombine] transform bitcasted shuffle to narrower elements
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'

We do not attempt this in InstCombine because we do not want to change
types and create new shuffle ops that are potentially not lowered as
well as the original code. Here, we can check the cost model to see if
it is worthwhile.

I've aggressively enabled this transform even if the types are the same
size and/or equal cost because moving the bitcast allows InstCombine to
make further simplifications.

In the motivating cases from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
...this is enough to let instcombine and the backend eliminate the
redundant shuffles, but we probably want to extend VectorCombine to
handle the inverse pattern (shuffle-of-bitcast) to get that
simplification directly in IR.

Differential Revision: https://reviews.llvm.org/D76727
2020-04-02 13:30:22 -04:00
Sanjay Patel 12fcbcecff [InstCombine] add tests for cmyk benchmark; NFC
These are versions of a function that regressed with:
rGf2fbdf76d8d0

That particular problem occurs with an instcombine-simplifycfg-instcombine
sequence, but we can show that it exists within instcombine only with
other variations of the pattern.
2020-04-02 13:00:46 -04:00
Benjamin Kramer de8831934a [LoopDataPrefetch] Remove unused include that's a layering violation 2020-04-02 17:46:10 +02:00
Benjamin Kramer dffc503187 Revert "[SimplifyLibCalls] Erase replaced instructions"
This reverts commit 2a77544ad5. This
introduces a use-after-free in Transforms/InstCombine/sincospi.ll.
Found by asan.
2020-04-02 17:30:47 +02:00
Sanjay Patel 1008435f3d Revert "[InstCombine] do not exclude min/max from icmp with casted operand fold"
This reverts commit f2fbdf76d8.
As noted in the post-commit thread:
https://reviews.llvm.org/rGf2fbdf76d8d0
...this can obscure a min/max pattern where the components
have extra uses. We can show that the problem is independent
of this change with a slightly modified source example, so
this revert just delays/reduces the need to fix the real
problem.

We need to improve our analysis of negation or -- more
generally -- subtraction using patches like D77230 or D68408.
2020-04-02 09:15:23 -04:00
Tyker c00cb76274 [NFC] Split Knowledge retention and place it more appropriatly
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77171
2020-04-02 15:01:41 +02:00
Jonas Paulsson 36d4421f50 [LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop.
This patch adds

- New arguments to getMinPrefetchStride() to let the target decide on a
  per-loop basis if software prefetching should be done even with a stride
  within the limit of the hw prefetcher.

- New TTI hook enableWritePrefetching() to let a target do write prefetching
  by default (defaults to false).

- In LoopDataPrefetch:

  - A search through the whole loop to gather information before emitting any
    prefetches. This way the target can get information via new arguments to
    getMinPrefetchStride() and emit prefetches more selectively. Collected
    information includes: Does the loop have a call, how many memory
    accesses, how many of them are strided, how many prefetches will cover
    them. This is NFC to before as long as the target does not change its
    definition of getMinPrefetchStride().

  - If a previous access to the same exact address was 'read', and the
    current one is 'write', make it a 'write' prefetch.

  - If two accesses that are covered by the same prefetch do not dominate
    each other, put the prefetch in a block that dominates both of them.

  - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.

- A SystemZ implementation of getMinPrefetchStride().

Review: Ulrich Weigand, Michael Kruse

Differential Revision: https://reviews.llvm.org/D70228
2020-04-02 14:57:46 +02:00
Florian Hahn a63b5c9e53 [CallSiteSplitting] Simplify isPredicateOnPHI & continue checking PHIs.
As pointed out by @thakis, currently CallSiteSplitting bails out after
checking the first PHI node. We should check all PHI nodes, until we
find one where call site splitting is beneficial.

This patch also slightly simplifies the code using BasicBlock::phis().

Reviewers: davidxl, junbuml, thakis

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D77089
2020-04-02 10:11:27 +01:00
Johannes Doerfert bcd8009369 [Attributor] Use the proper context instruction in genericValueTraversal
There was a TODO in genericValueTraversal to provide the context
instruction and due to the lack of it users that wanted one just used
something available. Unfortunately, using a fixed instruction is wrong
in the presence of PHIs so we need to update the context instruction
properly.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D76870
2020-04-01 22:20:47 -05:00
Johannes Doerfert ac96c8fd85 [Attributor][FIX] Do not compute ranges for arguments of declarations
This cannot be triggered right now, as far as I know, but it doesn't
make sense to deduce a constant range on arguments of declarations.
Exposed during testing of AAValueSimplify extensions.
2020-04-01 22:05:30 -05:00
Johannes Doerfert 54d6a608bf [Attributor][NFC] Predetermine the module
It could happen that we delete the first function in the SCC in the
future so we should be careful accessing `Functions` after the manifest
stage.
2020-04-01 21:56:17 -05:00
Johannes Doerfert 9e19693994 [Attributor] Derive better alignment for accessed pointers
Use DL & ABI information for better alignment deduction, e.g., if a type
is accessed and the ABI specifies an alignment requirement for such an
access we can use it. This is based on a patch by @lebedev.ri and
inspired by getBaseAlign in Loads.cpp.

Depends on D76673.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D76674
2020-04-01 21:49:57 -05:00
Johannes Doerfert b1c788d051 [Attributor][FIX] Prevent alignment breakage wrt. must-tail calls
If we have a must-tail call the callee and caller need to have matching
ABIs. Part of that is alignment which we might modify when we deduce
alignment of arguments of either. Since we would need to keep them in
sync, which is not as simple, we simply avoid deducing alignment for
arguments of the must-tail caller or callee.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D76673
2020-04-01 21:40:07 -05:00
Johannes Doerfert 41f2a57d0b [Attributor][NFC] Use a BumpPtrAllocator to allocate `AbstractAttribute`s
We create a lot of AbstractAttributes and they live as long as
the Attributor does. It seems reasonable to allocate them via a
BumpPtrAllocator owned by the Attributor.

Reviewed By: baziotis

Differential Revision: https://reviews.llvm.org/D76589
2020-04-01 20:53:28 -05:00
Sanjay Patel 3d90048791 [InstCombine] enhance freelyNegateValue() by handling xor
Negation is equivalent to bitwise-not + 1, so try to convert more
subtracts into adds using this relationship:
0 - (A ^ C) => ((A ^ C) ^ -1) + 1 => A ^ ~C + 1

I doubt this will recover the regression noted in rGf2fbdf76d8d0,
but seems like we're going to need to improve here and/or revive D68408?

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/Re5tMU
http://volta.cs.utah.edu:8080/z/An-uns

Differential Revision: https://reviews.llvm.org/D77230
2020-04-01 15:05:13 -04:00
Jonathan Roelofs 1148f004fa Fix PR45371: SeparateConstOffsetFromGEP clean up bookkeeping
find() was altering the UserChain, even in cases where it subsequently
discovered that the resulting constant was a 0. This confuses
rebuildWithoutConstOffset() when it attempts to walk the chain later, since it
is expected that the chain itself be a path down the use-def edges of an
expression.
2020-04-01 12:38:15 -06:00
Nikita Popov 50a3e8738a Revert "[InstCombine] Erase old instruction when replacing extractelements"
This reverts commit d40368fdb5.

llvm-clang-x86_64-expensive-checks-debian failure looks related.
2020-04-01 20:10:11 +02:00
Nikita Popov 2a77544ad5 [SimplifyLibCalls] Erase replaced instructions
After RAUWing an instruction, also erase it. This makes sure we
don't perform extra InstCombine iterations to clean up the garbage.
2020-04-01 20:00:10 +02:00
Uday Bondhugula 6ee11c3b0f [NewGVN] Make NewGVN aware of aligned_alloc
Make the New GVN pass aware of aligned_alloc.

Depends on D76975.

Differential Revision: https://reviews.llvm.org/D76976
2020-04-01 23:26:51 +05:30
Uday Bondhugula 4cf70af94f [GVN] Make GVN aware of aligned_alloc
Make the GVN pass aware of aligned_alloc.

Depends on D76974.

Differential Revision: https://reviews.llvm.org/D76975
2020-04-01 23:26:50 +05:30
Uday Bondhugula c4499e3333 [Attributor] Make attributor aware of aligned_alloc for heap to stack conversion
Make the attributor pass aware of aligned_alloc for converting heap
allocations to stack ones.

Depends on D76971.

Differential Revision: https://reviews.llvm.org/D76974
2020-04-01 23:26:50 +05:30
Nikita Popov d40368fdb5 [InstCombine] Erase old instruction when replacing extractelements
As we are not returning the result of replaceInstUsesWith(),
so we need to clean up ourselves.

NFC apart from worklist order.
2020-04-01 19:55:28 +02:00
Nikita Popov 4b35c816ef [InstCombine] Use replaceOperand() in div transforms
To make sure the old operand is DCEd.

NFC apart from worklist order.
2020-04-01 19:55:00 +02:00
Benjamin Kramer 66b9f5f7f0 [GVNSink] Simplify code. NFC. 2020-04-01 13:13:00 +02:00
Cullen Rhodes 84aa6cf1a9 [Transforms][SROA] Promote allocas with mem2reg for scalable types
Summary:
Aggregate types containing scalable vectors aren't supported and as far
as I can tell this pass is mostly concerned with optimisations on
aggregate types, so the majority of this pass isn't very useful for
scalable vectors.

This patch modifies SROA such that mem2reg is run on allocas with
scalable types that are promotable, but nothing else such as slicing is
done.

The use of TypeSize in this pass has also been updated to be explicitly
fixed size. When invoking the following methods in DataLayout:

    * getTypeSizeInBits
    * getTypeStoreSize
    * getTypeStoreSizeInBits
    * getTypeAllocSize

we now called getFixedSize on the resultant TypeSize. This is quite an
extensive change with around 50 calls to these functions, and also the
first change of this kind (being explicit about fixed vs scalable
size) as far as I'm aware, so feedback welcome.

A test is included containing IR with scalable vectors that this pass is
able to optimise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76720
2020-04-01 10:34:11 +00:00
Eli Friedman ba4764c2cc Fix leak in GVNSink introduced in D72467. 2020-03-31 16:21:27 -07:00
Evgenii Stepanov f9471b0010 Fix MSan false positive due to select folding.
Summary:
Select folding in JumpThreading can create a conditional branch on a
code patch that did not have one in the original program. This is not a
valid transformation in sanitize_memory functions.

Note that JumpThreading does select folding in 3 different places. Two
of them seem safe - they apply to a select instruction in a BB that ends
with an unconditional branch to another BB, which (in turn) ends with a
conditional branch or a switch with the same condition.

Fixes PR45220.

Reviewers: glider, dvyukov, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76332
2020-03-31 15:25:42 -07:00
Anna Thomas 58a05675da Revert "[InlineFunction] Handle return attributes on call within inlined body"
This reverts commit 28518d9ae3.
There is a failure in MsgPackReader.cpp when built with clang. It
complains about "signext and zeroext" are incompatible. Investigating
offline if it is infact a UB in the MsgPackReader code.
2020-03-31 16:16:34 -04:00
Nikita Popov b7fe795e5b [InstCombine] Use replaceOperand() in some select transforms
To make sure the old operand is DCEd.

NFC apart from worklist order.
2020-03-31 22:10:55 +02:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Nikita Popov c538c57d6d [InstCombine] Use replaceOperand() in descaling
To make sure the old operand gets DCEd.

NFC apart from worklist order.
2020-03-31 22:05:53 +02:00
Nikita Popov 19df7fa892 [InstCombine] Erase old alloca in cast of alloca transform
As we don't return the replaceInstUsesWith() result, we are
responsible for erasing the instruction.

NFC apart from worklist order.
2020-03-31 21:57:39 +02:00
Nikita Popov 87357808b8 [InstCombine] Use replaceOperand() in non zero phi transform
To make sure the old operand gets DCEd.

NFC apart from worklist order changes.
2020-03-31 21:54:21 +02:00
Nikita Popov f3d4166368 [InstCombine] Report change in non zero phi transform
We need to inform InstCombine (and transitively the pass manager)
that we changed an instruction.
2020-03-31 21:52:40 +02:00
Anna Thomas 28518d9ae3 [InlineFunction] Handle return attributes on call within inlined body
Consider a callee function that has a call (C) within it which feeds
into the return.  When we inline that callee into a callsite that has
return attributes, we can backward propagate those attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

See added test cases.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-03-31 14:35:40 -04:00
Uday Bondhugula dc817b2dea [InstCombine] Deduce attributes for aligned_alloc in InstCombine
Make InstCombine aware of the aligned_alloc library function.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Depends on D76970.

Differential Revision: https://reviews.llvm.org/D76971
2020-03-31 23:17:28 +05:30
Florian Hahn b0cd7b2799 [SCCP] Limit use of range info for binops to integers for now.
This fixes a crash when building the test suite.
2020-03-31 17:08:09 +01:00
Tyker 4aeb7e1ef4 [AssumeBundles] Preserve information in EarlyCSE
Summary: this patch preserve information from various places in EarlyCSE into assume bundles.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76769
2020-03-31 17:47:04 +02:00
Florian Hahn b37543750c [ValueLattice] Distinguish between constant ranges with/without undef.
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.

A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below

define i32 @f(i32 %a, i1 %c) {
  br i1 %c, label %true, label %false
true:
  %a.255 = and i32 %a, 255
  br label %exit
false:
  br label %exit
exit:
  %p = phi i32 [ %a.255, %true ], [ undef, %false ]
  %f.1 = icmp eq i32 %p, 300
  call void @use(i1 %f.1)
  %res = and i32 %p, 255
  ret i32 %res
}

In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the  second use might not be
< 256.

Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.

Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76931
2020-03-31 12:50:20 +01:00
Daan Sprenkels 464b9aeafe [InstCombine] Transform extelt-trunc -> bitcast-extelt
Canonicalize the case when a scalar extracted from a vector is
truncated.  Transform such cases to bitcast-then-extractelement.
This will enable erasing the truncate operation.

This commit fixes PR45314.

reviewers: spatel

Differential revision: https://reviews.llvm.org/D76983
2020-03-31 11:53:41 +02:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Florian Hahn 0c9c58ada0 [SCCP] Use constant ranges for casts.
For casts with constant range operands, we can use
ConstantRange::castOp.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71938
2020-03-31 09:22:04 +01:00
Wei Mi ebad678857 [SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.

Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.

Differential Revision: https://reviews.llvm.org/D76255
2020-03-30 22:07:08 -07:00
Sanjay Patel f2fbdf76d8 [InstCombine] do not exclude min/max from icmp with casted operand fold
InstCombine has a mess of logic that tries to preserve min/max patterns,
but AFAICT, this one is not necessary because we can always narrow the
corresponding select in this sequence to match the narrow compare.

The biggest danger for this patch is inducing infinite looping or
assert from exceeding max iterations. If any bots hit that in the
vicinity of this commit, this is the likely patch to blame.
2020-03-30 16:10:51 -04:00
Thomas Raoux 3ea0774b13 [ConstantFold][NFC] Compile time optimization for large vectors
Optimize the common case of splat vector constant. For large vector
going through all elements is expensive. For splatr/broadcast cases we
can skip going through all elements.

Differential Revision: https://reviews.llvm.org/D76664
2020-03-30 11:27:09 -07:00
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Vedant Kumar dcc410b5cf [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.

In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.

Uing ScalarEvolution::getTypeSizeInBits makes the truncation work as expected.

https://bugs.llvm.org/show_bug.cgi?id=45259

Differential Revision: https://reviews.llvm.org/D76669
2020-03-30 10:14:14 -07:00
Chris Jackson f6b2c003f3 [DebugInfo] Ensure that a demanded bits optimisation in
InstCombine does not result in an incorrect debuginfo variable
value

- Add an additional salvage and a test.

Reviewers: aprantl, djtodoro

Differential Revision: https://reviews.llvm.org/D76854

Bugzilla:  https://bugs.llvm.org/show_bug.cgi?id=44371
2020-03-30 15:39:22 +01:00
Chris Jackson 135709aa90 [DebugInfo] Ensure dead store elimination can mark an operand
value as undefined

    - Correct a debug info salvage and add a test

    Reviewers: aprantl, vsk

    Differential Revision: https://reviews.llvm.org/D76930
    Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45080
2020-03-30 14:58:14 +01:00
Florian Hahn 9e81249d76 [Matrix] Rename emitChainedMatrixMultiply to emitMatrixMultiply (NFC).
The Chained in the name potentially leads to confusion. Also updated the
comment to drop the unnecessary mention of tile-sized.
2020-03-30 11:17:25 +01:00
Jun Ma 31a1d85c53 [Coroutines 2/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76913
2020-03-30 09:53:09 +08:00
Jun Ma a94fa2c049 [Coroutines 1/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76911
2020-03-30 09:53:09 +08:00
Nikita Popov 8253a86b65 [InstCombine] Erase old mul when creating umulo
As we don't return the result of replaceInstUsesWith(), we are
responsible for erasing the instruction.

There is a small subtlety here in that we need to do this after
the other uses of Builder, which uses the original multiply as
the insertion point.

NFC apart from worklist order changes.
2020-03-29 20:46:08 +02:00
Nikita Popov 53d209076a [InstCombine] Use replaceOperand() in demanded elements simplification
To make sure that dead operands get DCEd. This fixes the largest
source of leftover dead operands we see in tests.

NFC apart from worklist changes.
2020-03-29 20:43:19 +02:00
Nikita Popov 0c87140065 [InstCombine] Use replaceOperand() in assoc cast simplification
To make sure the old operands are DCEd.

NFC apart from worklist order.
2020-03-29 20:28:37 +02:00
Nikita Popov a9ddcd6411 [InstCombine] Erase old add when optimizing add overflow
We don't return the replaceInstUsesWith() result, so we're
responsible for cleaning up.

NFC apart from worklist order changes.
2020-03-29 20:20:14 +02:00
Uday Bondhugula c0955edfd6 Introduce support for lib function aligned_alloc in TLI / memory builtins
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062

This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (for eg. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76970
2020-03-29 23:36:24 +05:30
Sanjay Patel fc3cc8a4b0 [VectorCombine] skip debug intrinsics first for efficiency 2020-03-29 13:58:04 -04:00
Nikita Popov 26fa33755f [InstCombine] Simplify select of cmpxchg transform
Rather than converting to a dummy select with equal true and false
ops, just directly return the resulting value.

As a side-effect, this fixes missing DCE of the previously replaced
operand.
2020-03-29 18:57:32 +02:00
Nikita Popov 28f67bd5c5 [InstCombine] Fix worklist management in varargs transform
Add a replaceUse() helper to mirror replaceOperand() for the
rare cases where we're working directly on uses.

NFC apart from worklist order changes.
2020-03-29 18:04:12 +02:00
Nikita Popov 6f07a9e80a [InstCombine] Erase original add when creating saddo
Usually when we replaceInstUsesWith() we also return the original
instruction, and InstCombine will take care of erasing it. Here
we don't do that, so we need to manually erase it.

NFC apart from worklist order changes.
2020-03-29 18:01:32 +02:00
Nikita Popov 1e363023b8 [InstCombine] Use replaceOperand() in a few more places
To make sure the old operands get DCEd.

NFC apart from worklist order changes.
2020-03-29 18:01:00 +02:00
Florian Hahn 49d00824bb [VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).
This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling it's
operands as VPValues and also towards breaking it up into a
VPInstruction.

Discussed as part of D74695.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76988
2020-03-29 13:47:28 +01:00
Richard Diamond 4bf015c035 [AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences.
Summary:
On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue.

In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64.

This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type.

Reviewers: hfinkel, jdoerfert

Reviewed By: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75471
2020-03-29 01:26:31 -05:00
Nikita Popov 2215dcf1d7 [InstCombine] Remove unreachable blocks before DCE
Dropping unreachable code may reduce use counts on other instructions,
so it's better to do this earlier rather than later.

NFC-ish, may only impact worklist order.
2020-03-28 21:19:16 +01:00
Nikita Popov 97cc1275c7 [InstCombine] Merge two functions; NFC
Merge AddReachableCodeToWorklist() into prepareICWorklistFromFunction().
It's one logical step, and this makes it easier to move code.
2020-03-28 21:19:16 +01:00
Nikita Popov 30d712103f [InstCombine] Use replaceOperand() API in GEP transforms
To make sure that replaced operands get DCEd. This drops one
iteration from gepphigep.ll, which is still not optimal.

This was the last test case performing more than 3 iterations.

NFC-ish, only worklist order should change.
2020-03-28 19:07:25 +01:00
Nikita Popov b1f78baeaa [InstCombine] Reduce code duplication in GEP of PHI transform; NFC
The `NewGEP->setOperand(DI, NewPN)` call was duplicated, and the
insertion of NewGEP is the same in both if/else, so we can extract it.
2020-03-28 19:07:25 +01:00
Nikita Popov 672e8bfbfc [InstCombine] Fix worklist management in foldXorOfICmps()
Because this code does not use the IC-aware replaceInstUsesWith()
helper, we need to manually push users to the worklist.

This is NFC-ish, in that it may only change worklist order.
2020-03-28 18:25:21 +01:00
Enna1 03bc311a16 [CorrelatedValuePropagation] Remove redundant if statement in processSelect()
This statement

    if (ReplaceWith == S) ReplaceWith = UndefValue::get(S->getType());

is introduced in https://reviews.llvm.org/rG35609d97ae89b8e13f40f4e6b9b056954f8baa83
to fix a case where unreachable code can cause select instruction
simplification to fail. In https://reviews.llvm.org/rGd10480657527ffb44ea213460fb3676a6b1300aa,
we begin to perform a depth-first walk of basic blocks. This means
we will not visit unreachable blocks. So we do not need this the
special check any more.

Differential Revision: https://reviews.llvm.org/D76753
2020-03-28 18:01:17 +01:00
Florian Hahn 81f173ed0e [SCCP] Remove LatticeVal alias now that transition is done (NFC).
The LatticeVal alias was introduced to reduce the diff size for the
transition to ValueLatticeElement, which is done now.

This patch removes the unnecessary alias and updates some very verbose
type uses with auto.
2020-03-28 15:40:24 +00:00
Florian Hahn a44bf59c93 [SCCP] Remove unused toLatticeValue helper (NFC).
LatticeVal is an alias for ValueLatticeElement and the function is not
used any longer.
2020-03-28 15:40:24 +00:00
Uday Bondhugula 06066c4003 [NFC] Attributor comment updates / cast cleanup
Minor update/fixes to comments for the Attributor pass, and dyn_cast -> cast.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76972
2020-03-28 13:36:43 +05:30
Sanjay Patel 0f56bbc1a5 [InstCombine] reduce FP-casted and bitcasted signbit check
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/bVyrko
http://volta.cs.utah.edu:8080/z/Vxpz9q
2020-03-27 17:33:59 -04:00
Sjoerd Meijer 401a324c51 [LV] Refactor widenIntOrFpInduction. NFC.
This untangles the logic in widenIntOrFpInduction in order to make more
explicit and visible how exactly the induction variable is lowered.

Differential Revision: https://reviews.llvm.org/D76686
2020-03-27 12:58:50 +00:00
Jonathan Roelofs 7a89a5d81b [InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors
Fixes https://bugs.llvm.org/show_bug.cgi?id=43665
2020-03-26 12:09:36 -06:00
Fangrui Song 4c52d51e78 [InstCombine] Fix a code-sinking bug after D73832/f1a9efabcb9b
- UserParent = PN->getIncomingBlock(*I->use_begin());
+ UserParent = PN->getIncomingBlock(*SingleUse);

The first use of I may be droppable (llvm.assume).

When compiling llvm/lib/IR/AutoUpgrade.cpp with a bootstrapped clang
with ThinLTO with minimized bitcode files, I see such a case in
the function _ZN4llvm20UpgradeIntrinsicCallEPNS_8CallInstEPNS_8FunctionE

  clang -c -fthinlto-index=AutoUpgrade.o.thinlto.bc AutoUpgrade.bc -O3

Unfortunately it is really difficult to get a minimized reproduce.
2020-03-25 22:50:53 -07:00
John McCall 9514c048d8 Use optimal layout and preserve alloca alignment in coroutine frames.
Previously, we would ignore alloca alignment when building the frame
and just use the natural alignment of the allocated type.  If an alloca
is over-aligned for its IR type, this could lead to a frame entry with
inadequate alignment for the downstream uses of the alloca.

Since highly-aligned fields also tend to produce poor layouts under a
naive layout algorithm, I've also switched coroutine frames to use the
new optimal struct layout algorithm.

In order to communicate the frame size and alignment to later passes,
I needed to set align+dereferenceable attributes on the frame-pointer
parameter of the resume function.  This is clearly the right thing to
do, but the align attribute currently seems to result in assumptions
being added during inlining that the optimizer cannot easily remove.
2020-03-26 00:51:09 -04:00
Tyker f1a9efabcb Ignore/Drop droppable uses for code-sinking in InstCombine
Summary:
This patch allows code-sinking in InstCombine to be performed when instruction have uses in llvm.assume.

Use are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use.
for now uses are considered droppable if they are in an llvm.assume.

Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73832
2020-03-25 20:42:52 +01:00
Alina Sbirlea 3abcbf9903 [CFG/BasicBlock] Rename succ_const to const_succ. [NFC]
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.

Reviewers: nicholas, dblaikie, nlewycky

Subscribers: hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75952
2020-03-25 12:40:55 -07:00
Gil Rapaport 078c863305 [LV] Replace stored value with a VPValue (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as
the stored value. The recipe is generated with a VPValue wrapping the stored
value of the scalar store. This reduces ingredient def-use usage by ILV as a
step towards full VPlan-based def-use relations.

Differential Revision: https://reviews.llvm.org/D76373
2020-03-25 19:36:55 +02:00
Tyker d72c586aeb [NFC] Rename function to match Coding Convention and fix typo in KnowledgeRetention 2020-03-25 18:31:13 +01:00
Johannes Doerfert 5699d08b79 [Attributor] Use knowledge retained in llvm.assume (operand bundles)
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.

[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D74888
2020-03-24 15:33:40 -05:00
Juneyoung Lee 49f75132bc [DivRemPairs] Freeze operands if they can be undef values
Summary:
DivRemPairs is unsound with respect to undef values.

```
      // bb1:
      //   %rem = srem %x, %y
      // bb2:
      //   %div = sdiv %x, %y
      // -->
      // bb1:
      //   %div = sdiv %x, %y
      //   %mul = mul %div, %y
      //   %rem = sub %x, %mul
```

If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:
```
   %div = sdiv undef, 1 // %div = undef
   %rem = srem undef, 1 // %rem = 0
 =>
   %div = sdiv undef, 1 // %div = undef
   %mul = mul %div, 1   // %mul = undef
   %rem = sub %x, %mul  // %rem = undef - undef = undef
```
http://volta.cs.utah.edu:8080/z/m7Xrx5

Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.

This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .

This miscompilation disappears if undef value is removed, but it may take a while.
DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations.

Reviewers: spatel, lebedev.ri, george.burgess.iv

Reviewed By: spatel

Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76483
2020-03-25 03:46:14 +09:00
Jun Ma a44de12ab2 [Coroutines] Also check lifetime intrinsic for local variable when build
coroutine frame

Currently we move all allocas into the frame when build coroutine frame in
CoroSplit pass. However, this can be relaxed.

Since CoroSplit pass run after Inline pass, we can use lifetime intrinsic to
do such analysis: If the scope of lifetime intrinsic is not across any suspend
point, rather than move the allocas to frame, we can just move them to entry bb
of corresponding function. This reduce the frame size.

More importantly, this also avoid data race in multithread environment.
Consider one inline function by coroutine: it starts a thread which access
local variables, while after inline the movement of allocs to frame also access
them. cause data race.

Differential Revision: https://reviews.llvm.org/D75664
2020-03-24 13:41:55 +08:00
Vedant Kumar b7cd291c15 [GlobalOpt] Treat null-check of loaded value as use of global (PR35760)
PR35760 shows an example program which, when compiled with `clang -O0`
or gcc at any optimization level, prints '0'. However, llvm transforms
the program in a way that causes it to print '1'.

Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when
analyzing a load from a global which is used by an `icmp`. This special
case was untested [0] so this is just deleting dead code.

An alternative fix might be to change the GlobalStatus analysis for the
global to report "Stored" instead of "StoredOnce". However, "StoredOnce"
is appropriate when only one value other than the initializer is stored
to the global.

[0]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662

Differential Revision: https://reviews.llvm.org/D76645
2020-03-23 22:36:09 -07:00
Johannes Doerfert f09f4b2676 [OpenMPOpt] Initialize value to avoid use of uninitialized memory
This should fix the issue reported here:
  https://reviews.llvm.org/D76058#1937554
2020-03-23 19:17:19 -05:00
Matt Arsenault b20a1d840f GVNSink: Allow handling addrspacecast 2020-03-23 16:50:58 -04:00
Stefanos Baziotis a650d555fc [Attributor][NFC] Refactorings and typos in doc
Reviewed By: sstefan1, uenoku

Differential Revision: https://reviews.llvm.org/D76175
2020-03-23 22:44:10 +02:00
Matt Arsenault 43d98a0ecf Allow replacing intrinsic operands with variables
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
2020-03-23 15:51:57 -04:00
Sanjay Patel a1fe6beb1e [InstCombine] remove one-use check for ctpop -> cttz
Two one-use checks were added with rGfdcb27105537,
but only the first one is necessary to limit an
increase in instruction count. The second transform
only creates one instruction, so it is always a
reasonable canonicalization/optimization.
2020-03-23 13:59:57 -04:00
Johannes Doerfert 9d38f98dc3 [OpenMPOpt] Validate declaration types against the expected types
Validation of the found runtime library functions declarations types
(return and argument types) with the expected types.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76058
2020-03-23 11:43:36 -05:00
Benjamin Kramer ff2f5097ed [Attributor] Fold single-use variable into assert
Fixes unused variable warning in Release builds.
2020-03-23 17:41:52 +01:00
Johannes Doerfert c57689bef2 [Attributor][NFC] Copy llvm::function_ref, don't use references
On IRC this was called a "code smell" so we get rid of it.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 68fed27067 [Attributor] Handle calls in AAValueConstantRange properly
We did handle calls that were operands of certain instructions but not
standalone calls we visit via indirection, e.g., selects.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 54ec9b54f6 [Attributor] Unify handling of must-tail calls
We special cased must-tail calls all over the place because they cannot
be modified as other calls can be. However, we already centralized the
modification API so we can centralize the handling as well. This
simplifies the code and allows to remove must-tail calls completely.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 0995001ce5 [Attributor][NFC] Predetermine the module before verification
It could happen that we delete the first function in the SCC in the
future so we should be careful accessing `Functions` after the manifest
stage.
2020-03-23 10:45:23 -05:00
Johannes Doerfert f3bf4b05c2 [Attributor][NFC] clang-format Attributor.{h,cpp} 2020-03-23 10:45:23 -05:00
Simon Pilgrim fdcb271055 [InstCombine] Limit CTPOP -> CTTZ simplifications to one use
Tweak D76568 so we only combine if it will remove the bit-twiddling.

Suggested by @spatel
2020-03-23 14:33:41 +00:00
Guillaume Chatelet 32851f8d63 [Alignment][NFC] Deprecate VectorUtils::getAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76542
2020-03-23 13:54:15 +01:00
Simon Pilgrim 72d1419bfb [InstCombine] Add CTPOP -> CTTZ simplifications (PR43513)
As detailed on PR43513, we can simplify:

ctpop(x | -x) -> bitwidth - cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/caw49X

ctpop(~x & (x - 1)) -> cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/5zfVrx

I've tweaked the initial test cases I added at rG2d712fb75584 to increase commutativity testing.

Differential Revision: https://reviews.llvm.org/D76568
2020-03-23 11:04:33 +00:00
Nikita Popov dc81923659 [InstCombine] Remove ExpensiveCombines option
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)

Differential Revision: https://reviews.llvm.org/D76540
2020-03-22 16:56:28 +01:00
Matt Arsenault 830cfda19f Utils: Mostly convert memcpy expansion to use Align
The TTI hooks aren't converted. I also think the intrinsics should
have mandatory alignment and never return MaybeAlign.
2020-03-22 11:21:44 -04:00
Nikita Popov a63eaa5449 [SLP] Avoid repeated visitation in getVectorElementSize(); NFC
We need to insert into the Visited set at the same time we insert
into the worklist. Otherwise we may end up pushing the same
instruction to the worklist multiple times, and only adding it to
the visited set later.
2020-03-22 14:34:29 +01:00
Simon Pilgrim f00a4b531a [InstCombine][X86] simplifyX86immShift - remove ConstantAggregateZero handling. NFC.
The llvm::computeKnownBits path now handles this.
2020-03-21 11:30:44 +00:00
Nikita Popov 2b52e4e629 [InstCombine] Remove known bits constant folding
If ExpensiveCombines is enabled (which is the case with -O3 on the
legacy PM and always on the new PM), InstCombine tries to compute
the known bits of all instructions in the hope that all bits end up
being known, which is fairly expensive.

How effective is it? If we add some statistics on how often the
constant folding succeeds and how many KnownBits calculations are
performed and run test-suite we get:

    "instcombine.NumConstPropKnownBits": 642,
    "instcombine.NumConstPropKnownBitsComputed": 18744965,

In other words, we get one fold for every 30000 KnownBits calculations.
However, the truth is actually much worse: Currently, known bits are
computed before performing other folds, so there is a high chance
that cases that get folded by known bits would also have been
handled by other folds.

What happens if we compute known bits after all other folds
(hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)?

    "instcombine.NumConstPropKnownBits": 0,
    "instcombine.NumConstPropKnownBitsComputed": 18105547,

So it turns out despite doing 18 million known bits calculations,
the known bits fold does not do anything useful on test-suite.
I was originally planning to move this into AggressiveInstCombine
so it only runs once in the pipeline, but seeing this, I think
we're better off removing it entirely.

As this is the only use of the "expensive combines" mechanism,
it may be removed afterwards, but I'll leave that to a separate patch.

Differential Revision: https://reviews.llvm.org/D75801
2020-03-20 20:54:06 +01:00
Nikita Popov 3205d1a860 [InstCombine] Handle known shl nsw sign bit in SimplifyDemanded
Ideally SimplifyDemanded should compute the same known bits as
computeKnownBits(). This patch addresses one discrepancy, where
ValueTracking is more powerful: If we have a shl nsw shift, we
know that the sign bit of the input and output must be the same.
If this results in a conflict, the result is poison.

This is implemented in
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L1175-L1179)
and
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L904-L908).

This implements the same basic logic in SimplifyDemanded. It's
slightly stronger, because I return undef instead of zero for the
poison case (which is not an option inside ValueTracking).

As mentioned in https://reviews.llvm.org/D75801#inline-698484,
we could detect poison in more cases, this just establishes parity
with the existing logic.

Differential Revision: https://reviews.llvm.org/D76489
2020-03-20 18:16:05 +01:00
Simon Pilgrim 34659de5fd [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391)
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.

This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
2020-03-20 15:48:06 +00:00
Nikita Popov 0372768776 [InstCombine] Simplify calls with "returned" attribute
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. This was already partially
handled by InstSimplify/InstCombine for the case where the argument
is an integer constant, and the result is thus known via known bits.
The non-constant (or non-int) argument cases weren't handled though.

This previously landed as an InstSimplify transform, but was reverted
due to assertion failures when compiling the Linux kernel. The reason
is that simplifying a call to another call breaks assumptions in
call graph updating during inlining. As the code is not easy to fix,
and there is no particularly strong motivation for having this in
InstSimplify, the transform is only performed in InstCombine instead.

Differential Revision: https://reviews.llvm.org/D75815
2020-03-20 10:23:39 +01:00
Nikita Popov 5c10967157 [InstCombine] Don't replace musttail result based on known bits
This is the same change as D75824, but for two cases where
InstCombine performs the same optimization: Replacing an instruction
whose bits are fully known with a constant. This is not (generally)
legal for musttail calls.

Differential Revision: https://reviews.llvm.org/D76457
2020-03-20 10:17:09 +01:00
Florian Hahn be86bc76f0 [Matrix] Generalize ColumnMatrixTy to MatrixTy (NFC).
This patch sets the stage for supporting both row and column major
layouts for matrixes. It renames ColumnMatrixTy to MatrixTy, adds
booleans indicating the underlying layout to both MatrixTy and ShapeInfo
and generalizes the methods of MatrixTy to support both row and column
major layouts.

Reviewers: Gerolf, anemet, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76324
2020-03-20 08:32:13 +00:00
Florian Hahn 3a8372ed02 [DSE] Support traversing MemoryPhis.
For MemoryPhis, we have to avoid that the MemoryPhi may be executed
before before the access we are currently looking at.

To do this we do a post-order numbering of the basic blocks in the
function and bail out once we reach a MemoryPhi with a larger (or equal)
post-order block number than the current MemoryAccess.
This changes the order in which we visit stores for elimination.

This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration.  For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72148
2020-03-20 07:51:42 +00:00
Jun Ma 032251e34d [Coroutines] Fix PR45130
For now, when final suspend can be simplified by simplifySuspendPoint,
handleFinalSuspend is executed as well to remove last case in switch
instruction. This patch fixes it.

Differential Revision: https://reviews.llvm.org/D76345
2020-03-20 11:27:08 +08:00
Benjamin Kramer 1db8b341a6 [Matrix] Fold single-use variable into assert
Avoids -Wunused-variable warnings in Release builds.
2020-03-19 21:42:22 +01:00
Florian Hahn 796fb2e474 [Matrix] Move multiply-add code generation into separate function (NFC).
This logic can be shared with the tiled code generation.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75565
2020-03-19 20:26:19 +00:00
Kazu Hirata e23d786526 [JumpThreading] Fix infinite loop (PR44611)
Summary:
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by
preventing an infinite loop in the jump threading pass when
-jump-threading-across-loop-headers is on.  Specifically, without this
patch, jump threading through two basic blocks would trigger on the
same area of the CFG over and over, resulting in an infinite loop.

Consider testcase PR44611-across-header-hang.ll in this patch.  The
first opportunity to thread through two basic blocks is:

  from bb_body2 through bb_header and bb_body1 to bb_body2.

The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1
and bb_body1.thread1.  Since bb_header contains a successor edge back
to itself, bb_header.thread1 also contains a successor edge to
bb_header, immediately giving rise to the next jump threading
opportunity:

  from bb_header.thread1 through bb_header and bb_body1 to bb_body2.

After that, we repeatedly thread an incoming edge into bb_header
through bb_header and bb_body1 to bb_body2.  In other words, we keep
peeling one iteration from bb_header's self loop.

The patch fixes the problem by preventing the pass from duplicating a
basic block containing a self loop.

Reviewers: wmi, junparser, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76390
2020-03-19 12:49:36 -07:00
Florian Hahn 0cc2d23751 [Matrix] Hoist load/store generation logic, add helpers for tiled access.
This patch slightly generalizes the code to emit loads and stores of a
matrix and adds helpers to load/store a tile of a larger matrix.

This will be used in a follow-up patch introducing initial tiling.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75564
2020-03-19 19:28:21 +00:00
Simon Pilgrim a11e5b32df [InstCombine][X86] simplifyX86immShift - handle variable out-of-range vector shift by immediate amounts (PR40391)
If we know the SSE shift amount is out of range then we can simplify to zero value (logical) or a 'signsplat' bitwidth-1 shift (arithmetic). This allows us to remove the equivalent ConstantInt constant folding path from simplifyX86immShift.
2020-03-19 18:27:31 +00:00
Simon Pilgrim 433897da4a [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by immediate amounts to generic shifts (PR40391)
The slli/srli/srai 'immediate' vector shifts (although its not immediate anymore to match gcc) can be replaced with generic shifts if the shift amount is known to be in range.
2020-03-19 15:44:24 +00:00
Florian Hahn 4a58996dd2 [SCCP] Use constant ranges for PHI nodes.
For PHIs with multiple incoming values, we can improve precision by
using constant ranges for integers. We can over-approximate phis
by merging the incoming values.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71933
2020-03-19 12:45:33 +00:00
Florian Hahn 8a36594a7e [SCCP] Use constant ranges for binary operators.
If one of the operands of a binary operator is a constant range, we can
use ConstantRange::binaryOp to approximate the result.

We still handle single element constant ranges as we did previously,
with ConstantExpr::get(), because ConstantRange::binaryOp still gives
worse results in a few cases for single element ranges.

Also note that we bail out early if any of the operands is still unknown.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71936
2020-03-19 09:35:48 +00:00
Huihui Zhang 2ea5495759 [InstCombine][SVE] Fix InstCombiner::visitAllocaInst for scalable vector.
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where scalable
property doesn't matter (check for zero-sized alloca), we should explicitly
call getKnownMinSize() to avoid implicit type conversion to uint64_t, which is
invalid for scalable vector type.

Reviewers: sdesmalen, efriedma, spatel, apazos

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76386
2020-03-18 20:57:14 -07:00
Florian Hahn fd2c15e602 [VPlan] Do not print mapping for Value2VPValue.
The latest improvements to VPValue printing make this mapping clear when
printing the operand. Printing the mapping separately is not required
any longer.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76375
2020-03-18 21:44:07 +00:00
Florian Hahn 00c1cd1934 [VPlan] Record underlying value for VPValues created by addVPValue (NFC).
Now that printing VPValues uses the underlying IR value name, if
available, recording the underlying value here improves printing.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76374
2020-03-18 21:30:58 +00:00
Eli Friedman e24e95fe90 Remove CompositeType class.
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.

Differential Revision: https://reviews.llvm.org/D75660
2020-03-18 13:53:17 -07:00
Florian Hahn e6a74803d4 [VPlan] Use underlying value for printing, if available.
When the an underlying value is available, we can use its name for
printing, as discussed in D73078.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76200
2020-03-18 17:46:57 +00:00
Simon Pilgrim f4e495a18e [InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391)
AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
2020-03-18 11:26:54 +00:00
Florian Hahn 5672ae8d86 [SCCP] Use constant ranges for select, if cond is overdefined.
For selects with an unknown condition, we can approximate the result by
merging the state of both options. This automatically takes care of
the case where on operand is undef.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71935
2020-03-18 09:26:02 +00:00
Michael Liao f2f8bdc2b1 Fix `-Wunused-variable` warning. NFC. 2020-03-17 20:15:50 -04:00
Florian Hahn a72ae99cf9 [SCCP] Split up callsite handling, only propagate result on change (NFC)
Functions include their arguments in the use-list. Changed function
values mean that the result of the function changed. We only need
to update the call sites with the new function result and do not
have to propagate the call arguments.

To do so, this patch splits up the visitCallSite into handleCallResult
and handleCallArguments and updates markUsersAsChanged to only update
call results for functions.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75846
2020-03-17 20:05:35 +00:00
Sanjay Patel be9e3d9416 [InstCombine] reduce demand-limited bool math to logic, part 2
Follow-on suggested in:
D75961
2020-03-17 15:18:18 -04:00
Tyker e8ac825f5b [AssumeBundles] Detection of Empty bundles
Summary: Prevent InstCombine from removing llvm.assume for which the arguement is true when they have operand bundles with usefull information.

Reviewers: jdoerfert, nikic, lebedev.ri

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76147
2020-03-17 15:50:15 +01:00
Florian Hahn 1d6f919df2 [SCCP] Explicitly mark values as overdefined (NFC).
This was part of D60582 but can be committed separately.
2020-03-17 12:13:30 +00:00
Roman Lebedev 398b497cd0
[NFC] LoopRotate: do issue debug message when not rotating due to instr count
It is somewhat problematic to notice this issue otherwise.
2020-03-17 09:26:09 +03:00
Serguei Katkov 80c351cdb6 [InstCombine] Transform to undef incorrect atomic unordered mem intrinsics
According to LangRef:
If len is not a positive integer multiple of element_size, then the behaviour of the intrinsic is undefined.

Add InstCombine rule to transform intrinsic to undef operation.

This is a follow-up for D76116.

Reviewers: reames
Reviewed By: reames
Subscribers: hiraditya, jfb, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D76215
2020-03-17 10:20:16 +07:00
Matt Arsenault b0bdb186f5 Utils: Always set alignment when expanding mem intrinsics
This was creating natural aligned loads and stores, which may not be
the case. The target could request a wider type load with less
alignment.
2020-03-16 14:34:29 -04:00
Matt Arsenault 05e7d8d6ce TTI: Add addrspace parameters to memcpy lowering functions 2020-03-16 14:34:29 -04:00
Florian Hahn 4878aa36d4 [ValueLattice] Add new state for undef constants.
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.

The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces  the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.

Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.

Reviewers: efriedma, reames, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75120
2020-03-14 17:19:59 +00:00
Whitney Tsang aca7167535 [NFC][LoopUnrollAndJam] clang-format.
I am currently working on this file.
2020-03-14 00:04:10 +00:00
Akira Hatanaka c6f1713c46 [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

This reapplies the patch in https://reviews.llvm.org/rG1f5b471b8bf4,
which was reverted because it was causing crashes.

https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2

Check that HasSafePathToCall is true before checking the call is a tail
call.

Original commit message:

Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-13 13:52:14 -07:00
Alexey Zhikhartsev f71abec661 [LoopInterchange] Fix interchanging contents of preheader BBs
Summary:
Previously LCSSA was getting broken by placing instructions into the
(newly) inner *header* instead of the *pre*header.

Fixes PR43474

Reviewers: fhahn

Reviewed By: fhahn

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75943
2020-03-13 15:59:37 -04:00
Reid Kleckner 478b06e687 Revert "[ObjC][ARC] Check the basic block size before calling DominatorTree::dominate"
This reverts commit 5c3117b0a9

This should not be necessary after
7593a480db, and Florian Hahn has confirmed
that the problem no longer reproduces with this patch.

I happened to notice this code because the FIXME talks about
OrderedBasicBlock.

Reviewed By: fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D76075
2020-03-13 11:57:55 -07:00
Huihui Zhang fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.

This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Sanjay Patel 94f5d73182 [SimplifyCFG] fix formatting; NFC 2020-03-13 14:12:28 -04:00
Sanjay Patel 51e53af11c [SimplifyCFG] fix debug print formatting; NFC 2020-03-13 14:12:28 -04:00
Florian Hahn 0c5b6e2ea5 Recommit "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This patch should fix the cause of the stage2 failures and
PR45185.

This reverts the revert commit c52f839e72.
2020-03-13 17:03:22 +00:00
Tyker 69375fd0a3 [AssumeBundles] Preserve Information in the inliner
Summary:
during inling Create and insert an llvm.assume with attributes to preserve them.
to prevent any changes for now generation of llvm.assume is under a flag disabled by default.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75825
2020-03-13 17:35:47 +01:00
omarahmed1111 b285b333dc [Attributor] Detect possibly unbounded cycles in functions
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.

Reviewed By: jdoerfert, uenoku, baziotis

Differential Revision: https://reviews.llvm.org/D74691
2020-03-13 11:17:33 -05:00
Pankaj Gode bf990530ae [Attributor] Improve noalias preservation using reachability
Resolution for below fixme:
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
             check only uses possibly executed before this callsite.

Propagates caller argument's noalias attribute to callee.

Reviewed by: jdoerfert, uenoku

Reviewers: jdoerfert, sstefan1, uenoku

Subscribers: uenoku, sstefan1, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D71617
2020-03-13 21:09:08 +05:30
Sanjay Patel cbeffa3f6c [SimplifyCFG] convert if-else chain to switch; NFC
Fix formatting of related function names while changing the code.
2020-03-13 10:28:41 -04:00
Nico Weber 86eb2c3991 Revert "[ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't"
This reverts commit 1f5b471b8b.
Causes asserts when building code with arc. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2
for a full repro. Will post a creduced repro once creduce is done
running.
2020-03-13 10:16:02 -04:00
Johannes Doerfert a198adb490 [Attributor] IPO across definition boundary of a function marked alwaysinline
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D75590
2020-03-13 01:06:12 -05:00
rathod-sahaab 263c4a3c75 Fix compiler warning when compiling without asserts
This patch aims to prevent warning-as-error failures in release build.
As suggested in this comment
https://reviews.llvm.org/D69930#1910922

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D75970
2020-03-13 00:26:49 -05:00
Huihui Zhang 118abf2017 [SVE] Update API ConstantVector::getSplat() to use ElementCount.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.

This change is needed for D73753.

Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74386
2020-03-12 13:22:41 -07:00
Florian Hahn c52f839e72 Revert "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This commit is likely causing clang-with-lto-ubuntu to fail
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/16052

Also causes PR45185.

This reverts commit f1ac5d2263.
2020-03-12 18:49:11 +00:00
Hideto Ueno d9bf79f4e9 [Attributor][FIX] Add a missing dependence track in noalias deduction 2020-03-12 15:27:35 +00:00
Florian Hahn f1ac5d2263 [SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)
This patch switches SCCP to use ValueLatticeElement for lattice values,
instead of the local LatticeVal, as first step to enable integer range support.

This patch does not make use of constant ranges for additional operations
and the only difference for now is that integer constants are represented by
single element ranges. To preserve the existing behavior, the following helpers
are used

* isConstant(LV): returns true when LV is either a constant or a constant range with a single element. This should return true in the same cases where LV.isConstant() returned true previously.
* getConstant(LV): returns a constant if LV is either a constant or a constant range with a single element. This should return a constant in the same cases as LV.getConstant() previously.
* getConstantInt(LV): same as getConstant, but additionally casted to ConstantInt.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60582
2020-03-12 12:03:06 +00:00
Max Kazantsev 3dc6e53c97 [LoopPeel] Turn incorrect assert into a check
Summary:
This patch replaces incorrectt assert with a check. Previously it asserts that
if SCEV cannot prove `isKnownPredicate(A != B)`, then it should be able to prove
`isKnownPredicate(A == B)`.

Both these fact may be not provable. It is shown in the provided test:

Could not prove: `{-294,+,-2}<%bb1> !=  0`
Asserting: `{-294,+,-2}<%bb1> == 0`

Obviously, this SCEV is not equal to zero, but 0 is in its range so we cannot
also prove that it is not zero.

Instead of assert, we should be checking the required conditions explicitly.

Reviewers: lebedev.ri, fhahn, sanjoy, fedor.sergeev
Reviewed By: lebedev.ri
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76050
2020-03-12 17:23:07 +07:00
Sanjay Patel fae900921b [InstCombine] reduce demand-limited bool math to logic
The cmp math test is inspired by memcmp() patterns seen in D75840.
I know there's at least 1 related fold we can do here if both
values are sext'd, but I'm not seeing a way to generalize further.

We have some other bool math patterns that we want to reduce, but
that might require fixing the bogus transforms noted in D72396.

Alive proof translations of the regression tests:
https://rise4fun.com/Alive/zGWi

  Name: demand add 1
  %xz = zext i1 %x to i32
  %ys = sext i1 %y to i32
  %sub = add i32 %xz, %ys
  %r = lshr i32 %sub, 31
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = zext i1 %and to i32

  Name: demand add 2
  %xz = zext i1 %x to i5
  %ys = sext i1 %y to i5
  %sub = add i5 %xz, %ys
  %r = and i5 %sub, 16
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = select i1 %and, i5 -16, i5 0

  Name: demand add 3
  %xz = zext i1 %x to i8
  %ys = sext i1 %y to i8
  %a = add i8 %ys, %xz
  %r = ashr i8 %a, 7
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = sext i1 %and to i8

  Name: cmp math
  %gt = icmp ugt i32 %x, %y
  %lt = icmp ult i32 %x, %y
  %xz = zext i1 %gt to i32
  %yz = zext i1 %lt to i32
  %s = sub i32 %xz, %yz
  %r = lshr i32 %s, 31
  =>
  %r = zext i1 %lt to i32

Differential Revision: https://reviews.llvm.org/D75961
2020-03-11 15:45:58 -04:00
Florian Hahn bc6c8c4bbb [Matrix] Add remark propagation along the inlined-at chain.
This patch adds support for propagating matrix expressions along the
inlined-at chain and emitting remarks at the traversed function scopes.

To motivate this new behavior, consider the example below. Without the
remark 'up-leveling', we would only get remarks in load.h and store.h,
but we cannot generate a remark describing the full expression in
toplevel.cpp, which is the place where the user has the best chance of
spotting/fixing potential problems.

With this patch, we generate a remark for the load in load.h, one for
the store in store.h and one for the complete expression in
toplevel.cpp. For a bigger example, please see remarks-inlining.ll.

    load.h:
    template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) {
      Matrix<Ty, R, C> Result;
      Result.value = *reinterpret_cast <typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
      return Result;
    }

    store.h:
    template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
       *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
    }

    toplevel.cpp
    void test(double *A, double *B, double *C) {
      store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
    }

For a given function, we traverse the inlined-at chain for each
matrix instruction (= instructions with shape information). We collect
the matrix instructions in each DISubprogram we visit. This produces a
mapping of DISubprogram -> (List of matrix instructions visible in the
subpogram). We then generate remarks using the list of instructions for
each subprogram in the inlined-at chain. Note that the list of instructions
for a subprogram includes the instructions from its own subprograms
recursively. For example using the example above, for the subprogram
'test' this includes inline functions 'load' and 'store'. This allows
surfacing the remarks at a level useful to users.

Please note that the current approach may create a lot of extra remarks.
Additional heuristics to cut-off the traversal can be implemented in the
future. For example, it might make sense to stop 'up-leveling' once all
matrix instructions are at the same debug location.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D73600
2020-03-11 17:40:08 +00:00
Anna Welker a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Fangrui Song a0c0389ffb [SimplifyLibcalls] Don't replace locked IO (fgetc/fgets/fputc/fputs/fread/fwrite) with unlocked IO (*_unlocked)
This essentially reverts some of the SimplifyLibcalls part changes of D45736 [SimplifyLibcalls] Replace locked IO with unlocked IO.

C11 7.21.5.2 The fflush function

> If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above.

i.e. fopen'ed FILE* is inherently captured.

POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking

> These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the ( FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions.

After a thread fopen'ed a FILE*, when it is calling foobar() which is now replaced by foobar_unlocked(),
if another thread is concurrently calling fflush(0), the behavior is undefined.

C11 7.22.4.4 The exit function

> Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.

The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called.
See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html
for how the replacement makes libc interceptors difficult to implement.

dalias: in a worst case, it's unbounded data corruption because of concurrent access to pointers
without synchronization.  f->wpos or rpos could get outside of the buffer, thread A could do
f->wpos += j after knowing j is in bounds, while thread B also changes it concurrently.

This can produce exploitable conditions depending on libc internals.

Revert the SimplifyLibcalls part change because the cons obviously
overweigh the pros.  Even when the replacement is feasible, the benefit
is indemonstrable, more so in an application instead of an artificial
glibc benchmark.  Theoretically the replacement could be beneficial when
calling getc_unlocked/putc_unlocked in a loop, but then it is better
using a blocked IO operation and the user is likely aware of that.

The function attribute inference is still useful and thus kept.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D75933
2020-03-10 11:11:58 -07:00
Benjamin Kramer 247a177cf7 Give helpers internal linkage. NFC. 2020-03-10 18:27:42 +01:00
Tyker a4cde9ad7b Fixed [AssumeBundles] Move to IR so it can be used by Analysis
This is a recommit of 57c964aaa7
after fixing modules build.
2020-03-10 18:02:39 +01:00
Simon Moll d871ef4e6a [instcombine] remove fsub to fneg hacks; only emit fneg
Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp
negation. This also extends the scalarization cost in instcombine for unary
operators to result in the same IR rewrites for fneg as for the idiom.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D75467
2020-03-10 16:57:02 +01:00
Florian Hahn c8c14d979a [InstCombine] Support vectors in SimplifyAddWithRemainder.
SimplifyAddWithRemainder currently also matches for vector types, but
tries to create an integer constant, which causes a crash.

By using Constant::getIntegerValue() we can support both the scalar and
vector cases.

The 2 added test cases crash without the fix.

Reviewers: spatel, lebedev.ri

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D75906
2020-03-10 14:29:40 +00:00
Jonas Paulsson c2dafe12dc [SimplifyCFG] Skip merging return blocks if it would break a CallBr.
SimplifyCFG should not merge empty return blocks and leave a CallBr behind
with a duplicated destination since the verifier will then trigger an
assert. This patch checks for this case and avoids the transformation.

CodeGenPrepare has a similar check which also has a FIXME comment about why
this is needed. It seems perhaps better if these two passes would eventually
instead update the CallBr instruction instead of just checking and avoiding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=45062.

Review: Craig Topper

Differential Revision: https://reviews.llvm.org/D75620
2020-03-10 14:59:13 +01:00
Sanjay Patel 467eec0910 [InstCombine] fold gep-of-select-of-constants (PR45084)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=45084
...we failed to combine a gep with constant indexes with a
pointer operand that is a select of constants.

Differential Revision: https://reviews.llvm.org/D75807
2020-03-10 09:25:13 -04:00
Florian Hahn 2d6ecf4648 [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
ahatanak 1f5b471b8b [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-09 13:21:38 -07:00
Nikita Popov c3ca6876ed [InstCombine] Don't simplify calls without uses
When simplifying a call without uses, replaceInstUsesWith() is
going to do nothing, but we'll skip all following folds. We can
only run into this problem with calls that both simplify and are
not trivially dead if unused, which currently seems to happen only
with calls to undef, as the test diff shows. When extending
SimplifyCall() to handle "returned" attributes, this becomes a much
bigger problem, so I'm fixing this first.

Differential Revision: https://reviews.llvm.org/D75814
2020-03-09 18:47:46 +01:00
Jonas Devlieghere 882f589e20 Revert "[AssumeBundles] Move to IR so it can be used by Analysis"
This breaks the modules build:

http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/
http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/

This reverts commit 57c964aaa7.
2020-03-09 09:02:47 -07:00
evgeny 6d2032e259 [WPD] Provide a way to prevent functions from being devirtualized
Differential revision: https://reviews.llvm.org/D75617
2020-03-09 14:05:15 +03:00
Hideto Ueno bdcbdb4848 [Attributor] Deduction based on path exploration
This patch introduces the propagation of known information based on path exploration.
For example,
```
int u(int c, int *p){
  if(c) {
     return *p;
  } else {
     return *p + 1;
  }
}
```
An argument `p` is dereferenced whatever c's value is.

For an instruction `CtxI`, we accumulate branch instructions in the must-be-executed-context of `CtxI` and then, we take the conjunction of the successors' known state.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D65593
2020-03-09 14:29:26 +09:00
Sanjay Patel a69158c12a [VectorCombine] fold extract-extract-op with different extraction indexes
opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1

The first part of this patch generalizes the cost calculation to accept
different extraction indexes. The second part creates a shuffle+extract
before feeding into the existing code to create a vector op+extract.

The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc"
rather than "TargetTransformInfo::SK_Broadcast" (splat specifically
from element 0) because we do not have a more general "SK_Splat"
currently. That does not affect any of the current regression tests,
but we might be able to find some cost model target specialization where
that comes into play.

I suspect that we can expose some missing x86 horizontal op codegen with
this transform, so I'm speculatively adding a debug flag to disable the
binop variant of this transform to allow easier testing.

The test changes show that we're sensitive to cost model diffs (as we
should be), so that means that patches like D74976
should have better coverage.

Differential Revision: https://reviews.llvm.org/D75689
2020-03-08 09:57:55 -04:00