Commit Graph

11688 Commits

Author SHA1 Message Date
Heejin Ahn aa097ef8d4 [WebAssembly] Fix reverse mapping in WasmEHFuncInfo
D97247 added the reverse mapping from unwind destination to their
source, but it had a critical bug; sources can be multiple, because
multiple BBs can have a single BB as their unwind destination.

This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes
it return a vector rather than a single BB. It does not return the const
reference to the existing vector but creates a new vector because
`WasmEHFuncInfo` stores not `BasicBlock*` or `MachineBasicBlock*` but
`PointerUnion` of them. Also I hoped to unify those methods for
`BasicBlock` and `MachineBasicBlock` into one using templates to reduce
duplication, but failed because various usages require `BasicBlock*` to
be `const` but it's hard to make it `const` for `MachineBasicBlock`
usages.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744)

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97583
2021-02-26 17:12:10 -08:00
Craig Topper eea53b142d [DAGCombiner] Optimize SMULO/UMULO if we can prove that overflow is impossible.
Using ComputeNumSignBits or computeKnownBits we might be able
to determine that overflow is impossible.

This especially helps after type legalization if the type was
promoted from a type with half the bits or more. Type legalization
conservatively creates a promoted smulo/umulo and an overflow
check for the promoted bits. The overflow from the promoted
smulo/umulo is ORed with the result of the promoted bits
overflow check. Proving that the promoted smulo/umulo can never
overflow will leave us with just the promoted bits overflow check.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97160
2021-02-26 14:50:03 -08:00
Simon Pilgrim aefe8f2f6c [DAG] Fold vXi1 multiplies -> and
This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math.

Mentioned in D97478 post review.
2021-02-26 11:46:12 +00:00
Simon Pilgrim 73adc26ac0 [DAG] expandAddSubSat - break if-else chain. NFCI.
Fix styleguide issue - each if() block always returns so we don't need to make them a if-else chain.
2021-02-26 11:02:08 +00:00
Simon Pilgrim 9490b9f14b [DAG] Move simplification of SADDSAT/SSUBSAT/UADDSAT/USUBSAT of vXi1 to getNode()
As discussed on D97276 we should be able to always do this in node creation, we don't need a combine.
2021-02-25 17:49:26 +00:00
David Sherwood 87dbcd8865 [CodeGen] Canonicalise adds/subs of i1 vectors using XOR
When calling SelectionDAG::getNode() to create an ADD or SUB
of two vectors with i1 element types we can canonicalise this
to use XOR instead, where 1+1 is treated as wrapping around
to 0 and 0-1 wraps to 1.

I've added the following tests for SVE targets:

  CodeGen/AArch64/sve-pred-arith.ll

and modified some X86 tests to reflect the much simpler codegen
required.

Differential Revision: https://reviews.llvm.org/D97276
2021-02-25 10:31:26 +00:00
Craig Topper fe50be12c8 [LegalizeIntegerTypes] Further improve ExpandIntRes_SADDSUBO for targets where SADDO/SSUBO aren't supported.
Rather than converting 3 signbits to bools and comparing them,
we can do bitwise logic on the whole vector and convert the
resulting sign bit to a bool at the end.

This is still a different algorithm than what we do in LegalizeDAG
through expandSADDOSSUBO. That algorithm needs to know that the
RHS of SSUBO is > 0, but that's costly when the type is split.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97325
2021-02-24 10:05:38 -08:00
Simon Pilgrim 8082bfe7e5 [DAG] Add basic mul-with-overflow constant folding support
As noticed on D97160
2021-02-24 11:09:02 +00:00
Craig Topper cb6fc4b0a3 [LegalizeIntegerTypes] Use GetExpandedInteger instead of SplitInteger in ExpandIntRes_XMULO.
We know the input is going to be expanded as well, so we should
just ask for the already expanded operands. Otherwise we create
nodes that are just going to need to be legalized.
2021-02-23 23:53:45 -08:00
Heejin Ahn ea8c6375e3 [WebAssembly] Fix incorrect grouping and sorting of exceptions
This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain basics
on what each pass/analysis does and why we need this change on top of
that. Please feel free to skip parts that are not necessary for your
understanding.

---

`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationship between them, but in `WasmEHFuncInfo`
the unwind destination BB is not necessarily a successor or the key EH
pad BB. Their relationship can be intuitively explained by this C++ code
snippet:
```
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.

---

`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
continuously together in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationship in this way: if
EHPad A domiantes EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.

This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
  try
  catch
    ...   <- %dest_bb is among here!
  end
delegate %dest_bb
```

So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure excptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.

But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.

---

CFGSort places BBs within the same `SortRegion` (loop or exception)
continuously together so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.

`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there discrepancies between those two data structures.
`WebAssemblyException` is what we control, and it is defined as an EH
pad as its header and BBs dominated by the header as its BBs (with a
newly added exception of unwind destinations explained in the previous
paragraph). But `MachineLoop` is an LLVM data structure and uses the
standard loop detection algorithm. So by the algorithm, BBs that are 1.
dominated by the loop header and 2. have a path back to its header.
Because of the second condition, many BBs that are dominated by the loop
header are not included in the loop. So BBs that contain `return` or
branches to outside of the loop are not technically included in
`MachineLoop`, but they can be sorted together with the loop with no
problem.

Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are by the current region header.
(This was written this way from the first version written by Dan, when
only loops existed.) But now, we have cases in exceptions when EHPad B
is the unwind destination for EHPad A, even if EHPad B is dominated by
EHPad A it should not be included in EHPad A's exception, and should not
be sorted within EHPad A.

One way to make things work, at least correctly, is change `dominates`
condition to `contains` condition for `SortRegion` when sorting BBs, but
this will change compilation results for existing non-EH code and I
can't be sure it will not degrade performance or code size. I think it
will degrade performance because it will force many BBs dominated by a
loop, which don't have the path back to the header, to be placed after
the loop and it will likely to create more branches and blocks.

So this does a little hacky check when adding BBs to `Preferred` list:
(`Preferred` list is a ready list. CFGSort maintains ready list in two
priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
was written that way from the beginning. BBs are first added to
`Preferred` list and then some of them are pushed to `Ready` list, so
here we only need to guard condition for `Preferred` list.)

When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.

---

This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of existing test case. This test
case didn't crash because it fortunately didn't contain `delegate` to
the incorrectly placed unwind destination BB.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97247
2021-02-23 14:54:55 -08:00
Craig Topper eb165090bb [LegalizeIntegerTypes] Improve ExpandIntRes_SADDSUBO codegen on targets without SADDO/SSUBO.
This code creates 3 setccs that need to be expanded. It was
creating a sign bit test as setge X, 0 which is non-canonical.
Canonical would be setgt X, -1. This misses the special case in
IntegerExpandSetCCOperands for sign bit tests that assumes
canonical form. If we don't hit this special case we end up
with a multipart setcc instead of just checking the sign of
the high part.

To fix this I've reversed the polarity of all of the setccs to
setlt X, 0 which is canonical. The rest of the logic should
still work. This seems to produce better code on RISCV which
lacks a setgt instruction.

This probably still isn't the best code sequence we could use here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97181
2021-02-23 09:40:32 -08:00
Heejin Ahn a08e609d2e [WebAssembly] Rename methods in WasmEHFuncInfo (NFC)
This renames variable and method names in `WasmEHFuncInfo` class to be
simpler and clearer. For example, unwind destinations are EH pads by
definition so it doesn't necessarily need to be included in every method
name. Also I am planning to add the reverse mapping in a later CL,
something like `UnwindDestToSrc`, so this renaming will make meanings
clearer.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D97173
2021-02-22 12:16:11 -08:00
Kazu Hirata ffba9e596d [CodeGen] Use range-based for loops (NFC) 2021-02-21 19:58:07 -08:00
Craig Topper 1a6c1ac686 [SelectionDAG][RISCV] Teach ComputeNumSignBits to handle SREM.
This also removes a pattern from RISCV that is no longer needed
since the sexti32 on the LHS of the srem in the pattern implies
the result is sign extended so the sign_extend_inreg should be
removed in DAG combine now.

Reviewed By: luismarques, RKSimon

Differential Revision: https://reviews.llvm.org/D97133
2021-02-21 11:13:36 -08:00
Simon Pilgrim 38ab47c813 [DAG] Match USUBSAT patterns through zext/trunc
This patch handles usubsat patterns hidden through zext/trunc and uses the getTruncatedUSUBSAT helper to determine if the USUBSAT can be correctly performed in the truncated form:

zext(x) >= y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))
zext(x) >  y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))

Based on original examples:

void foo(unsigned short *p, int max, int n) {
    int i;
    unsigned m;
    for (i = 0; i < n; i++) {
        m = *--p;
        *p = (unsigned short)(m >= max ? m-max : 0);
    }
}

Differential Revision: https://reviews.llvm.org/D25987
2021-02-21 15:26:54 +00:00
Kazu Hirata 0b417ba20f [CodeGen] Use range-based for loops (NFC) 2021-02-20 21:46:02 -08:00
Simon Pilgrim 761bbed264 [DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit)))
This moves the last custom x86 USUBSAT fold to generic DAGCombine.

Completes PR40111

Differential Revision: https://reviews.llvm.org/D96703
2021-02-20 12:02:07 +00:00
Simon Pilgrim 5d3930bb8f [DAG] visitTRUNCATE - attempt to truncate USUBSAT
Fold trunc(usubsat(zext(x),y)) -> usubsat(x,trunc(umin(y,satlimit)))
2021-02-19 14:26:05 +00:00
Simon Pilgrim 53e83afcaf [DAG] getTruncatedUSUBSAT - always truncate operands. NFCI.
As noticed on D96703, we're always truncating the operands so should use getNode(ISD::TRUNCATE) instead of getZExtOrTrunc.
2021-02-18 21:28:55 +00:00
Guozhi Wei 66f2d09ebf [DAGCombiner] Transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
If extload is legal, following transform
    (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
can save one ext instruction.

Differential Revision: https://reviews.llvm.org/D95086
2021-02-18 13:15:20 -08:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
Craig Topper 61d4d9a5d3 [TableGen][SelectionDAG] Improve efficiency of encoding negative immediates for isel's CheckInteger opcode.
CheckInteger uses an int64_t encoded using a variable width encoding
that is optimized for encoding a number with a lot of leading zeros.
Negative numbers have no leading zeros so use the largest encoding
requiring 9 bytes.

I believe its most like we want to check for positive and negative
numbers near 0. -1 is quite common due to its use in the 'not'
idiom.

To optimize for this, we can borrow an idea from the bitcode format
and move the sign bit to bit 0 with the magnitude stored in the
upper bits. This will drastically increase the number of leading
zeros for small magnitudes. Then we can run this value through
VBR encoding.

This gives a small reduction in the table size on all in tree
targets except VE where size increased by about 300 bytes due
to intrinsic ids now requiring 3 bytes instead of 2. Since the
intrinsic enum space is shared by all targets this an unfortunate
consquence of where VE is currently located in the range.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96317
2021-02-18 08:53:17 -08:00
Simon Pilgrim 87fbc06d06 [DAG] Pull out getTruncatedUSUBSAT helper from foldSubToUSubSat. NFCI.
This will simplify an incoming generic implementation of D25987.

I'll rebase D96703 shortly to support this.
2021-02-17 12:17:08 +00:00
Simon Pilgrim 05c64ea672 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED)
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-17 11:42:43 +00:00
Sterling Augustine 5aa8f4c084 Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))"
This reverts commit 5dfba562dd.

That commit causes an assertion failure with the following repro:

typedef long b __attribute__((__vector_size__(16)));
b *d;
b e;
b __attribute__((__always_inline__)) c(b h, b i) {
  return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i;
}
j() {
  b k, l, m, n, o[6], p, q;
  m = d[5];
  b r = m;
  b s = f(r, 8);
  q = s;
  l = d[1];
  p = l;
  t(q);
  n = c(m, l);
  o[1] = c(s, f(p, 8));
  k = __builtin_shufflevector(n, o[1], 0, 2);
  e = __builtin_ia32_psrlwi128(k, j);
}

./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c
2021-02-16 12:48:15 -08:00
Simon Pilgrim df45c18135 [DAG] PromoteIntRes_ADDSUBSHLSAT - promote ISD::UADDSAT as clamped add
Similar to D96622, we're better off just promoting uaddsat(x,y) -> umin(add(x,y),c) instead of trying to perform a shifted uaddsat.

I initially tried to just use shifted promotion in cases where we didn't have a legal/custom umin - but we don't appear to have any targets that have uaddsat but not umin, so imo we're better off always using the umin and avoid an untested shifted uaddsat code path.

Differential Revision: https://reviews.llvm.org/D96767
2021-02-16 17:37:44 +00:00
Craig Topper 064ada4ec6 [SelectionDAG][AArch64] Restrict matchUnaryPredicate to only handle SPLAT_VECTOR for scalable vectors.
fde2466171 added support for
scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in
addition to BUILD_VECTOR. This was used to enabled UDIV/SDIV/UREM/SREM
by constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp

The caller there expects to call getBuildVector from the match factors.
This leads to a crash right now if there is a SPLAT_VECTOR of
fixed vectors since the number of vectors won't match the number
of elements.

To fix this, this patch updates the callers to check the opcode
instead of whether the type is fixed or scalable. This assumes
that only 3 opcodes are handled by matchUnaryPredicate so
I've added an assertion to the final else to check that opcode.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96174
2021-02-16 09:22:46 -08:00
Simon Pilgrim 5dfba562dd [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-16 15:46:34 +00:00
Simon Pilgrim 420420de57 [DAG] Avoid APInt copies by directly using the APInt reference from getAPIntValue. NFCI. 2021-02-16 13:50:34 +00:00
Simon Pilgrim dd879f7dc9 [DAG] Use APInt::extractBits instead of lshr().trunc(). NFCI.
Avoids so many APInt instances by directly using the APInt reference from getAPIntValue.
2021-02-16 13:50:33 +00:00
Craig Topper eb75f250fe [RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway.
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.

Simlilar was done for BSWAP previously.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96681
2021-02-15 12:33:16 -08:00
Simon Pilgrim e47f21da61 [DAG] visitVSELECT - move OpLHS == LHS into inner if() in USUBSAT matching. NFCI.
This will be necessary for the update of D25987 where we'll need to match OpLHS against other ops.
2021-02-15 18:27:00 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds  a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
 reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Arlo Siemsen 080866470d Add ehcont section support
In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.

This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.

This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.

The change includes a test that when the module flag is enabled the section is correctly generated.

The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).

In order to collect catchret we:
1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and
3) Combines these targets with the other EHCont targets that were already being collected.

Change originally authored by Daniel Frampton <dframpto@microsoft.com>

For more details, see MSVC documentation for `/guard:ehcont`
  https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D94835
2021-02-15 14:27:12 +08:00
Simon Pilgrim 6f5a805bbb [DAG] Fold i1/vXi1 saddsat/uaddsat(x,y) -> or(x,y)
Alive2: https://alive2.llvm.org/ce/z/FzcrpH
2021-02-13 15:02:01 +00:00
Simon Pilgrim 0df15e5eff [DAG] Fold i1/vXi1 ssubsat/usubsat(x,y) -> and(x,~y)
Alive2: https://alive2.llvm.org/ce/z/4nkNGh
2021-02-13 13:21:15 +00:00
Simon Pilgrim 60ba5397df [DAG] PromoteIntRes_ADDSUBSHLSAT - use promoted ISD::USUBSAT directly
As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops.

I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later.

Also, create a ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary.

Differential Revision: https://reviews.llvm.org/D96622
2021-02-13 12:35:10 +00:00
Simon Pilgrim 7ad0c573bd [DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth
We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it.

Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162
2021-02-13 12:00:08 +00:00
Simon Pilgrim 4841a225b7 [DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine
Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets.

This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization.

We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation.

The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987.

Differential Revision: https://reviews.llvm.org/D96413
2021-02-12 18:22:57 +00:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Simon Pilgrim 2465541dc0 [DAG] DAGTypeLegalizer::PromoteIntRes_ADDSUBSHLSAT - break if-else chain. NFCI.
Style fixup - the if() block always returns so we can pull out the contents of the else() block.
2021-02-12 10:33:12 +00:00
Craig Topper 5744502a13 [TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL.
If we wait until the type is legalized, we'll lose information
about the orginal type and need to use larger magic constants.
This gets especially bad on RISCV64 where i64 is the only legal
type.

I've limited this to simple scalar types so it only works for
i8/i16/i32 which are most likely to occur. For more odd types
we might want to do a small promotion to a type where MULH is legal
instead.

Unfortunately, this does prevent some urem/srem+seteq matching since
that still require legal types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96210
2021-02-11 09:43:13 -08:00
Simon Pilgrim 5beebf9c58 [DAG] foldLogicOfSetCCs - Generalize and/or (setcc X, CMax, ne), (setcc X, CMin, ne/eq) fold. NFCI.
Prep work to add support for non-uniform vectors - replace APInt values with using the SDValue ops directly.
2021-02-11 17:09:01 +00:00
Thomas Preud'homme bad0290ce3 Improve STRICT_FSETCC codegen in absence of no NaN
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91972
2021-02-11 14:19:43 +00:00
Joe Ellis 67464dfe36 [DebugInfo] Only perform TypeSize -> unsigned cast when necessary
This commit moves a line in SelectionDAGBuilder::handleDebugValue to
avoid implicitly casting a TypeSize object to an unsigned earlier than
necessary. It was possible that we bail out of the loop before the value
is ever used, which means we could create a superfluous TypeSize
warning.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D96423
2021-02-11 13:54:09 +00:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Luís Marques acac29ca42 [DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts
Avoid doing the following combine for vector types:

```
copysign(x, fp_extend(y)) -> copysign(x, y)
copysign(x, fp_round(y)) -> copysign(x, y)
```

That combine seemed to impede the selection of vector instruction and cause
a mess in some circumstances.

Differential Revision: https://reviews.llvm.org/D96037
2021-02-10 14:25:24 +00:00
Kazu Hirata 7e75f6fc1d [SelectionDAG] Use range-based for loops (NFC) 2021-02-09 22:14:30 -08:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Nemanja Ivanovic a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00
Thomas Preud'homme a50ab8672d Revert STRICT_FCMP nonan optimisation
Summary: This reverts commit b7b61a7b5b which fails on some of the builders: http://lab.llvm.org:8011/#/builders/14/builds/5806

Reviewers:

Subscribers:
2021-02-09 11:27:35 +00:00
Thomas Preud'homme b7b61a7b5b Improve STRICT_FSETCC codegen in absence of no NaN
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91972
2021-02-09 11:18:16 +00:00
Simon Pilgrim c5c690a835 [DAG] visitVECTOR_SHUFFLE - move shuffle legality check into MergeInnerShuffle lamda. NFCI.
This is going to be necessary for a future reuse of MergeInnerShuffle
2021-02-08 14:25:16 +00:00
Kazu Hirata 7b9f6c2d42 [SelectionDAG] Drop unnecessary const from a return type (NFC)
Identified with const-return-type.
2021-02-07 09:49:33 -08:00
Simon Pilgrim 86dabf4226 [DAG] SelectionDAG::isSplatValue - handle OR/XOR cases
Add OR/XOR to the basic binops that we support when checking for a splat vector value
2021-02-07 13:27:57 +00:00
Huihui Zhang 1b81117f88 [DAGCombiner][SVE] Fix invalid use of getVectorNumElements() in visitSRA.
Make sure scalable property is preserved by using getVectorElementCount().

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D95967
2021-02-05 09:56:49 -08:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Craig Topper 8cc9c42a0c [TargetLowering] Use LegalOnly operand to isOperationLegalOrCustom to simplify some code. NFC 2021-02-04 12:30:37 -08:00
Craig Topper 34da12dd1f [DAGCombiner] Remove (sra (shl X, C), C) if X has more than C sign bits.
If sext_inreg is supported, we will turn this into sext_inreg. That
will then remove it if there are enough sign bits. But if sext_inreg
isn't supported, we can still remove the shift pair based on sign
bits.

Split from D95890.
2021-02-03 10:18:40 -08:00
Craig Topper 4553821815 [SelectionDAG] Prevent scalable vector warning from ComputeNumSignBits on extract_vector_elt on a scalable vector. 2021-02-01 23:42:03 -08:00
Kerry McLaughlin 9b4fcfaa9e [SVE][CodeGen] Remove performMaskedGatherScatterCombine
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.

This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D94525
2021-02-01 14:10:00 +00:00
xgupta 94fac81fcc [Branch-Rename] Fix some links
According to the [[ https://foundation.llvm.org/docs/branch-rename/ | status of branch rename ]], the master branch of the LLVM repository is removed on 28 Jan 2021.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D95766
2021-02-01 16:43:21 +05:30
Serge Pavlov bf416d166b [FPEnv] Intrinsic for setting rounding mode
To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.

* It creates unnecessary dependency on libc. On the other hand, setting
  rounding mode requires few instructions and could be made by compiler.
  Sometimes standard C library even is not available, like in the case of
  GPU or AI cores that execute small kernels.
* Compiler could generate more effective code if it knows that a particular
  call just sets rounding mode.

This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:

* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
  returns non-zero value in the case of failure. In glibc 'fesetround'
  reports failure if its argument is invalid or unsupported or if floating
  point operations are unavailable on the hardware. Compiler usually knows
  what core it generates code for and it can validate arguments in many
  cases.
* Rounding mode is specified in 'fesetround' using constants like
  'FE_TONEAREST', which are target dependent. It is inconvenient to work
  with such constants at IR level.

C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.

This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.

Differential Revision: https://reviews.llvm.org/D74729
2021-02-01 11:28:14 +07:00
Craig Topper 70289ea6f5 [RISCV][LegalizeTypes] Try to expand BSWAP before promoting if the promoted BSWAP would expand anyway.
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.

This is helfpul on RISCV where we might have to promote i16 all
the way to i64.

Differential Revision: https://reviews.llvm.org/D95756
2021-01-31 14:33:29 -08:00
Craig Topper ea87cf2acd [TargetLowering][RISCV] Don't transform (seteq/ne (sext_inreg X, VT), C1) -> (seteq/ne (zext_inreg X, VT), C1) if the sext_inreg is cheaper
RISCV has to use 2 shifts for (i64 (zext_inreg X, i32)), but we
can use addiw rd, rs1, x0 for sext_inreg. We already understood this
when type legalizing i32 seteq/ne on rv64. But this transform in
SimplifySetCC would sometimes undo it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D95289
2021-01-25 16:37:21 -08:00
Fraser Cormack fde2466171 [SelectionDAG] Support scalable-vector splats in more cases
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.

It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94501
2021-01-25 10:58:15 +00:00
Fangrui Song d5bbaaaf95 [XRay] Make __xray_customevent support non-Linux 2021-01-25 00:48:21 -08:00
QingShan Zhang ffc3e800c6 [NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Kazu Hirata 16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Kazu Hirata d44ca0cf2f [CodeGen] Forward-declare TargetMachine (NFC)
InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h.  This patch adds a forward
declaration right in InstrEmitter.h.

While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
2021-01-24 12:18:54 -08:00
Craig Topper 147c0c263d [TargetLowering] Use isOneConstant to simplify some code. NFC 2021-01-22 19:32:19 -08:00
Simon Pilgrim 5dbe5d2c91 [DAG] Commute shuffle(splat(A,u), shuffle(C,D)) -> shuffle'(shuffle(C,D), splat(A,u))
We only merge shuffles if the inner (LHS) shuffle is a non-splat, so commute these shuffles to improve merging of multiple shuffles.
2021-01-22 11:43:18 +00:00
Craig Topper c953a83347 [TargetLowering] Use getBoolConstant instead of assuming zero or one for boolean contents.
Noticed while I was touching other nearby code. I don't have a
test where this matters because the targets I work on
use zero or one boolean contents. And the tests cases I've seen
this fire on happen before type legalization where the result type
is MVT::i1 so the distinction doesn't matter.
2021-01-22 00:26:14 -08:00
Craig Topper 5660dc5968 [TargetLowering] Simplify some code in SimplifySetCC that tries to handle SIGN_EXTEND_INREG operand types that should never happen. NFCI
There was code to handle the first operand being different than
the result type. And code to handle first operand having the
same type as the type to extend from. This should never happen
for a correctly formed SIGN_EXTEND_INREG. I've replace the
code with asserts.

I also noticed we created the same APInt twice so I've reused it.
2021-01-21 23:56:37 -08:00
Simon Pilgrim 69bc0990a9 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Add DemandedElts support inside the TRUNCATE analysis.

REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d

Differential Revision: https://reviews.llvm.org/D56387
2021-01-21 13:01:34 +00:00
Simon Pilgrim 935bacd3a7 [DAG] SimplifyDemandedBits - correctly adjust truncated shift amount type
As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.
2021-01-21 12:38:36 +00:00
Simon Pilgrim bc9ab9a5cd [DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI.
Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy.
2021-01-21 11:04:09 +00:00
Hans Wennborg a51226057f Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE"
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.

> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387

This reverts commit cad4275d69.
2021-01-20 20:06:55 +01:00
Simon Pilgrim cad4275d69 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Kazu Hirata b023cdeacc [llvm] Use llvm::all_of (NFC) 2021-01-19 20:19:17 -08:00
Kazu Hirata 8857202489 [llvm] Use llvm::find (NFC) 2021-01-19 20:19:14 -08:00
Craig Topper 79e798aca3 Recommit "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results."
This recommits 2c51bef76c.

I've fixed the broken check line from when I renamed the test function.

Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.

I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
2021-01-18 11:08:28 -08:00
Craig Topper 5d431c3d32 Revert "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results."
This reverts commit 2c51bef76c.

I seem to have messed up the check lines in the test.
2021-01-18 11:00:20 -08:00
Craig Topper 2c51bef76c [RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results.
This builds on D94142 where scalable vectors are allowed in structs.

I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.

Differential Revision: https://reviews.llvm.org/D94149
2021-01-18 10:41:36 -08:00
Kazu Hirata 23b0ab2acb [llvm] Use the default value of drop_begin (NFC) 2021-01-18 10:16:36 -08:00
Simon Pilgrim 207f32948b [DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops
Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other.

Differential Revision: https://reviews.llvm.org/D94532
2021-01-18 10:29:23 +00:00
Qiu Chaofan f776d8b12f [Legalizer] Promote result type in expanding FP_TO_XINT
This patch promotes result integer type of FP_TO_XINT in expanding.
So crash in conversion from ppc_fp128 to i1 will be fixed.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92473
2021-01-18 11:56:11 +08:00
Kazu Hirata 19aacdb715 [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-16 09:40:53 -08:00
Bjorn Pettersson 4f15556731 [LegalizeDAG] Handle NeedInvert when expanding BR_CC
This is a follow-up fix to commit 03c8d6a0c4.
Seems like we now end up with NeedInvert being set in the result
from LegalizeSetCCCondCode more often than in the past, so we
need to handle NeedInvert when expanding BR_CC.

Not sure how to deal with the "Tmp4.getNode()" case properly,
but current assumption is that that code path isn't impacted
by the changes in 03c8d6a0c4 so we can simply move
the old assert into the if-branch and only handle NeedInvert in the
else-branch.

I think that the test case added here, for PowerPC, might have
failed also before commit 03c8d6a0c4. But we started
to hit the assert more often downstream when having merged that
commit.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94762
2021-01-16 14:33:19 +01:00
Jeroen Dobbelaere 668827b648 Introduce llvm.noalias.decl intrinsic
The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.

Reviewed By: nikic, jdoerfert

Differential Revision: https://reviews.llvm.org/D93039
2021-01-16 09:20:45 +01:00
Craig Topper a9e939760c [CodeGen] Removes unwanted optimisation for TargetConstantFP
This 'FIXME' popped up in the development of an out-of-tree backend.
Quick fix, but first llvm upstream patch, therefore I do not have commit rights, so if approved please commit?

- Test is not included as this came up in an out-of-tree backend (if required, please hint on how to test this).

Patch by simveg (Simon)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93219
2021-01-15 11:52:53 -08:00
Craig Topper 4c5066b078 [TargetLowering] Don't speculatively call ComputeNumSignBits. NFC
These methods are recursive so a little costly.

We only look at the result in one place in this function and it's
conditional. We also only need the second call if the first had
enough returned enough sign bits.
2021-01-15 09:09:35 -08:00
Simon Pilgrim 46aa3c6c33 [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - improve shuffle(shuffle(x,y),shuffle(x,y)) merging
MergeInnerShuffle currently attempts to merge shuffle(shuffle(x,y),z) patterns into a single shuffle, using 1 or 2 of the x,y,z ops.

However if we already match 2 ops we might be able to handle the third op if its also a shuffle that references one of the previous ops, allowing us to handle some cases like:

shuffle(shuffle(x,y),shuffle(x,y))
shuffle(shuffle(shuffle(x,z),y),z)
shuffle(shuffle(x,shuffle(x,y)),z)
etc.

This isn't an exhaustive match and is dependent on the order the candidate ops are encountered - if one of the matched ops was a shuffle that was peek-able we don't go back and try to split that, I haven't found much need for that amount of analysis yet.

This is a preliminary patch that will allow us to later improve x86 HADD/HSUB matching - but needs to be reviewed separately as its in generic code and affects existing Thumb2 tests.

Differential Revision: https://reviews.llvm.org/D94671
2021-01-15 15:08:31 +00:00
Jay Foad 868da2ea93 [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax
Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.

Differential Revision: https://reviews.llvm.org/D87145
2021-01-14 18:15:17 +00:00
Jay Foad 517196e569 [Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC.
Differential Revision: https://reviews.llvm.org/D94588
2021-01-14 14:02:43 +00:00
Jay Foad a1cba5b7a1 [SelectionDAG] Make use of KnownBits::commonBits. NFC.
Differential Revision: https://reviews.llvm.org/D94587
2021-01-14 14:02:43 +00:00
Simon Pilgrim 7c30c05ff7 [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI.
I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run.

Also, move the second op matching before bailing to make it simpler to try to match other things afterward.
2021-01-14 11:55:20 +00:00
Simon Pilgrim af8d27a7a8 [DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI.
Make it easier to reuse in a future patch.
2021-01-14 11:05:19 +00:00
Kazu Hirata 5c1c39e8d8 [llvm] Use *Set::contains (NFC) 2021-01-13 19:14:41 -08:00
Simon Pilgrim 993c488ed2 [DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI. 2021-01-13 17:19:41 +00:00
Kerry McLaughlin 2170e0ee60 [SVE][CodeGen] CTLZ, CTTZ & CTPOP operations (predicates)
Canonicalise the following operations in getNode() for predicate types:
 - CTLZ(Pred)  -> bitwise_NOT(Pred)
 - CTTZ(Pred)  -> bitwise_NOT(Pred)
 - CTPOP(Pred) -> Pred

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D94428
2021-01-13 12:24:54 +00:00
Serguei Katkov 157efd84ab [Statepoint Lowering] Add an option to allow use gc values in regs for landing pad
Default value is not changed, so it is NFC actually.

The option allows to use gc values on registers in landing pads.

Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94469
2021-01-13 11:39:34 +07:00
Juneyoung Lee 25eb7b08ba [DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND)
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 .
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.

To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB.
It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).

Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D92015
2021-01-13 09:36:52 +09:00
Craig Topper 03c8d6a0c4 [LegalizeDAG][RISCV][PowerPC][AMDGPU][WebAssembly] Improve expansion of SETONE/SETUEQ on targets without SETO/SETUO.
If SETO/SETUO aren't legal, they'll be expanded and we'll end up
with 3 comparisons.

SETONE is equivalent to (SETOGT || SETOLT)
so if one of those operations is supported use that expansion. We
don't need both since we can commute the operands to make the other.

SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE).
I've only implemented the first because it didn't look like most of the
affected targets had legal SETULE/SETUGE.

Reviewed By: frasercrmck, tlively, nemanjai

Differential Revision: https://reviews.llvm.org/D94450
2021-01-12 10:45:03 -08:00
Craig Topper df74c001fa [DAGCombiner] Replace static helper function isConstantFPBuildVectorOrConstantFP with the identical version in SelectionDAG. NFC 2021-01-11 23:41:40 -08:00
Craig Topper f9ef3a6003 [SelectionDAG] Make isConstantIntBuildVectorOrConstantInt and isConstantFPBuildVectorOrConstantFP methods const. 2021-01-11 23:26:53 -08:00
Paul Robinson 1f9c29228c [FastISel] NFC: Clean up unnecessary bookkeeping
Now that we flush the local value map for every instruction, we don't
need any extra flushes for specific cases.  Also, LastFlushPoint is
not used for anything.  Follow-ups to #c161665 (D91734).

This reapplies #3fd39d3.

Differential Revision: https://reviews.llvm.org/D92338
2021-01-11 09:40:39 -08:00
Paul Robinson be179b9946 [FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option
This option is not used for anything after #c161665 (D91737).
This commit reapplies #a474657.
2021-01-11 09:32:49 -08:00
Paul Robinson c161775dec [FastISel] Flush local value map on every instruction
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

  Local values may also be used by no-op casts, which adds the
  register to the RegFixups table. Without reversing the RegFixups
  map direction, we don't have enough information to sink these
  instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

In addition, constants materialized due to PHI instructions are
not assigned a debug location immediately; instead, when the
local value map is flushed, if the first local value instruction
has no debug location, it is given the same location as the
first non-local-value-map instruction.  This prevents PHIs
from introducing unattributed instructions, which would either
be implicitly attributed to the location for the preceding IR
instruction, or given line 0 if they are at the beginning of
a machine basic block.  Neither of those consequences is good
for debugging.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

This reapplies commits cf1c774d and dc35368c, and adds the
modification to PHI handling, which should avoid problems
with debugging under gdb.

Differential Revision: https://reviews.llvm.org/D91734
2021-01-11 08:32:36 -08:00
Joe Ellis 007358239d [DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR
This avoids TypeSize-/ElementCount-related warnings.

Differential Revision: https://reviews.llvm.org/D92747
2021-01-11 14:15:11 +00:00
QingShan Zhang 7539c75bb4 [DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D93891
2021-01-11 02:25:53 +00:00
Fraser Cormack 41d06095b0 [SelectionDAG] Teach isConstOrConstSplat about ISD::SPLAT_VECTOR
This improves llvm::isConstOrConstSplat by allowing it to analyze
ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of
operations using scalable vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94168
2021-01-09 20:54:34 +00:00
Fraser Cormack de373ef779 [SelectionDAG] Extend immAll(Ones|Zeros)V to handle ISD::SPLAT_VECTOR
The TableGen immAllOnesV and immAllZerosV helpers implicitly wrapped the
ISD::isBuildVectorAll(Ones|Zeros) helper functions. This was inhibiting
their use for targets such as RISC-V which use ISD::SPLAT_VECTOR. In
particular, RISC-V had to define its own 'vnot' fragment.

In order to extend the scope of these nodes to include support for
ISD::SPLAT_VECTOR, two new ISD predicate functions have been introduced:
ISD::isConstantSplatVectorAll(Ones|Zeros). These effectively supersede
the older "isBuildVector" predicates, which are now simple wrappers for
the new functions. They pass a defaulted boolean toggle which preserves
the old behaviour. It is hoped that in time all call-sites can be ported
to the "isConstantSplatVector" functions.

While the use of ISD::isBuildVectorAll(Ones|Zeros) has not changed, the
behaviour of the TableGen immAll(Ones|Zeros)V **has**. To test the new
functionality, the custom RISC-V TableGen fragment has been removed and
replaced with the built-in 'vnot'. To test their use as pattern-roots, two
splat patterns have been updated accordingly.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94223
2021-01-09 17:05:31 +00:00
Heejin Ahn 9e4eadeb13 [WebAssembly] Update basic EH instructions for the new spec
This implements basic instructions for the new spec.

- Adds new versions of instructions: `catch`, `catch_all`, and `rethrow`
- Adds support for instruction selection for the new instructions
 - `catch` needs a custom routine for the same reason `throw` needs one,
   to encode `__cpp_exception` tag symbol.
- Updates `WebAssembly::isCatch` utility function to include `catch_all`
  and Change code that compares an instruction's opcode with `catch` to
  use that function.
- LateEHPrepare
  - Previously in LateEHPrepare we added `catch` instruction to both
    `catchpad`s (for user catches) and `cleanuppad`s (for destructors).
    In the new version `catch` is generated from `llvm.catch` intrinsic
    in instruction selection phase, so we only need to add `catch_all`
    to the beginning of cleanup pads.
  - `catch` is generated from instruction selection, but we need to
    hoist the `catch` instruction to the beginning of every EH pad,
    because `catch` can be in the middle of the EH pad or even in a
    split BB from it after various code transformations.
  - Removes `addExceptionExtraction` function, which was used to
    generate `br_on_exn` before.
- CFGStackfiy: Deletes `fixUnwindMismatches` function. Running this
  function on the new instruction causes crashes, and the new version
  will be added in a later CL, whose contents will be completely
  different. So deleting the whole function will make the diff easier to
  read.
- Reenables all disabled tests in exception.ll and eh-lsda.ll and a
  single basic test in cfg-stackify-eh.ll.
- Updates existing tests to use the new assembly format. And deletes
  `br_on_exn` instructions from the tests and FileCheck lines.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94040
2021-01-09 01:48:06 -08:00
Heejin Ahn 7be271537e [WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
  foo();
} catch (int n) {
  ...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94038
2021-01-08 06:55:04 -08:00
Simon Moll 611d3c63f3 [VP] ISD helper functions [VE] isel for vp_add, vp_and
This implements vp_add, vp_and for the VE target by lowering them to the
VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode,
getVPMaskIdx, getVPExplicitVectorLengthIdx).

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D93766
2021-01-08 14:29:45 +01:00
Simon Pilgrim 350ab7aa1c [DAG] Simplify OR(X,SHL(Y,BW/2)) eq/ne 0/-1 'all/any-of' style patterns
Attempt to simplify all/any-of style patterns that concatenate 2 smaller integers together into an and(x,y)/or(x,y) + icmp 0/-1 instead.

This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers.

Differential Revision: https://reviews.llvm.org/D93599
2021-01-07 12:03:19 +00:00
Simon Pilgrim 1307e3f6c4 [TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support. 2021-01-06 16:13:51 +00:00
Craig Topper 4ef91f5871 [DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used.
This looks to have been done to save some duplicated code under
two different if statements, but it ends up being harmful to D94073.
This speculative constant can be called on a scalable vector type
with i64 element size when i64 scalars aren't legal. The code tries
and fails to find a vector type with i32 elements that it can use.

So only create the node when we know it will be used.
2021-01-05 12:45:57 -08:00
Fraser Cormack 9a1ac97d3a [CodeGen] Format SelectionDAG::getConstant methods (NFC) 2021-01-05 12:59:46 +00:00
Cameron McInally 92be640bd7 [FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed
This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags.

Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately

Differential Revision: https://reviews.llvm.org/D93243
2021-01-04 14:44:10 -06:00
Juneyoung Lee 5cdf6ed744 [CodeGen] recognize select form of and/ors when splitting branch conditions
Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++.
"a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a.
This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353).
In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison.
The discussion at D93065 has more context about this.

This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize
select form of and/or as well using m_LogicalAnd/Or.
Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93853
2021-01-01 04:46:10 +09:00
Kazu Hirata 7bc76fd0ec [CodeGen] Construct SmallVector with iterator ranges (NFC) 2020-12-31 09:39:11 -08:00
Kazu Hirata 1e3ed09165 [CodeGen] Use llvm::append_range (NFC) 2020-12-28 19:55:16 -08:00
Layton Kifer d29f93bda5 [DAGCombiner] Don't create sexts of deleted xors when they were in-visit replaced
Fixes a bug introduced by D91589.

When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If we replace the not in-visit, then the now invalidated node will be returned, and subsequently we will return an invalid sext. In cases where the not is replaced in-visit we can simply return SDValue, as the not in the current sext should have already been replaced.

Thanks @jgorbe, for finding the below reproducer.

The following reduced test case crashes clang when built with `clang -O1 -frounding-math`:

```
template <class> class a {
  int b() { return c == 0.0 ? 0 : -1; }
  int c;
};
template class a<long>;
```

A debug build of clang produces this "assertion failed" error:
```
clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm::
SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93274
2020-12-23 16:16:26 -08:00
Bing1 Yu e8ade4569b [LegalizeType] When LegalizeType procedure widens a masked_gather, set MemoryType's EltNum equal to Result's EltNum
When LegalizeType procedure widens a masked_gather, set MemoryType's EltNum equal to Result's EltNum.

As I mentioned in https://reviews.llvm.org/D91092, in previous code, If we have a v17i32's masked_gather in avx512, we widen it to a v32i32's masked_gather with a v17i32's MemoryType. When the SplitVecRes_MGATHER process this v32i32's masked_gather, GetSplitDestVTs will assert fail since what you are going to split is v17i32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93610
2020-12-22 13:27:38 +08:00
Denis Antrushin 6f45049fb6 [Statepoints] Disable VReg lowering for values used on exception path of invoke.
Currently we lower invokes the same way as usual calls, e.g.:

V1 = STATEPOINT ... V (tied-def 0)

But this is incorrect is V1 is used on exceptional path.
By LLVM rules V1 neither dominates its uses in landing pad, nor
its live range is live on entry to landing pad. So compiler is
allowed to do various weird transformations like splitting live
range after statepoint and use split LR in catch block.

Until (and if) we find better solution to this problem, let's
use old lowering (spilling) for those values which are used on
exceptional path and allow VReg lowering for values used only
on normal path.

Differential Revision: https://reviews.llvm.org/D93449
2020-12-21 20:27:05 +07:00
Bjorn Pettersson a89d751fb4 Add intrinsics for saturating float to int casts
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ

The intrinsics have overloaded source and result type and support
vector operands:

    i32 @llvm.fptoui.sat.i32.f32(float %f)
    i100 @llvm.fptoui.sat.i100.f64(double %f)
    <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
    // etc

On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:

    i19 @llvm.fptsi.sat.i19.f32(float %f)
    // builds
    i19 fp_to_sint_sat f, 19
    // type legalizes (through integer result promotion)
    i32 fp_to_sint_sat f, 19

I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:

    f = fmaxnum f, FP(MIN)
    f = fminnum f, FP(MAX)
    i = fptoxi f
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something
like this instead:

    i = fptoxi f
    i = select f ult FP(MIN), MIN, i
    i = select f ogt FP(MAX), MAX, i
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.

Original patch by @nikic and @ebevhan (based on D54696).

Differential Revision: https://reviews.llvm.org/D54749
2020-12-18 11:09:41 +01:00
Layton Kifer 385e9a2a04 [DAGCombiner] Improve shift by select of constant
Clean up a TODO, to support folding a shift of a constant by a
select of constants, on targets with different shift operand sizes.

Reviewed By: RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D90349
2020-12-18 02:21:42 +00:00
Krasimir Georgiev e71a4cc207 fix a -Wunused-variable warning in release build 2020-12-17 11:52:00 +01:00
Simon Pilgrim cdb692ee0c [X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969)
Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.

This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.

As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes.

Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.

Differential Revision: https://reviews.llvm.org/D92645
2020-12-17 10:25:25 +00:00
QingShan Zhang ebdd20f430 Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
X86 and AArch64 expand it as libcall inside the target. And PowerPC also
want to expand them as libcall for P8. So, propose an implement in the
legalizer to common the logic and remove the code for X86/AArch64 to
avoid the duplicate code.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00
Kazu Hirata 5501b92957 [IR, CodeGen] Use llvm::is_contained (NFC) 2020-12-16 21:30:44 -08:00
Qiu Chaofan 38b4442198 [NFC] [Legalizer] Use common method for expanding fp-to-int operands
Reviewed By: RKSimon, steven.zhang

Differential Revision: https://reviews.llvm.org/D92481
2020-12-15 10:45:40 +08:00
Matt Arsenault 2e0e03c6a0 OpaquePtr: Require byval on x86_intrcc parameter 0
Currently the backend special cases x86_intrcc and treats the first
parameter as byval. Make the IR require byval for this parameter to
remove this special case, and avoid the dependence on the pointee
element type.

Fixes bug 46672.

I'm not sure the IR is enforcing all the calling convention
constraints. clang seems to ignore the attribute for empty parameter
lists, but the IR tolerates it.
2020-12-14 16:34:37 -05:00
Kerry McLaughlin c5ced82c8e [SVE][CodeGen] Lower scalable floating-point vector reductions
Changes in this patch:
-  Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally
   to also work for scalable types
- Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32)
- Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D93050
2020-12-14 11:45:42 +00:00
Kazu Hirata ee5b5b7a35 [CodeGen] Use llvm::erase_value (NFC) 2020-12-13 20:05:48 -08:00
Joe Ellis d863a0ddeb [SelectionDAG] Implement SplitVecOp_INSERT_SUBVECTOR
This function is needed for when it is necessary to split the subvector
operand of an llvm.experimental.vector.insert call. Splitting the
subvector operand means performing two insertions: one inserting the
lower part of the split subvector into the destination vector, and
another for inserting the upper part.

Through experimenting, it seems quite rare to need split the subvector
operand, but this is necessary to avoid assertion errors.

Differential Revision: https://reviews.llvm.org/D92760
2020-12-11 11:07:59 +00:00
Craig Topper a1ae3c6ac9 [RISCV][LegalizeDAG] Expand SETO and SETUO comparisons. Teach LegalizeDAG to expand SETUO expansion when UNE isn't legal.
If SETUNE isn't legal, UO can use the NOT of the SETO expansion.

Removes some complex isel patterns. Most of the test changes are
from using XORI instead of SEQZ.

Differential Revision: https://reviews.llvm.org/D92008
2020-12-10 09:15:52 -08:00
Justin Bogner e6a1187dd8 Limit the recursion depth of SelectionDAG::isSplatValue()
This method previously always recursively checked both the left-hand
side and right-hand side of binary operations for splatted (broadcast)
vector values to determine if the parent DAG node is a splat.

Like several other SelectionDAG methods, limit the recursion depth to
MaxRecursionDepth (6). This prevents stack overflow.
See also https://issuetracker.google.com/173785481

Patch by Nicolas Capens. Thanks!

Differential Revision: https://reviews.llvm.org/D92421
2020-12-09 10:35:07 -08:00
Kerry McLaughlin 05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
Kerry McLaughlin 4519ff4b6f [SVE][CodeGen] Add the ExtensionType flag to MGATHER
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91084
2020-12-09 11:19:08 +00:00
Joe Ellis 80c33de2d3 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Simon Moll 3ffbc79357 [VP] Build VP SDNodes
Translate VP intrinsics to VP_* SDNodes.  The tests check whether a
matching vp_* SDNode is emitted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91441
2020-12-09 11:36:51 +01:00
Huihui Zhang 8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
LLVM intrinsic llvm.maxnum|minnum is overloaded intrinsic, can be used on any
floating-point or vector of floating-point type.
This patch extends current infrastructure to support scalable vector type.

This patch also fix a warning message of incorrect use of EVT::getVectorNumElements()
for scalable type, when DAGCombiner trying to split scalable vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
David Sherwood e22259fafe [SVE] Remove duplicate assert in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR 2020-12-08 14:41:14 +00:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
2020-12-08 10:28:26 +00:00
Kai Luo 44bd8ea167 [DAGCombine][PowerPC] Simplify nabs by using legal `smin` operation
Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation.

Verification: https://alive2.llvm.org/ce/z/vpquFR

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92637
2020-12-08 03:24:07 +00:00
Simon Pilgrim b6e847c396 [DAG] Cleanup by folding some single use VT.getScalarSizeInBits() calls into its comparison. NFCI. 2020-12-07 18:23:54 +00:00
Kerry McLaughlin 111f559bbd [SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER
The refineIndexType & refineUniformBase functions added by D90942 can also be used to
improve CodeGen of masked gathers.

These changes were split out from D91092

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92319
2020-12-07 13:20:19 +00:00
Kerry McLaughlin f6dd32fd35 [SVE][CodeGen] Lower scalable masked gathers
Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)

Changes in this patch:
- Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate
  gather load opcode to use.
- Improve codegen with refineIndexType/refineUniformBase, added in D90942
- Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91092
2020-12-07 12:20:41 +00:00
Bing1 Yu eee30a6dce [CodeGen] Modify the refineIndexType(...)'s code to fix a bug in D90942.
In previous code, when refineIndexType(...) is called and Index is undef, Index.getOperand(0) will raise a assertion fail.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D92548
2020-12-07 08:49:07 +08:00
Layton Kifer ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Kazu Hirata a553ac9791 [CodeGen] llvm::erase_if (NFC) 2020-12-05 15:44:40 -08:00
Simon Pilgrim 9cf4f493a7 [DAG] Move SelectionDAG implementation to KnownBits::setInReg(). NFCI. 2020-12-04 18:09:08 +00:00
Simon Pilgrim 6f4ee6f870 [DAGCombiner] Use const APInt& for getConstantOperandAPInt results. NFCI.
Avoid unnecessary instantiation.

Noticed while removing unnecessary autos
2020-12-04 09:44:58 +00:00
dfukalov 2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Joe Ellis 78c0ea54a2 [DAGCombine] Fix TypeSize warning in DAGCombine::visitLIFETIME_END
Bail out early if we encounter a scalable store.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D92392
2020-12-03 12:12:41 +00:00
Kazu Hirata 7a4af2a8e7 [SelectionDAG] Use is_contained (NFC) 2020-12-02 19:09:45 -08:00
Hongtao Yu 24d4291ca7 [CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against  the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
2020-12-02 13:45:20 -08:00
James Park 78b0ec3d1c Avoid redundant inline with LLVM_ATTRIBUTE_ALWAYS_INLINE
Fix MSVC warning when __forceinline is paired with inline.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D85264
2020-12-01 14:43:16 -08:00
David Blaikie 615f63e149 Revert "[FastISel] Flush local value map on ever instruction" and dependent patches
This reverts commit cf1c774d6a.

This change caused several regressions in the gdb test suite - at least
a sample of which was due to line zero instructions making breakpoints
un-lined. I think they're worth investigating/understanding more (&
possibly addressing) before moving forward with this change.

Revert "[FastISel] NFC: Clean up unnecessary bookkeeping"
This reverts commit 3fd39d3694.

Revert "[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option"
This reverts commit a474657e30.

Revert "Remove static function unused after cf1c774."
This reverts commit dc35368ccf.

Revert "[lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction""
This reverts commit 53a14a47ee.
2020-12-01 14:26:23 -08:00
Layton Kifer d7fec38f05
[DAGCombiner][NFC] Replace duplicate implementation flipBoolean with DAG.getLogicalNOT
Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D92246
2020-12-01 22:23:04 +03:00
Benjamin Kramer 107e92dff8 [DAG] Remove unused variable. NFC. 2020-12-01 16:29:02 +01:00
Simon Pilgrim 1b209ff9e3 [DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.
2020-12-01 14:25:29 +00:00
Simon Pilgrim 6dbd0d36a1 [DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds.

The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away.

Differential Revision: https://reviews.llvm.org/D91876
2020-12-01 11:56:26 +00:00
Paul Robinson 3fd39d3694 [FastISel] NFC: Clean up unnecessary bookkeeping
Now that we flush the local value map for every instruction, we don't
need any extra flushes for specific cases.  Also, LastFlushPoint is
not used for anything.  Follow-ups to #dc35368 (D91734).

Differential Revision: https://reviews.llvm.org/D92338
2020-11-30 12:27:50 -08:00
Paul Robinson a474657e30 [FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option
This option is not used for anything after #dc35368 (D91734).
2020-11-30 10:55:49 -08:00
Francesco Petrogalli f6150aa41a [SelectionDAGBuilder] Update signature of `getRegsAndSizes()`.
The mapping between registers and relative size has been updated to
use TypeSize to account for the size of scalable EVTs.

The patch is a NFCI, if not for the fact that with this change the
function `getUnderlyingArgRegs` does not raise a warning for implicit
conversion of `TypeSize` to `unsigned` when generating machine code
from the test added to the patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92096
2020-11-30 17:38:51 +00:00
Craig Topper fa0f01a3c0 [RISCV][LegalizeTypes] Teach type legalizer that it can promote UMIN/UMAX using SExtPromotedInteger if that's better for the target.
If Sext is cheaper than Zext for a target, we can use that to promote the operands of UMIN/UMAX. Using sext just makes numbers with the sign bit set even larger when treated as an unsigned number and it has no effect on number without the sign bit set. So the relative order doesn't change. This is similar to what we already do for promoting SETCC.

This is helpful on RISCV where i32 arguments are sign extended on RV64 and many instructions are able to produce results with 33 sign bits.

Differential Revision: https://reviews.llvm.org/D92128
2020-11-27 11:37:25 -08:00
Simon Pilgrim 969918e177 [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.

Differential Revision: https://reviews.llvm.org/D92183
2020-11-27 11:18:58 +00:00
QingShan Zhang 4d83aba422 [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D80974
2020-11-27 02:10:55 +00:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Simon Pilgrim 8057ebf4a0 Revert rG12d59b696b330 "[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal"
This reverts commit 12d59b696b.

Prematurely pushed this to trunk
2020-11-26 15:07:45 +00:00
Simon Pilgrim 12d59b696b [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.
2020-11-26 14:47:28 +00:00
Kerry McLaughlin 4bee3197f6 [SVE][CodeGen] Extend isConstantSplatValue to support ISD::SPLAT_VECTOR
Updated the affected scalable_of_scalable tests in sve-gep.ll, as isConstantSplatValue now returns true in DAGCombiner::visitMUL and folds `(mul x, 1) -> x`

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91363
2020-11-26 11:19:40 +00:00
Craig Topper aea130f736 [LegalizerTypes] Add support for scalarizing the operand of an FP_EXTEND when the result type is legal. 2020-11-25 20:30:21 -08:00
Craig Topper 2d6042937b [SelectionDAGBuilder] Add SPF_NABS support to visitSelect
We currently don't match this which limits the effectiveness of D91120 until
InstCombine starts canonicalizing to llvm.abs. This should be easy to remove
if/when we remove the SPF_ABS handling.

Differential Revision: https://reviews.llvm.org/D92118
2020-11-25 14:54:26 -08:00
Paul Robinson dc35368ccf Remove static function unused after cf1c774.
Caused some -Werror bot failures.
2020-11-25 13:43:06 -05:00
Simon Pilgrim 9c86c5e8ad [DAG] Legalize abs(x) -> umin(x,sub(0,x)) iff umin/sub are legal
If umin() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.

Followup to D92095

Alive2: https://alive2.llvm.org/ce/z/8nuX6s  https://alive2.llvm.org/ce/z/q2hB9w
2020-11-25 18:06:02 +00:00
Paul Robinson cf1c774d6a [FastISel] Flush local value map on ever instruction
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

  Local values may also be used by no-op casts, which adds the
  register to the RegFixups table. Without reversing the RegFixups
  map direction, we don't have enough information to sink these
  instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

Differential Revision: https://reviews.llvm.org/D91734
2020-11-25 13:05:00 -05:00
Simon Pilgrim 0637dfe88b [DAG] Legalize abs(x) -> smax(x,sub(0,x)) iff smax/sub are legal
If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.

This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (and which was never updated when they added smax lowering).

Alive2: https://alive2.llvm.org/ce/z/xRk3cD

Differential Revision: https://reviews.llvm.org/D92095
2020-11-25 15:03:03 +00:00
QingShan Zhang 9c588f53fc [DAGCombine] Add hook to allow target specific test for sqrt input
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.

Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang

Differential Revision: https://reviews.llvm.org/D80706
2020-11-25 05:37:15 +00:00
Kai Luo 8e6d92026c [DAG][PowerPC] Fix dropped `nsw` flag in `SimplifySetCC` by adding `doesNodeExist` helper
`SimplifySetCC` invokes `getNodeIfExists` without passing `Flags` argument and `getNodeIfExists` uses a default `SDNodeFlags` to intersect the original flags, as a consequence, flags like `nsw` is dropped. Added a new helper function `doesNodeExist` to check if a node exists without modifying its flags.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D89938
2020-11-25 04:39:03 +00:00
Hsiangkai Wang 8d06a678a5 [SelectionDAG] Avoid aliasing analysis if the object size is unknown.
If the size of memory access is unknown, do not use it to analysis. One
example of unknown size memory access is to load/store scalable vector
objects on the stack.

Differential Revision: https://reviews.llvm.org/D91833
2020-11-25 06:13:37 +08:00
Thomas Preud'homme 9c8af93c93 Add support for STRICT_FSETCC promotion
Add missing handling of STRICT_FSETCC promotion. This prevents assert
failure in llvm::TargetLoweringBase::getTypeToPromoteTo().

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D91962
2020-11-24 16:53:49 +00:00
Kai Luo 5931be60b5 [DAGCombine][PowerPC] Convert negated abs to trivial arithmetic ops
This patch converts `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))` for better codegen.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91120
2020-11-24 09:43:35 +00:00
Simon Pilgrim 791040cd8b [DAG] LowerMINMAX - move default expansion to generic TargetLowering::expandIntMINMAX
This is part of the discussion on D91876 about trying to reduce custom lowering of MIN/MAX ops on older SSE targets - if we can improve generic vector expansion we should be able to relax the limitations in SelectionDAGBuilder when it will let MIN/MAX ops be generated, and avoid having to flag so many ops as 'custom'.
2020-11-22 13:02:27 +00:00
Kazu Hirata c2309ff3d5 [SelectionDAG] Remove unused declaration ExpandStrictFPOp (NFC)
ExpandStrictFPOp started taking two parameters instead of one on Jan
10, 2020 in commit f678fc7660, but the
declaration for the single-perameter version has remained since.
2020-11-21 22:29:44 -08:00
Hongtao Yu d0e42037bf [CSSPGO] MIR target-independent pseudo instruction for pseudo-probe intrinsic
This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR,

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```
the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`.

```
bb.0.bb0:
   frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
   TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags
   PSEUDO_PROBE 837061429793323041, 1, 0
   $edi = MOV32ri 1, debug-location !13; test.c:0
   JCC_1 %bb.1, 4, implicit $eflags

bb.2.bb2:
   PSEUDO_PROBE 837061429793323041, 3, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ

bb.1.bb1:
   PSEUDO_PROBE 837061429793323041, 2, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ
```

The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86495
2020-11-20 10:52:43 -08:00
Craig Topper a7eae62a42 [SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version.
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.

Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.

Differential Revision: https://reviews.llvm.org/D91846
2020-11-20 10:06:53 -08:00
Pavel Iliin 4d7df43ffd [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements out of line atomics for LSE deployment
mechanism. Details how it works can be found in llvm/docs/Atomics.rst
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to clang driver. This is clang and llvm part of out-of-line atomics
interface, library part is already supported by libgcc. Compiler-rt
support is provided in separate patch.

Differential Revision: https://reviews.llvm.org/D91157
2020-11-20 13:30:12 +00:00
Kazu Hirata 2583d8eb08 [CodeGen] Use llvm::is_contained (NFC) 2020-11-19 22:07:56 -08:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Leonard Chan a97f62837f [llvm][IR] Add dso_local_equivalent Constant
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.

When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.

In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
  advantage of `dso_local_equivalent` for relative references

This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.

See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.

Differential Revision: https://reviews.llvm.org/D77248
2020-11-19 10:26:17 -08:00
Florian Hahn 1983acce7c
[SelDAGBuilder] Do not require simple VTs for constraints.
In some cases, the values passed to `asm sideeffect` calls cannot be
mapped directly to simple MVTs. Currently, we crash in the backend if
that happens. An example can be found in the @test_vector_too_large_r_m
test case, where we pass <9 x float> vectors. In practice, this can
happen in cases like the simple C example below.

using vec = float __attribute__((ext_vector_type(9)));
void f1 (vec m) {
  asm volatile("" : "+r,m"(m) : : "memory");
}

One case that use "+r,m" constraints for arbitrary data types in
practice is google-benchmark's DoNotOptimize.

This patch updates visitInlineAsm so that it use MVT::Other for
constraints with complex VTs. It looks like the rest of the backend
correctly deals with that and properly legalizes the type.

And we still report an error if there are no registers to satisfy the
constraint.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D91710
2020-11-19 09:31:54 +00:00
Kerry McLaughlin 306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Simon Pilgrim 8996742741 [KnownBits] Add KnownBits::makeConstant helper. NFCI.
Helper for cases where we need to create a KnownBits from a (fully known) constant value.
2020-11-12 16:16:04 +00:00
Pavel Iliin fdb979cfbb [NFC] [Legalize] Fix spaces and style. 2020-11-11 19:24:15 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Kerry McLaughlin ffbbfc76ca [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which adds support for scalable masked scatters

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Chen Zheng 09e34048bf [SelectionDAG] fminnum should be a binary operator
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91163
2020-11-11 03:41:40 -05:00
David Zarzycki a41ea782c8 [SelectionDAG] Enable CTPOP optimization fine tuning
Add a TLI hook to allow SelectionDAG to fine tune the conversion of CTPOP to a chain of "x & (x - 1)" when CTPOP isn't legal.

A subsequent patch will attempt to fine tune the X86 code gen.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89952
2020-11-09 13:49:01 -05:00
Paul Robinson 920befb337 [FastISel] Reduce spills around mem-intrinsic calls
FastISel generates instructions to materialize "local values" at the
top of a block, in the hope that these values could be reused within
the block.  To reduce spills and restores, FastISel treats calls as
sub-block boundaries, flushing the "local value map" at each call.

This patch treats the mem* intrinsics as if they were calls, because
at O0 generally they are calls.  Eliminating these spills/restores is
actually better for debugging (especially a "continue at this line"
command), code size, stack frame size, and maybe even performance.

Differential Revision: https://reviews.llvm.org/D90877
2020-11-09 09:45:14 -08:00
David Zarzycki 57e46e7123 [SelectionDAG] NFC: Hoist is legal check
This was requested during the code review of D89952.
2020-11-09 10:55:15 -05:00
Francesco Petrogalli fc2fe6817e [llvm][AArch64] Simplify (and (sign_extend..) #bitmask).
Fold

    VT = (and (sign_extend NarrowVT to VT) #bitmask)

into

    VT = (zero_extend NarrowVT)

With this combine, the test replaces a sign extended load + an
unsigned extention with a zero extended load to render one of the
operands of the last multiplication.

  BEFORE                       |  AFTER
    f_i16_i32:                 |    f_i16_i32:
         .fnstart              |           .fnstart
         ldrsh   r0, [r0]      |           ldrh    r1, [r1]
         ldrsh   r1, [r1]      |           ldrsh   r0, [r0]
         smulbb  r0, r1, r0    |           smulbb  r0, r0, r1
         uxth    r1, r1        |           mul     r0, r0, r1
         mul     r0, r0, r1    |           bx      lr
         bx      lr            |

Reviewed By: resistor

Differential Revision: https://reviews.llvm.org/D90605
2020-11-09 12:53:36 +00:00
Craig Topper 98d7e583db [LegalizeTypes] Remove unnecessary if around switch in ScalarizeVectorOperand and SplitVectorOperand. NFC
The if was checking !Res.getNode() but that's always true since
Res was initialized to SDValue() and not touched before the if.

This appears to be a leftover from a previous implementation of
Custom legalization where Res was updated instead of returning
immediately.
2020-11-05 11:00:51 -08:00
Simon Pilgrim bf04e34383 [DAG] computeKnownBits - Replace ISD::SREM handling with KnownBits::srem to reduce code duplication 2020-11-05 17:12:58 +00:00
Simon Pilgrim e237d56b43 [KnownBits] Move ValueTracking/SelectionDAG UREM KnownBits handling to KnownBits::urem. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
2020-11-05 14:30:59 +00:00
Simon Pilgrim 32bee18b84 [KnownBits] Move ValueTracking/SelectionDAG UDIV KnownBits handling to KnownBits::udiv. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
2020-11-05 13:42:42 +00:00
Cameron McInally c126eb7529 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Fraser Cormack f99580c1e5 [DAGCombine] Fix bug in load scalarization
Summary:
For vector element types which are not byte-sized, we would generate
incorrect scalar offsets and produce incorrect codegen.

This optimization could potentially be supported in the future, e.g. by
loading in bytes, then shifting and masking out the remaining bits of
the vector element. However, without an upstream target to test against
it's best to avoid the bad codegen in the simplest possible way.

Related to this bug:

  https://bugs.llvm.org/show_bug.cgi?id=27600

Reviewed by: foad

Differential Revision: https://reviews.llvm.org/D78568
2020-11-04 19:02:40 +00:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Simon Pilgrim 825e517e34 [DAG] computeKnownBits - Replace ISD::MUL handling with the common KnownBits::computeForMul implementation 2020-11-04 11:32:08 +00:00
Simon Pilgrim e9b88c754a [DAG] computeKnownBits - Move ISD::SRA handling into KnownBits::ashr
As discussed on D90527, we should be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
2020-11-03 18:09:33 +00:00
Simon Pilgrim cb798f040a [DAG] computeKnownBits - Move (most) ISD::SRL handling into KnownBits::lshr
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.

The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.

We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
2020-11-03 17:30:36 +00:00
Simon Pilgrim cab21d4fa8 [DAG] computeKnownBits - Move (most) ISD::SHL handling into KnownBits::shl
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.

The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.

We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
2020-11-03 14:22:28 +00:00
Qiu Chaofan 1f852ba853 [PowerPC] Avoid unnecessary fadd for unsigned to ppcf128
Unsigned 32-bit or shorter integer to ppcf128 conversion are currently
expanded as signed-to-double with an extra fadd to 'complement'. But on
PowerPC we have native instruction to directly convert unsigned to
double since ISA v2.06. This patch exploits it.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D89786
2020-11-01 23:22:47 +08:00
Cameron McInally dda1e74b58 [Legalize] Add legalizations for VECREDUCE_SEQ_FADD
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.

Differential Revision: https://reviews.llvm.org/D90247
2020-10-30 16:02:55 -05:00
Nikita Popov 6c2ad4cf87 [SDAG] Extract helper to determine neutral element (NFC)
Make the existing VECREDUCE based code more generic, but expressing
it in terms of the neutral value of the base opcode instead.
2020-10-29 22:05:06 +01:00
Nikita Popov a5f172927d [SDAG] Fix neutral value for vecreduce_fadd
The neutral value for FADD is -0.0, not 0.0, so this is what we
need to pad vectors with.
2020-10-29 21:27:59 +01:00
Nikita Popov 91bf172088 [SDAG] Extract helper to get vecreduce base opcode (NFC) 2020-10-29 20:22:22 +01:00
Simon Pilgrim f53d7f55f1 [DAG] Move canFoldInAddressingMode before foldBinOpIntoSelect. NFC.
Reduces the diff in D90113.
2020-10-28 12:16:05 +00:00
Sven van Haastregt 5d03080092 [TargetLowering] Add i1 condition for bit comparison fold
For i1 types, boolean false is represented identically regardless of
the boolean content, so we can allow optimizations that otherwise
would not be correct for booleans with false represented as a negative
one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D90145
2020-10-27 12:22:20 +00:00
Peter Waller 5b742a0c10 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in sve-redundant-store.ll that the warning is not triggered.

Differential Revision: https://reviews.llvm.org/D89701
2020-10-26 16:37:48 +00:00
Peter Waller 6536d6040f Revert "[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination"
This reverts commit 4604441386.

Reverting because it was not the intended version of the patch, which
follows this patch.
2020-10-26 16:37:00 +00:00
Peter Waller 4604441386 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in Redundantstores.ll that the warning is not triggered.
2020-10-26 16:23:42 +00:00
Simon Pilgrim ce356e1546 [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.

I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.

Differential Revision: https://reviews.llvm.org/D87930
2020-10-24 12:23:09 +01:00
Simon Pilgrim 62b17a7697 [LegalizeTypes] Legalize vector rotate operations
Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47320

Patch By: @rsanthir.quic (Ryan Santhirarajan)

Differential Revision: https://reviews.llvm.org/D89497
2020-10-24 11:30:32 +01:00
Cameron McInally a1cc274cb3 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Denis Antrushin 4f7ee55971 Revert "[Statepoints] Allow deopt GC pointer on VReg if gc-live bundle is empty."
Downstream testing revealed some problems with this patch.
Reverting while investigating.
This reverts commit 2b96dcebfa.
2020-10-23 21:55:06 +07:00
Craig Topper 9e884169a2 [FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.

There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.

Fixes part of PR47393

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87115
2020-10-21 18:12:54 -07:00
Gaurav Jain 4634ad6c0b [NFC] Set return type of getStackPointerRegisterToSaveRestore to Register
Differential Revision: https://reviews.llvm.org/D89858
2020-10-21 16:19:38 -07:00
Simon Pilgrim 88523f6f4b [DAG] getNode(ISD::EXTRACT_SUBVECTOR) Drop unnecessary N2C null check - we assert that this isn't null and have already used the pointer. NFCI.
Fixes cppcheck + null dereference warning.
2020-10-21 11:53:44 +01:00
Sven van Haastregt bfc961aeb2 [TargetLowering] Check boolean content when folding bit compare
Updates an optimization that relies on boolean contents being either 0
or 1 to properly check for this before triggering.

The following:
  (X & 8) != 0 --> (X & 8) >> 3
Produces unexpected results when a boolean 'true' value is represented
by negative one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D89390
2020-10-21 11:46:55 +01:00
David Sherwood 5b17b323a6 [SVE][CodeGen] Replace use of TypeSize comparison operator in CreateStackTemporary
We were previously relying upon the TypeSize comparison operators to
obtain the maximum size of two types, however use of such operators is
being deprecated in favour of making the caller aware that it could
be dealing with scalable vector types. I have changed the code to assert
that the two types have the same scalable property and thus we can
simply take the maximum of the known minimum sizes instead.

Differential Revision: https://reviews.llvm.org/D88563
2020-10-21 08:31:36 +01:00
Qiu Chaofan 1b2fe71ecf [DAGCombiner] Tighten reasscociation of visitFMA
From LangRef, FMF contract should not enable reassociating to form
arbitrary contractions. So it should not help rearrange nodes like
(fma (fmul x, c1), c2, y) into (fma x, c1*c2, y).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89527
2020-10-20 10:13:01 +08:00
Craig Topper edd0cb11bd [SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats
This enables these transforms for vectors:
(ctpop x) u< 2 -> (x & x-1) == 0
(ctpop x) u> 1 -> (x & x-1) != 0
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

All enabled if CTPOP isn't Legal. This differs from the scalar
behavior where the first two are done unconditionally and the
last two are done if CTPOP isn't Legal or Custom. The Legal
check produced better results for vectors based on X86's
custom handling. Might be worth re-visiting scalars here.

I disabled the looking through truncate for vectors. The
code that creates new setcc can use the same result VT as the
original setcc even if we truncated the input. That may work
work for most scalars, but definitely wouldn't work for vectors
unless it was a vector of i1.

Fixes or at least improves PR47825

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89346
2020-10-19 12:56:59 -07:00
Amy Kwan 6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
David Sherwood 3945b69e81 [SVE][CodeGen] Replace more TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. This patch changes a few
functions that were always expecting to work on scalar or fixed width
types:

1. DAGCombiner::mergeTruncStores - deals with scalar integers only.
2. DAGCombiner::ReduceLoadWidth - not valid for vectors.
3. DAGCombiner::createBuildVecShuffle - should only be used for
   fixed width vectors.
4. SelectionDAGLegalize::ExpandFCOPYSIGN and
   SelectionDAGLegalize::getSignAsIntValue - only work on scalars.

Differential Revision: https://reviews.llvm.org/D88562
2020-10-19 08:38:50 +01:00
David Sherwood 35a531fb45 [SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. I've changed some of these
places to use the equivalent scalar operator.

Differential Revision: https://reviews.llvm.org/D88482
2020-10-19 08:30:31 +01:00
David Sherwood f693f915a0 [SVE][CodeGen] Replace uses of TypeSize comparison operators
In certain places in the code we can never end up in a situation where
we're mixing fixed width and scalable vector types. For example,
we can't have truncations and extends that change the lane count. Also,
in other places such as GenWidenVectorStores and GenWidenVectorLoads we
know from the behaviour of FindMemType that we can never choose a vector
type with a different scalable property.

In various places I have used EVT::bitsXY functions instead of
TypeSize::isKnownXY, where it probably makes sense to keep an assert
that scalable properties match.

Differential Revision: https://reviews.llvm.org/D88654
2020-10-19 08:08:41 +01:00
Craig Topper 278bd06891 [TargetLowering] Extract simplifySetCCs ctpop into a separate function. NFCI
As requested in D89346. This allows us to add some early outs.

I reordered some checks a little bit to make the more common bail outs happen earlier. Like checking opcode before checking hasOneUse. And I moved the bit width check to make sure it was safe to look through a truncate to the spot where we look through truncates instead of after.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89494
2020-10-16 19:47:56 -07:00
Denis Antrushin 8f0ddd4a1a [Statepoints] Remove MI limit on number of tied operands.
After D87915 statepoint can have more than 15 tied operands.
Remove this restriction from statepoint lowering code.
2020-10-15 19:02:38 +07:00
Craig Topper 50c9f1e11d [TargetLowering] Replace Log2_32_Ceil with Log2_32 in SimplifySetCC ctpop combine.
This combine can look through (trunc (ctpop X)). When doing this
it tries to make sure the trunc doesn't lose any information
from the ctpop. It does this by checking that the truncated type
has more bits that Log2_32_Ceil of the ctpop type. The Ceil is
unnecessary and pessimizes non-power of 2 types.

For example, ctpop of i256 requires 9 bits to represent the max
value of 256. But ctpop of i255 only requires 8 bits to represent
the max result of 255. Log2_32_Ceil of 256 and 255 both return 8
while Log2_32 returns 8 for 256 and 7 for 255

The code with popcnt enabled is a regression for this test case,
but it does match what already happens with i256 truncated to i9.
Since power of 2 is more likely, I don't think it should block
this change.

Differential Revision: https://reviews.llvm.org/D89412
2020-10-15 01:05:07 -07:00
Jeremy Morse c4e7857d4e [DebugInstrRef] Create DBG_INSTR_REFs in SelectionDAG
When given the -experimental-debug-variable-locations option (via -Xclang
or to llc), have SelectionDAG generate DBG_INSTR_REF instructions instead
of DBG_VALUE. For now, this only happens in a limited circumstance: when
the value referred to is not a PHI and is defined in the current block.
Other situations introduce interesting problems, addresed in later patches.

Practically, this patch hooks into InstrEmitter and if it can find a
defining instruction for a value, gives it an instruction number, and
points the DBG_INSTR_REF at that <instr, operand> pair.

Differential Revision: https://reviews.llvm.org/D85747
2020-10-14 14:24:08 +01:00
Craig Topper 1687a8d83b [X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization.
This passes existing X86 test but I'm not sure if it handles all type
legalization cases it needs to.

Alternative to D89200

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D89222
2020-10-12 23:18:29 -07:00
Simon Pilgrim c252200e4d [DAG][ARM][MIPS][RISCV] Improve funnel shift promotion to use 'double shift' patterns
Based on a discussion on D88783, if we're promoting a funnel shift to a width at least twice the size as the original type, then we can use the 'double shift' patterns (shifting the concatenated sources).

Differential Revision: https://reviews.llvm.org/D89139
2020-10-12 14:11:02 +01:00
David Sherwood c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of it's own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
Krzysztof Parzyszek 61eaa2e14a [SDAG] Remember to set UndefElts in isSplatValue for SPLAT_VECTOR 2020-10-10 19:42:24 -05:00
Denis Antrushin 2b96dcebfa [Statepoints] Allow deopt GC pointer on VReg if gc-live bundle is empty.
Currently we allow passing pointers from deopt bundle on VReg only if
they were seen in list of gc-live pointers passed on VRegs.
This means that for the case of empty gc-live bundle we spill deopt
bundle's pointers. This change allows lowering deopt pointers to VRegs
in case of empty gc-live bundle. In case of non-empty gc-live bundle,
behavior does not change.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D88999
2020-10-10 14:58:08 +07:00
Esme-Yi e9fd8823ba [DAGCombiner] Add decomposition patterns for Mul-by-Imm.
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
  mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
  mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.

```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88201
2020-10-09 08:51:40 +00:00
Fangrui Song c3de9a9e69 Fix incorect Register -> MCRegister conversion
getReg returns a Register which may represent a virtual register.
2020-10-08 21:40:48 -07:00
Amara Emerson e72cfd938f Rename the VECREDUCE_STRICT_{FADD,FMUL} SDNodes to VECREDUCE_SEQ_{FADD,FMUL}.
The STRICT was causing unnecessary confusion. I think SEQ is a more accurate
name for what they actually do, and the other obvious option of "ORDERED"
has the issue of already having a meaning in FP contexts.

Differential Revision: https://reviews.llvm.org/D88791
2020-10-07 10:45:09 -07:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Jay Foad 1aa8e6a51a [SDag] SimplifyDemandedBits: simplify to FP constant if all bits known
We were already doing this for integer constants. This patch implements
the same thing for floating point constants.

Differential Revision: https://reviews.llvm.org/D88570
2020-10-07 09:24:38 +01:00
Denis Antrushin c08d48fc2d [Statepoints] Change statepoint machine instr format to better suit VReg lowering.
Current Statepoint MI format is this:

   STATEPOINT
   <id>, <num patch bytes >, <num call arguments>, <call target>,
   [call arguments...],
   <StackMaps::ConstantOp>, <calling convention>,
   <StackMaps::ConstantOp>, <statepoint flags>,
   <StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
   <gc base/derived pairs...> <gc allocas...>

Note that GC pointers are listed in pairs <base,derived>.
This causes base pointers to appear many times (at least twice) in
instruction, which is bad for us when VReg lowering is ON.
The problem is that machine operand tiedness is 1-1 relation, so
it might look like this:

  %vr2 = STATEPOINT ... %vr1, %vr1(tied-def0)

Since only one instance of %vr1 is tied, that may lead to incorrect
codegen (see PR46917 for more details), so we have to always spill
base pointers. This mostly defeats new VReg lowering scheme.

This patch changes statepoint instruction format so that every
gc pointer appears only once in operand list. That way they all can
be tied. Additional set of operands is added to preserve base-derived
relation required to build stackmap.
New statepoint has following format:

  STATEPOINT
  <id>, <num patch bytes>, <num call arguments>, <call target>,
  [call arguments...],
  <StackMaps::ConstantOp>, <calling convention>,
  <StackMaps::ConstantOp>, <statepoint flags>,
  <StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
  <StackMaps::ConstantOp>, <num gc pointers>, [gc pointers...],
  <StackMaps::ConstantOp>, <num gc allocas>,  [gc allocas...]
  <StackMaps::ConstantOp>, <num entries in gc map>, [base/derived indices...]

Changes are:
  - every gc pointer is listed only once in a flat length-prefixed list;
  - alloca list is prefixed with its length too;
  - following alloca list is length-prefixed list of base-derived
    indices of pointers from gc pointer list. Note that indices are
    logical (number of pointer), not absolute (index of machine operand).

Differential Revision: https://reviews.llvm.org/D87154
2020-10-06 17:40:29 +07:00
David Sherwood 4ed47d50ea [SVE][CodeGen] Fix DAGCombiner::ForwardStoreValueToDirectLoad for scalable vectors
In DAGCombiner::ForwardStoreValueToDirectLoad I have fixed up some
implicit casts from TypeSize -> uint64_t and replaced calls to
getVectorNumElements() with getVectorElementCount(). There are some
simple cases of forwarding that we can definitely support for
scalable vectors, i.e. when the store and load are both scalable
vectors and have the same size. I have added tests for the new
code paths here:

  CodeGen/AArch64/sve-forward-st-to-ld.ll

Differential Revision: https://reviews.llvm.org/D87098
2020-10-06 08:04:03 +01:00
Craig Topper 1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
David Sherwood fa0293081d [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-store.ll
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MSTORE I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.

I have added a CHECK line to the following test:

 CodeGen/AArch64/sve-split-store.ll

that ensures no new warnings are added.

Differential Revision: https://reviews.llvm.org/D86928
2020-10-05 19:27:00 +01:00
Qiu Chaofan b326d4ff94 [SelectionDAG] Don't remove unused negated constant immediately
This reverts partial of a2fb5446 (actually, 2508ef01) about removing
negated FP constant immediately if it has no uses. However, as discussed
in bug 47517, there're cases when NegX is folded into constant from
other places while NegY is removed by that line of code and NegX is
equal to NegY. In these cases, NegX is deleted before used and crash
happens. So revert the code and add necessary test case.
2020-10-06 01:16:45 +08:00
Sanjay Patel 2ccbf3dbd5 [SDAG] fold x * 0.0 at node creation time
In the motivating case from https://llvm.org/PR47517
we create a node that does not get constant folded
before getNegatedExpression is attempted from some
other node, and we crash.

By moving the fold into SelectionDAG::simplifyFPBinop(),
we get the constant fold sooner and avoid the problem.
2020-10-04 11:31:57 -04:00
Denis Antrushin 7b19cd06d7 [Statepoints][ISEL] visitGCRelocate: chain to current DAG root.
This is similar to D87251, but for CopyFromRegs nodes.
Even for local statepoint uses we generate CopyToRegs/CopyFromRegs
nodes.  When generating CopyFromRegs in visitGCRelocate, we must chain
to current DAG root, not EntryNode, to ensure proper ordering of copy
w.r.t. statepoint node producing result for it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88639
2020-10-02 21:41:22 +07:00
David Sherwood b8ce6a6756 [SVE][CodeGen] Add new EVT/MVT getFixedSizeInBits() functions
When we know that a particular type is always going to be fixed
width we have so far been writing code like this:

  getSizeInBits().getFixedSize()

Since we are doing this in quite a few places now it seems to make
sense to add a new helper function that allows us to replace
these calls with a single getFixedSizeInBits() call.

Differential Revision: https://reviews.llvm.org/D88649
2020-10-02 07:47:31 +01:00
Kerry McLaughlin fcf70e1e3b [SVE][CodeGen] Lower scalable fp_extend & fp_round operations
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.

This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88321
2020-10-01 12:17:37 +01:00
Krzysztof Parzyszek db04bec5f1 [SDAG] Do not convert undef to 0 when folding CONCAT/BUILD_VECTOR
Differential Revision: https://reviews.llvm.org/D88273
2020-09-29 09:12:26 -05:00
Jay Foad 781edd501c [SDag] Verify DAG divergence after dumping. NFC.
When debugging, it's useful to be able to see the DAG that has just
failed divergence verification.
2020-09-29 14:05:07 +01:00
Jay Foad d6b04f3937 [SDag] Refactor and simplify divergence calculation and checking. NFC. 2020-09-29 14:05:07 +01:00
David Sherwood bafdd11326 [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700
2020-09-28 08:03:00 +01:00
Nikita Popov f229bf2e12 [Legalize][X86] Improve nnan fmin/fmax vector reduction
Use +/-Inf or +/-Largest as neutral element for nnan fmin/fmax
reductions. This avoids dropping any FMF flags. Preserving the
nnan flag in particular is important to get a good lowering on X86.

Differential Revision: https://reviews.llvm.org/D87586
2020-09-27 10:47:35 +02:00
Simon Pilgrim a61272a900 [DAG] Fold vector mul(x,0)/mul(x,1) to a clearing mask
If we're multiplying all elements of a vector by '0' or '1' then we can more efficiently perform this as a clearing mask (that is likely to further simplify to a shuffle blend).

This was noticed when reviewing D87502 but seems to help idiv/irem by constant cases even more as '0'/'1' values are often used for 'passthrough' cases.

Differential Revision: https://reviews.llvm.org/D88225
2020-09-26 14:31:57 +01:00
Qiu Chaofan c0f8e4c06c [SelectionDAG] Add guard to automatically insert flags
This is like FastMathFlagGuard in IR. Since we use SDAG instance to get
values, it's with SelectionDAG. By creating a FlagInserter in current
scope, all values created by getNode will get the flags if no Flags
argument provided.

In this patch, I applied it to floating point operations folding part in
DAG combiner, and removed Flags passing to getNode to show its effect.
Other places in DAG combiner and other helper methods similar to getNode
also need this. They can be done in follow-up patches.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87361
2020-09-26 13:57:52 +08:00
Dávid Bolvanský 179e15d53a [SystemZ] Optimize bcmp calls (PR47420)
Solves https://bugs.llvm.org/show_bug.cgi?id=47420

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87988
2020-09-25 17:55:39 +02:00
Bill Wendling 0c0c57f7b2 Revert "[CodeGen] Postprocess PHI nodes for callbr"
Accidental commit.

This reverts commit 7f4c940bd0.
2020-09-24 14:35:23 -07:00
Bill Wendling 7f4c940bd0 [CodeGen] Postprocess PHI nodes for callbr
When processing PHI nodes after a callbr, we need to make sure that the
PHI nodes on the default branch are resolved after the callbr
(inserted after INLINEASM_BR). The PHI node values on the indirect
branches are processed before the INLINEASM_BR.

Differential Revision: https://reviews.llvm.org/D86260
2020-09-24 14:34:28 -07:00
Eli Friedman 3f739f736b [SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops.
Previously, if a floating-point type was legal, but FNEG wasn't legal,
we would use FSUB.  Instead, we should use integer ops, to preserve the
semantics.  (Alternatively, there's a compiler-rt call we could use, but
there isn't much reason to use that.)

It turns out we actually are still using this obscure codepath in a few
cases: on some targets, we have "legal" floating-point types that don't
actually support any floating-point operations.  In particular, ARM and
AArch64 are using this path.

The implementation for SelectionDAG is pretty simple because we can
reuse the infrastructure from FCOPYSIGN.

See also 9a3dc3e, the corresponding change to type legalization.

Also includes a "bonus" change to STRICT_FSUB legalization, so we can
lower a STRICT_FSUB to a float libcall.

Includes the changes to both LegalizeDAG and GlobalISel so we don't have
inconsistent results in the future.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46792 .

Differential Revision: https://reviews.llvm.org/D84287
2020-09-23 14:10:33 -07:00
David Sherwood e077367a28 [SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.

Differential revision: https://reviews.llvm.org/D87889
2020-09-23 09:20:08 +01:00
Simon Pilgrim 4dada8d617 [DAG] Remove DAGTypeLegalizer::GenWidenVectorTruncStores (PR42046)
Just scalarize trunc stores - GenWidenVectorTruncStores does the same thing but is flawed (PR42046) and unused.

Differential Revision: https://reviews.llvm.org/D87708
2020-09-22 17:24:45 +01:00
Denis Antrushin ee86688b81 [Statepoints][ISEL] gc.relocate uniquification should be based on SDValue, not IR Value.
When exporting statepoint results to virtual registers we try to avoid
generating exports for duplicated inputs. But we erroneously use
IR Value* to check if inputs are duplicated. Instead, we should use
SDValue, because even different IR values can get lowered to the same
SDValue.
I'm adding a (degenerate) test case which emphasizes importance of this
feature for invoke statepoints.
If we fail to export only unique values we will end up with something
like that:

  %0 = STATEPOINT
  %1 = COPY %0

landing_pad:
  <use of %1>

And when exceptional path is taken, %1 is left uninitialized (COPY is never
execute).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87695
2020-09-21 19:44:46 +07:00
Alexander Belyaev 17dc729bd4 Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de6.

Google internal backend uses EntrySU, we are looking into removing
dependency on it.

Differential Revision: https://reviews.llvm.org/D88018
2020-09-21 13:33:05 +02:00
Lucas Prates 53d238a961 [CodeGen] Fixing inconsistent ABI mangling of vlaues in SelectionDAGBuilder
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type convertions for
function arguments. The checking for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
inspite of, the regular Calling Convention Lowering.

The issue could be observed in a scenario such as:

```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper convertion had taken place when lowering the return
; value of the first call.
```

This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contanined in the Calling Convention Lowering.

This fixes Bugzilla #47454.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87844
2020-09-21 10:05:34 +01:00
Qiu Chaofan 1d782c2987 [PowerPC] Pass nofpexcept flag to custom lowered constrained ops
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.

This is also seen on X86 target.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87390
2020-09-21 10:44:25 +08:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Francis Visoiu Mistrih 0345d88de6 [NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.

Differential Revision: https://reviews.llvm.org/D87867
2020-09-18 09:50:47 -07:00
Simon Pilgrim d967aaa8fa [DAG] BuildVectorSDNode::getSplatValue - pull out repeated getNumOperands() calls. NFCI. 2020-09-18 16:10:23 +01:00
Qiu Chaofan a2fb5446be [SelectionDAG] Check any use of negation result before removal
2508ef01 fixed a bug about constant removal in negation. But after
sanitizing check I found there's still some issue about it so it's
reverted.

Temporary nodes will be removed if useless in negation. Before the
removal, they'd be checked if any other nodes used it. So the removal
was moved after getNode. However in rare cases the node to be removed is
the same as result of getNode. We missed that and will be fixed by this
patch.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
2020-09-17 16:00:54 +08:00
Craig Topper e30371d99d [DAGCombiner] Teach visitMSTORE to replace an all ones mask with an unmasked store.
Similar to what done in D87788 for MLOAD.

Again I've skipped indexed, truncating, and compressing stores.
2020-09-16 16:42:22 -07:00
Craig Topper 89ee4c0314 [DAGCombiner] Teach visitMLOAD to replace an all ones mask with an unmasked load
If we have an all ones mask, we can just a regular masked load. InstCombine already gets this in IR. But the all ones mask can appear after type legalization.

Only avx512 test cases are affected because X86 backend already looks for element 0 and the last element being 1. It replaces this with an unmasked load and blend. The all ones mask is a special case of that where the blend will be removed. That transform is only enabled on avx2 targets. I believe that's because a non-zero passthru on avx2 already requires a separate blend so its more profitable to handle mixed constant masks.

This patch adds a dedicated all ones handling to the target independent DAG combiner. I've skipped extending, expanding, and index loads for now. X86 doesn't use index so I don't know much about it. Extending made me nervous because I wasn't sure I could trust the memory VT had the right element count due to some weirdness in vector splitting. For expanding I wasn't sure if we needed different undef handling.

Differential Revision: https://reviews.llvm.org/D87788
2020-09-16 13:21:16 -07:00
Sebastian Neubauer 833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Simon Pilgrim 3f682611ab [DAG] Remover getOperand() call. NFCI. 2020-09-16 11:18:58 +01:00
Qiu Chaofan e1669843f2 Revert "[SelectionDAG] Remove unused FP constant in getNegatedExpression"
2508ef01 doesn't totally fix the issue since we did not handle the case
when unused temporary negated result is the same with the result, which
is found by address sanitizer.
2020-09-15 22:03:50 +08:00
Simon Pilgrim 1abb4461ea StatepointLowering.cpp - remove unnecessary includes. NFCI.
These are all directly included in StatepointLowering.h
2020-09-15 12:18:23 +01:00
Simon Pilgrim bee79cdcc6 SelectionDAGBuilder.h - remove unnecessary includes. NFCI.
Reduce to forward declarations and move implicit dependencies down to the cpp files.
2020-09-15 12:18:22 +01:00
Qiu Chaofan 2508ef014e [SelectionDAG] Remove unused FP constant in getNegatedExpression
960cbc53 immediately removes nodes that won't be used to avoid
compilation time explosion. This patch adds the removal to constants to
fix PR47517.

Reviewed By: RKSimon, steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
2020-09-15 17:59:10 +08:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Craig Topper 4208ea3e19 [FastISel] Bail out of selectGetElementPtr for vector GEPs.
The code that decomposes the GEP into ADD/MUL doesn't work properly
for vector GEPs. It can create bad COPY instructions or possibly
assert.

For now just bail out to SelectionDAG.

Fixes PR45906
2020-09-14 12:53:06 -07:00
Nikita Popov 53f36f06af [Legalize][ARM][X86] Add float legalization for VECREDUCE
This adds SoftenFloatRes, PromoteFloatRes and SoftPromoteHalfRes
legalizations for VECREDUCE, to fill the remaining hole in the SDAG
legalization. These legalizations simply expand the reduction and
let it be recursively legalized. For the PromoteFloatRes case at
least it is possible to do better than that, but it's pretty tricky
(because we need to consider the interaction of three different
vector legalizations and the type promotion) and probably not
really worthwhile.

I haven't added ExpandFloatRes support, as I am not familiar with
ppc_fp128.

Differential Revision: https://reviews.llvm.org/D87569
2020-09-14 20:42:09 +02:00
Nikita Popov 8e69c3cde8 [DAGCombiner] Fold fmin/fmax with INF / FLT_MAX
Similar to D87415, this folds the various float min/max opcodes
with a constant INF or -INF operand, or FLT_MAX / -FLT_MAX operand
if the ninf flag is set. Some of the folds are only possible under
nnan.

The fminnum(X, INF) with nnan and fmaxnum(X, -INF) with nnan cases
are needed to improve the VECREDUCE_FMIN/FMAX lowerings on X86,
the rest is here for the sake of completeness.

Differential Revision: https://reviews.llvm.org/D87571
2020-09-14 19:59:33 +02:00
Simon Pilgrim 00e5676cf6 [LegalizeDAG] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. 2020-09-14 11:09:43 +01:00
David Sherwood 15bff4dec4 [CodeGen] Fix bug in IncrementPointer
In an earlier patch I meant to add the correct flags to the ADD
node when incrementing the pointer, but forgot to pass them to
SelectionDAG::getNode.

Differential Revision: https://reviews.llvm.org/D87496
2020-09-14 08:03:55 +01:00
Craig Topper 56b33391d3 [SelectionDAG] Move ISD:PARITY formation from DAGCombine to SimplifyDemandedBits.
Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1)
but the AND might be separated from the ctpop. For example if the
parity result is multiplied by 2, we'll pull the AND through the
shift.

So to handle more cases, move to SimplifyDemandedBits where we
can handle more cases that result in only the LSB of the CTPOP
being used.
2020-09-13 21:04:13 -07:00
Qiu Chaofan a4c5351986 [DAGCombiner] Propagate FMF flags in FMA folding
DAG combiner folds (fma a 1.0 b) into (fadd a b) but the flag isn't
propagated into new fadd. This patch fixes that.

Some code in visitFMA is redundant and such support for vector constants
is missing. Need follow-up patch to clean.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87037
2020-09-14 00:19:06 +08:00
Craig Topper 61d29e0dff [LegalizeTypes] Remove a few cases from SplitVectorOperand that should never happen. NFC
CTTZ, CTLZ, CTPOP, and FCANONICALIZE all have the same input and
output types so the operand should have already been legalized when the
result type was legalized.
2020-09-12 20:59:14 -07:00
Craig Topper ad3d6f993d [SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors.
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop
isn't natively supported by the target, this leads to poor codegen
due to the expansion of ctpop being more complex than what is needed
for parity.

This adds a DAG combine to convert the pattern to ISD::PARITY
before operation legalization. Type legalization is updated
to handled Expanding and Promoting this operation. If after type
legalization, CTPOP is supported for this type, LegalizeDAG will
turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a
series of shifts and xors followed by an AND with 1.

I've avoided vectors in this patch to avoid more legalization
complexity for this patch.

X86 previously had a custom DAG combiner for this. This is now
moved to Custom lowering for the new opcode. There is a minor
regression in vector-reduce-xor-bool.ll, but a follow up patch
can easily fix that.

Fixes PR47433

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87209
2020-09-12 11:42:18 -07:00
Sanjay Patel 3a8ea8609b [Intrinsics] define semantics for experimental fmax/fmin vector reductions
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.

No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.

There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.

Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.

We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.

(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)

x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.

Differential Revision: https://reviews.llvm.org/D87391
2020-09-12 09:10:28 -04:00
Jay Foad 517202c720 [TargetLowering] Fix comments describing XOR -> OR/AND transformations 2020-09-10 13:56:34 +01:00
Kerry McLaughlin cd89f5c91b [SVE][CodeGen] Legalisation of truncate for scalable vectors
Truncating from an illegal SVE type to a legal type, e.g.
`trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>`
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.

This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86548
2020-09-10 11:35:33 +01:00
Nikita Popov 0a5dc7effb [DAGCombiner] Fold fmin/fmax of NaN
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.

This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.

Differential Revision: https://reviews.llvm.org/D87415
2020-09-09 23:53:32 +02:00
Ulrich Weigand 1a25133bcd [DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist.  Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.

This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
2020-09-09 19:13:46 +02:00
Denis Antrushin 4358fa782e [Statepoints] Update DAG root after emitting statepoint.
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87251
2020-09-09 20:22:10 +07:00
Simon Pilgrim d816499f95 [KnownBits] Move SelectionDAG::computeKnownBits ISD::ABS handling to KnownBits::abs
Move the ISD::ABS handling to a KnownBits::abs handler, to simplify future implementations in ValueTracking/GlobalISel.
2020-09-09 13:22:58 +01:00
Denis Antrushin 2a52c3301a [Statepoints] Properly handle const base pointer.
Current code in InstEmitter assumes all GC pointers are either
VRegs or stack slots - hence, taking only one operand.
But it is possible to have constant base, in which case it
occupies two machine operands.

Add a convinience function to StackMaps to get index of next
meta argument and use it in InsrEmitter to properly advance to
the next statepoint meta operand.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87252
2020-09-09 14:07:00 +07:00
Craig Topper 844e94a502 [SelectionDAGBuilder] Remove Unnecessary FastMathFlags temporary. Use SDNodeFlags instead. NFCI
This was a missed simplication in D87200
2020-09-08 15:50:12 -07:00
Craig Topper b1e68f885b [SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.:
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.

This required adding a SDNodeFlags to SelectionDAG::getSetCC.

Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.

Differential Revision: https://reviews.llvm.org/D87200
2020-09-08 15:27:21 -07:00
Jonas Paulsson 6dc3e22b57 [DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert.
On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert()
always ended up being scalarized, because the type action of the input is
promotion which was previously an unhandled case in this method.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47132.

Differential Revision: https://reviews.llvm.org/D86268

Patch by Eli Friedman.
Review: Ulrich Weigand
2020-09-08 16:49:51 +02:00
Craig Topper da79b1eecc [SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported.
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.

It's also the same as the Custom sequence uses for i64 on 32-bit
and i128 on 64-bit. So we can remove the X86 customization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87215
2020-09-07 13:15:26 -07:00
Sanjay Patel 7a06b166b1 [DAGCombiner] allow more store merging for non-i8 truncated ops
This is a follow-up suggested in D86420 - if we have a pair of stores
in inverted order for the target endian, we can rotate the source
bits into place.
The "be_i64_to_i16_order" test shows a limitation of the current
function (which might be avoided if we integrate this function with
the other cases in mergeConsecutiveStores). In the earlier
"be_i64_to_i16" test, we skip the first 2 stores because we do not
match the full set as consecutive or rotate-able, but then we reach
the last 2 stores and see that they are an inverted pair of 16-bit
stores. The "be_i64_to_i16_order" test alters the program order of
the stores, so we miss matching the sub-pattern.

Differential Revision: https://reviews.llvm.org/D87112
2020-09-07 14:12:36 -04:00
Simon Wallis 79ea83e104 [SelectionDAG] memcpy expansion of const volatile struct ignores const zero
In getMemcpyLoadsAndStores(), a memcpy where the source is a zero constant is expanded to a MemOp::Set instead of a MemOp::Copy, even when the memcpy is volatile.
This is incorrect.

The fix is to add a check for volatile, and expand to MemOp::Copy in the volatile case.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D87134
2020-09-07 13:22:09 +01:00
Simon Pilgrim e57cbcbdc1 LegalizeTypes.h - remove orphan SplitVSETCC declaration. NFCI.
The implementation no longer exists
2020-09-07 13:11:49 +01:00
Jay Foad 5350e1b509 [KnownBits] Implement accurate unsigned and signed max and min
Use the new implementation in ValueTracking, SelectionDAG and
GlobalISel.

Differential Revision: https://reviews.llvm.org/D87034
2020-09-07 09:09:01 +01:00
Jonas Paulsson 714ceefad9 [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.

This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.

This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.

There is more that needs to be done in this area as discussed here:

Differential Revision: https://reviews.llvm.org/D86871

Review: Ulrich Weigand, Sanjay Patel
2020-09-05 10:30:38 +02:00
David Sherwood 73a3d350a4 [SVE][CodeGen] Fix up warnings in sve-split-insert/extract tests
I have fixed up some more ElementCount/TypeSize related warnings in
the following tests:

  CodeGen/AArch64/sve-split-extract-elt.ll
  CodeGen/AArch64/sve-split-insert-elt.ll

In SelectionDAG::CreateStackTemporary we were relying upon the implicit
cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject.
I've fixed this by passing in the known minimum size instead, which I
believe is fine because the associated stack id indicates whether this
is a scalable object or not.

I've also fixed up a case in TargetLowering::SimplifyDemandedBits when
extracting a vector element from a scalable vector. The result is a scalar,
hence it wasn't caught at the start of the function. If the vector is
scalable we just bail out for now.

Differential Revision: https://reviews.llvm.org/D86431
2020-09-04 09:51:31 +01:00
Simon Pilgrim 1673a08044 SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI.
Use forward declarations and move the include down to dependent files that actually use it.

This also exposes a number of implicit dependencies on KnownBits.h
2020-09-03 18:33:25 +01:00
Jay Foad 099c089d4b [APInt] New member function setBitVal
Differential Revision: https://reviews.llvm.org/D87033
2020-09-02 21:40:31 +01:00
Paul Walker f72121254d [SVE] Don't reorder subvector/binop sequences when the resulting binop is not legal.
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:

  extract_subvec(binop(insert_subvec(a), insert_subvec(b)))

are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them the result is an
infinite legalise->combine->legalise loop.

This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used it's common for
the affected code paths to be visited pre and post legalisation,
so the extra parameter keeps the code tidy.

Differential Revision: https://reviews.llvm.org/D86450
2020-09-02 11:01:33 +01:00
Sander de Smalen f13beac51b [AArch64][SVE] Preserve full vector regs over EH edge.
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.

For  e.g.

    svint32_t bar(svint32_t);
    svint32_t foo(svint32_t x, bool *err) {
      try { bar(x); } catch (...) { *err = true; }
      return x;
    }

`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84737
2020-09-02 10:54:18 +01:00
Cameron McInally cfe2b81710 [SVE] Update INSERT_SUBVECTOR DAGCombine to use getVectorElementCount().
A small piece of the project to replace getVectorNumElements() with getVectorElementCount().

Differential Revision: https://reviews.llvm.org/D86894
2020-09-01 16:51:44 -05:00
Matt Arsenault 759482ddaa GlobalISel: Implement computeKnownBits for G_BSWAP and G_BITREVERSE 2020-09-01 12:49:57 -04:00
Sam Tebbs 15e880a04f [DAGCombiner] Fold an AND of a masked load into a zext_masked_load
This patch folds an AND of a masked load and build vector into a zero
extended masked load.

Differential Revision: https://reviews.llvm.org/D86789
2020-09-01 17:02:07 +01:00
David Sherwood 9fbb113247 [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.

I have added a CHECK line to the following test:

  CodeGen/AArch64/sve-split-load.ll

that ensures no new warnings are added.

Differential Revision: https://reviews.llvm.org/D86697
2020-09-01 07:47:59 +01:00
Qiu Chaofan 5475154865 [NFC] [DAGCombiner] Refactor bitcast folding within fabs/fneg
fabs and fneg share a common transformation:

(fneg (bitconvert x)) -> (bitconvert (xor x sign))
(fabs (bitconvert x)) -> (bitconvert (and x ~sign))

This patch separate the code into a single method.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86862
2020-09-01 00:48:12 +08:00
Qiu Chaofan eb2a405c18 [NFC] [DAGCombiner] Remove unnecessary negation in visitFNEG
In visitFNEG of DAGCombiner, the folding of (fneg (fsub c, x)) is
redundant since getNegatedExpression already handles it.
2020-09-01 00:35:01 +08:00
Sanjay Patel 1c9a09f42e [DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x), better
I tried to fix this in:
rG716e35a0cf53
...but that patch depends on the order that we encounter the
magic "x/sqrt(x)" expression in the combiner's worklist.

This patch should improve that by waiting until we walk the
user list to decide if there's a use to skip.

The AArch64 test reveals another (existing) ordering problem
though - we may try to create an estimate for plain sqrt(x)
before we see that it is part of a 1/sqrt(x) expression.
2020-08-31 09:35:59 -04:00
Sanjay Patel 716e35a0cf [DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x)
In general, we probably want to try the multi-use reciprocal
transform before sqrt transforms, but x/sqrt(x) is a special-case
because that will always reduce to plain sqrt(x) or an estimate.

The AArch64 tests show that the transform is limited by TLI
hook to patterns where there are 3 or more uses of the divisor.
So this change can result in an extra division compared to
what we had, but that's the intended behvior based on the
current setting of that hook.
2020-08-30 10:55:45 -04:00
Nikita Popov 51d34c0c53 [TargetLowering] Strip tailing whitespace (NFC) 2020-08-29 18:09:08 +02:00
Kai Luo b904324788 [DAGCombiner] Enhance (zext(setcc))
Current `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86687
2020-08-29 03:37:41 +00:00
Denis Antrushin fabd4c1ae1 [Statepoint] Always spill base pointer.
There is a subtle problem with new statepoint lowering scheme
when base and pointers are the same (see PR46917 for more context):

%1 = STATEPOINT ... %0, %0(tied-def 0)...

if, for some reason, register allocator desides to put two instances
of %0 into two different objects (registers or spill slots), we may
end up with

$reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)...

and nothing will prevent later passes to sink uses of $reg2 below
statepoint, which is incorrect.

As a short term solution, always put base pointers on stack during
lowering.
A longer term solution may be to rework MIR statepoint format to
avoid GC pointer duplication in statepoint argument list.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D86712
2020-08-28 23:22:07 +07:00
QingShan Zhang deb4b25807 [DAGCombine] Don't delete the node if it has uses immediately
This is the follow up patch for https://reviews.llvm.org/D86183 as we miss to delete the node if NegX == NegY, which has use after we create the node.
```
    if (NegX && (CostX <= CostY)) {
      Cost = std::min(CostX, CostZ);
      RemoveDeadNode(NegY);
      return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags);  #<-- NegY is used here if NegY == NegX.
    }
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86689
2020-08-28 16:13:43 +00:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Lucas Prates 3d943bcd22 [CodeGen] Properly propagating Calling Convention information when lowering vector arguments
When joining the legal parts of vector arguments into its original value
during the lower of Formal Arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual parts. The same did not happen when lowering calls, causing a
mismatch.

This patch fixes the issue by properly propagating the Calling
Convention details.

This fixes Bugzilla #47001.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86715
2020-08-27 17:01:10 +01:00
Drew Wock 0ec098e22b [FPEnv] Allow fneg + strict_fadd -> strict_fsub in DAGCombiner
This is the first of a set of DAGCombiner changes enabling strictfp
optimizations. I want to test to waters with this to make sure changes
like these are acceptable for the strictfp case- this particular change
should preserve exception ordering and result precision perfectly, and
many other possible changes appear to be able to as well.

Copied from regular fadd combines but modified to preserve ordering via
the chain, this change allows strict_fadd x, (fneg y) to become
struct_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x.

Differential Revision: https://reviews.llvm.org/D85548
2020-08-27 08:17:01 -04:00
Sanjay Patel 54a5dd485c [DAGCombiner] allow store merging non-i8 truncated ops
We have a gap in our store merging capabilities for shift+truncate
patterns as discussed in:
https://llvm.org/PR46662

I generalized the code/comments for this function in earlier commits,
so we only need ease the type restriction and adjust the address/endian
checking to make this work.

AArch64 lets us switch endian to make sure that patterns are matched
either way.

Differential Revision: https://reviews.llvm.org/D86420
2020-08-26 15:23:08 -04:00
aartbik 72305a08ff [llvm] [DAG] Fix bug in llvm.get.active.lane.mask lowering
This intrinsic only accepted proper machine vector lengths.
Fixed by this change. With unit tests.

https://bugs.llvm.org/show_bug.cgi?id=47299

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D86585
2020-08-26 10:16:31 -07:00
Craig Topper 28bd47fc47 [LegalizeTypes] Remove WidenVecRes_Shift and just use WidenVecRes_Binary
This function seems to allow for the shift amount to have a different type than the result, but I don't think we do that anywhere else for vector shifts. We also don't have any support for legalizing the shift amount alone if the result is legal and the shift amount type isn't. The code coverage report here shows this code as uncovered http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp.html

Differential Revision: https://reviews.llvm.org/D86475
2020-08-26 09:57:41 -07:00
Jay Foad 75d159f924 [LegalizeTypes] Add ROTL/ROTR to ScalarizeVectorResult.
We can scalarize these just like any other binary operation.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47303 caused by D77152.

Differential Revision: https://reviews.llvm.org/D86601
2020-08-26 14:42:57 +01:00
Jay Foad b7e3599a22 [SelectionDAG] Handle non-power-of-2 bitwidths in expandROT
Differential Revision: https://reviews.llvm.org/D86449
2020-08-26 09:20:46 +01:00
Sjoerd Meijer 39522b1e10 [SelectionDAG] Legalize intrinsic get.active.lane.mask
This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the
backedge-taken count.

Differential Revision: https://reviews.llvm.org/D86302
2020-08-25 15:00:10 +01:00
Paul Walker 73ac3c0ede [SVE] Lower scalable vector ISD::FNEG operations.
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.

Differential Revision: https://reviews.llvm.org/D86415
2020-08-25 11:22:28 +01:00
Venkataramanan Kumar 62e91bf563 [DAGCombine]: Fold X/Sqrt(X) to Sqrt(X)
With FMF ( "nsz" and " reassoc") fold X/Sqrt(X) to Sqrt(X).

This is done after targets have the chance to produce a
reciprocal sqrt estimate sequence because that expansion
is probably more efficient than an expansion of a
non-reciprocal sqrt. That is also why we deferred doing
this transform in IR (D85709).

Differential Revision: https://reviews.llvm.org/D86403
2020-08-24 18:16:13 -04:00
Craig Topper 43465a4375 [LegalizeTypes][X86] Add ROTL/ROTR to WidenVectorResult.
We can widen these just like any other binary operation.

Added test cases for v2i32 for X86 for coverage.

Fixes failures seen after D77152.
2020-08-24 10:10:20 -07:00
Jay Foad a522067692 [SDAG] Convert FSHL <--> FSHR if the target only supports one of them
D77152 tried to do this but got it wrong in the shift-by-zero case.
D86430 reverted the wrong code. Reimplement the optimization with
different code depending on whether the shift amount is known to be
non-zero (modulo bitwidth).

This improves code quality for fshl tests on AMDGPU, which only has an
fshr instruction.

Differential Revision: https://reviews.llvm.org/D86438
2020-08-24 17:47:10 +01:00
Bjorn Pettersson 7a4e26adc8 [SelectionDAG] Fix miscompile bug in expandFunnelShift
This is a fixup of commit 0819a6416f (D77152) which could
result in miscompiles. The miscompile could only happen for targets
where isOperationLegalOrCustom could return different values for
FSHL and FSHR.

The commit mentioned above added logic in expandFunnelShift to
convert between FSHL and FSHR by swapping direction of the
funnel shift. However, that transform is only legal if we know
that the shift count (modulo bitwidth) isn't zero.

Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a
rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if
Z modulo bitwidth, could be zero.

```
$ ./alive-tv /tmp/test.ll

----------------------------------------
define i32 @src(i32 %x, i32 %y, i32 %z) {
%0:
  %t0 = fshl i32 %x, i32 %y, i32 %z
  ret i32 %t0
}
=>
define i32 @tgt(i32 %x, i32 %y, i32 %z) {
%0:
  %t0 = sub i32 32, %z
  %t1 = fshr i32 %x, i32 %y, i32 %t0
  ret i32 %t1
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i32 %x = #x00000000 (0)
i32 %y = #x00000400 (1024)
i32 %z = #x00000000 (0)

Source:
i32 %t0 = #x00000000 (0)

Target:
i32 %t0 = #x00000020 (32)
i32 %t1 = #x00000400 (1024)
Source value: #x00000000 (0)
Target value: #x00000400 (1024)
```

It could be possible to add back the transform, given that logic
is added to check that (Z % BW) can't be zero. Since there were
no test cases proving that such a transform actually would be useful
I decided to simply remove the faulty code in this patch.

Reviewed By: foad, lebedev.ri

Differential Revision: https://reviews.llvm.org/D86430
2020-08-24 09:52:11 +02:00
Qiu Chaofan 1bc45b2fd8 [PowerPC] Support lowering int-to-fp on ppc_fp128
D70867 introduced support for expanding most ppc_fp128 operations. But
sitofp/uitofp is missing. This patch adds that after D81669.

Reviewed By: uweigand

Differntial Revision: https://reviews.llvm.org/D81918
2020-08-24 11:18:16 +08:00
QingShan Zhang 960cbc53ca [DAGCombine] Remove dead node when it is created by getNegatedExpression
We hit the compiling time reported by https://bugs.llvm.org/show_bug.cgi?id=46877
and the reason is the same as D77319. So we need to remove the dead node we created
to avoid increase the problem size of DAGCombiner.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D86183
2020-08-24 02:50:58 +00:00
Sanjay Patel 1d0fa79824 [DAGCombiner] restrict store merge of truncs to early combining
The pattern matching does not account for truncating stores,
so it is unlikely to work at later stages. So we are likely
wasting compile-time with no hope of improvement by running
this later.
2020-08-23 10:44:23 -04:00
Sanjay Patel 79cb289a95 [DAGCombiner] add early exit for store merging of truncs
This should be NFC in terms of output because the endian
check further down would bail out too, but we are wasting
time by waiting to that point to give up. If we generalize
that function to deal with more than i8 types, we should
not have to deal with the degenerate case.
2020-08-22 16:25:16 -04:00
Sanjay Patel 2fc7c85201 [DAGCombiner] clean up merge of truncated stores; NFC
This code handles the special-case of i8 stores,
but it could be generalized to deal with other types.
2020-08-22 09:23:32 -04:00
Jay Foad 0819a6416f [SelectionDAG] Better legalization for FSHL and FSHR
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.

Differential Revision: https://reviews.llvm.org/D77152
2020-08-21 10:32:49 +01:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the contructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the contructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
David Sherwood 3f36561f69 [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads
In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the code in that
function to use TypeSize instead of unsigned for tracking the remaining
load amount. In addition, I've changed the load loop to use the new
IncrementPointer helper function for updating the addresses in each
iteration, since this handles scalable vector types.

Also, I've added report_fatal_errors in GenWidenVectorExtLoads,
TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores,
since these functions currently use a sequence of element-by-element
scalar loads/stores. In a similar vein, I've also added a fatal error
report in FindMemType for the case when we decide to return the element
type for a scalable vector type.

I've added new tests in

  CodeGen/AArch64/sve-split-load.ll
  CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorLoads.

Differential Revision: https://reviews.llvm.org/D85909
2020-08-19 07:54:32 +01:00
Sanjay Patel f925fd3304 [DAGCombiner] give magic number a name in getStoreMergeCandidates; NFC 2020-08-17 15:37:55 -04:00
Sanjay Patel 046b4a550a [DAGCombiner] reduce code duplication in getStoreMergeCandidates; NFC 2020-08-17 15:37:55 -04:00
Sanjay Patel 20c85fd1ab [DAGCombiner] simplify bool return in getStoreMergeCandidates; NFC 2020-08-17 15:37:55 -04:00
Sanjay Patel 52cd8f1ecb [DAGCombiner] clean up getStoreMergeCandidates(); NFC
1. Move bailouts and local var declarations.
2. Convert if-chain to switch on StoreSource with unreachable default.
2020-08-17 15:37:54 -04:00
Sanjay Patel 27708db3e3 [DAGCombiner] convert StoreSource if-chain to switch; NFC
The "isa" checks were less constrained because they allow
target constants, but the later matching code would bail
out on those anyway, so this should be slightly more
efficient.
2020-08-17 15:37:54 -04:00
Matt Arsenault 5b53b17cd3 DAG: Add missing comment for transform 2020-08-17 10:01:12 -04:00
Matt Arsenault c7191e3185 DAG: Don't pass 0 alignment value to allowsMisalignedMemoryAccesses
I think not unconditionally passing getDstAlign is broken, but leave
that for another change.
2020-08-13 09:33:17 -04:00
Kerry McLaughlin 30af595f05 [SVE][CodeGen] Legalisation of EXTRACT_VECTOR_ELT for scalable vectors
This patch changes SplitVecOp_EXTRACT_VECTOR_ELT to work correctly
for scalable vectors and also fixes an a bug in DAGCombiner where
the scalable property is dropped in visitTRUNCATE when attempting
to fold an extract + a truncate.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85754
2020-08-13 12:32:59 +01:00
David Sherwood 6af1677161 [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorStores
In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.

Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
for the add instruction in the same way as the fixed width case does.

Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently uses a sequence of element-by-element scalar
stores.

I've added new tests in

  CodeGen/AArch64/sve-intrinsics-stores.ll
  CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorStores.

Differential Revision: https://reviews.llvm.org/D84937
2020-08-13 11:07:17 +01:00
David Sherwood 3ec3fcb97a [CodeGen] In narrowExtractedVectorLoad bail out for scalable vectors
In narrowExtractedVectorLoad there is an optimisation that tries to
combine extract_subvector with a narrowing vector load. At the moment
this produces warnings due to the incorrect calls to
getVectorNumElements() for scalable vector types. I've got this
working for scalable vectors too when the extract subvector index
is a multiple of the minimum number of elements. I have added a
new variant of the function:

  MachineFunction::getMachineMemOperand

that copies an existing MachineMemOperand, but replaces the pointer
info with a null version since we cannot currently represent scaled
offsets.

I've added a new test for this particular case in:

  CodeGen/AArch64/sve-extract-subvector.ll

Differential Revision: https://reviews.llvm.org/D83950
2020-08-13 10:46:18 +01:00
David Sherwood 88bbd30736 [SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors
In this patch I have fixed two issues:

1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.

The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:

  test/CodeGen/AArch64/sve-extract-subvector.ll

Differential Revision: https://reviews.llvm.org/D85516
2020-08-12 08:35:46 +01:00
Kerry McLaughlin 455ed56d48 [SVE][CodeGen] Legalisation of INSERT_VECTOR_ELT for scalable vectors
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.

This patch does the following to ensure this works for scalable vectors:
 - Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
 - Adds an IsScalable flag to getMemBasePlusOffset() and scales the
    offset by VScale when this is true
 - Ensures the immediate is clamped correctly by clampDynamicVectorIndex
    so that we don't try to use an out of range index

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D84874
2020-08-11 12:57:28 +01:00
Kerry McLaughlin 85c7e89f3b [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
QingShan Zhang 61ede38da0 [CodeGen] Expand float operand for STRICT_FSETCC/STRICT_FSETCCS
This patch is the continue work of https://reviews.llvm.org/D69281
to implement the way that expands STRICT_FSETCC/STRICT_FSETCCS.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D81906
2020-08-11 05:55:00 +00:00
Craig Topper fdfdee98ac [DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1.
These aren't the canonical forms we'd get from InstCombine, but
we do have X86 tests for them. Recognizing them is pretty cheap.

While there make use of APInt:isSignedMinValue/isSignedMaxValue
instead of creating a new APInt to compare with. Also use
SelectionDAG::getAllOnesConstant helper to hide the all ones
APInt creation.
2020-08-08 22:27:16 -07:00
Sanjay Patel f22ac1d15b [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2
Follow-up to D82716 / rGea71ba11ab11
We do not have the fabs removal fold in IR yet for the case
where the sqrt operand is repeated, so that's another potential
improvement.
2020-08-08 10:38:06 -04:00
Bevin Hansson 5de6c56f7e [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
Summary:
This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat,
which perform signed and unsigned saturating left shift,
respectively.

These are useful for implementing the Embedded-C fixed point
support in Clang, originally discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html
and
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html

Reviewers: leonardchan, craig.topper, bjope, jdoerfert

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83216
2020-08-07 15:09:24 +02:00
Simon Pilgrim 66a163f328 [DAG] GetDemandedBits - remove custom AND handling.
As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback).

The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.
2020-08-07 12:55:47 +01:00
Simon Pilgrim fcefb53222 Remove unreachable break. NFC 2020-08-07 12:37:49 +01:00
Craig Topper ffc248f3b8 [LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers.
One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.

Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.

Differential Revision: https://reviews.llvm.org/D85468
2020-08-06 13:18:16 -07:00
Paul Walker 0d33a8ef5b [SVE] Lower scalable vector mul operations.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.

Differential Revision: https://reviews.llvm.org/D85328
2020-08-06 11:15:35 +01:00
Simon Pilgrim 4aaf301fb8 [DAG] Fold vector (aext (load x)) -> (zext (truncate (zextload x)))
We currently don't do anything to fold any_extend vector loads as no target has such an instruction.

Instead I've added support for folding to a zextload, SimplifyDemandedBits does a good job of adjusting the zext(truncate(()) stages as required later on.

We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication.

Fixes the regression I mentioned in rG99a971cadff7

Differential Revision: https://reviews.llvm.org/D85129
2020-08-05 11:22:23 +01:00
Eli Friedman 4a47f1c4ce [SelectionDAG][SVE] Support scalable vectors in getConstantFP()
Differential Revision: https://reviews.llvm.org/D85249
2020-08-04 15:32:43 -07:00
Cameron McInally 0f2b47b6da [FastISel] Don't transform FSUB(-0, X) -> FNEG(X) in FastISel
This corresponds with the SelectionDAGISel change in D84056.

Also, rename some poorly named tests in CodeGen/X86/fast-isel-fneg.ll with NFC.

Differential Revision: https://reviews.llvm.org/D85149
2020-08-04 14:42:53 -05:00
Jay Foad 28e322ea93 [PowerPC] Custom lowering for funnel shifts
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.

Differential Revision: https://reviews.llvm.org/D83948
2020-08-04 16:30:49 +01:00
Cameron McInally 31c7a2fd5c [FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder.
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D84056
2020-08-03 10:22:25 -05:00
Simon Pilgrim b8ffbf0e02 [DAG] TargetLowering::expandMUL_LOHI - pass SDLoc as const&
Try to be more consistent with the SDLoc param in the TargetLowering methods.

This also exposes an issue where we were passing a SDNode as a SDLoc, relying on the implicit SDLoc(SDNode) constructor.
2020-08-02 15:31:36 +01:00
Simon Pilgrim d14a22da5e [DAG] TargetLowering::LowerAsmOutputForConstraint - pass SDLoc as const&
Try to be more consistent with the SDLoc param in the TargetLowering methods.
2020-08-02 15:12:02 +01:00
Matt Arsenault 57bd64ff84 Support addrspacecast initializers with isNoopAddrSpaceCast
Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs
with the DataLayout.
2020-07-31 10:42:43 -04:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Eli Friedman 7e88efa7c5 [LegalizeTypes][SVE] Support widen/split legalization for SPLAT_VECTOR
Just the obvious implementation that rewrites the result type. Also fix
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.

Differential Revision: https://reviews.llvm.org/D84706
2020-07-30 16:17:45 -07:00
Jon Roelofs afae6d97fa [SelectionDAG] Fix lowering of vector geps
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).

Fixes: rdar://66016901

Differential Revision: https://reviews.llvm.org/D84884
2020-07-30 14:56:53 -06:00
Sam Tebbs 276ed5f7e4 [DAGCombiner] Fold sext_inreg of a masked load into a sign extended masked load
This patch adds a DAG combine fold for a sext(masked_load) into a sign extended masked load.

Differential Revision: https://reviews.llvm.org/D84332
2020-07-30 10:34:02 +01:00
Philip Reames 755f91f12c [Statepoint] Enable cross block relocates w/vreg lowering
This change is mechanical, it just removes the restriction and updates tests.  The key building blocks were submitted in 31342eb and 8fe2abc.

Note that this (and preceeding changes) entirely subsumes D83965.  I did includes a couple of it's tests.

From the codegen changes, an interesting observation: this doesn't actual reduce spilling, it just let's the register allocator do it's job.  That results in a slightly different overall result which has both pros and cons over the eager spill lowering.  (i.e. We'll have some perf tuning to do once this is stable.)
2020-07-29 13:32:51 -07:00
Philip Reames 8fe2abc190 [Statepoint] Consolidate relocation type tracking [NFC]
Change the way we track how a particular pointer was relocated at a statepoint in selection dag.  Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering.  Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information.

This is submitted separately as it really does make the code more readible on it's own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo.  This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.
2020-07-29 11:45:31 -07:00
Simon Pilgrim fdc902774e [DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG
Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG

Differential Revision: https://reviews.llvm.org/D84863
2020-07-29 18:10:59 +01:00
Philip Reames 31342eb63e [Statepoint] When using the tied def lowering, unconditionally use vregs [almost NFC]
This builds on 3da1a96 on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below.

The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that.

Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into it's own change for easy of understanding and fault isolation.

The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and it's source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication.

I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level.

Differential Revision: https://reviews.llvm.org/D84692
2020-07-29 09:23:52 -07:00
David Sherwood 2078771759 [SVE][CodeGen] Add simple integer add tests for SVE tuple types
I have added tests to:

  CodeGen/AArch64/sve-intrinsics-int-arith.ll

for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:

1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.

Differential revision: https://reviews.llvm.org/D84016
2020-07-29 13:32:10 +01:00
David Sherwood 5d84eafc6b [CodeGen] Remove calls to getVectorNumElements in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR
In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced
calls to getVectorNumElements with getVectorMinNumElements, since
this code path works for both fixed and scalable vector types. For
scalable vectors the index will be multiplied by VSCALE.

Fixes warnings in this test:

  sve-sext-zext.ll

Differential revision: https://reviews.llvm.org/D83198
2020-07-29 13:05:39 +01:00
Simon Pilgrim b4b6e77454 [DAG] isSplatValue - add support for TRUNCATE/SIGN_EXTEND/ZERO_EXTEND
These are just pass-throughs to the source operand - we can't assume that ANY_EXTEND(splat) will still be a splat though.
2020-07-28 19:56:11 +01:00
Changpeng Fang 9162b70e51 DADCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit
Summary:
  In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000.
We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose
to give up simplification with the consideration of compile time.

Reviewers:
  @spatel, @arsenm

Differential Revision:
  https://reviews.llvm.org/D84204
2020-07-25 21:20:59 -07:00
Eric Christopher 18975762c1 Fold StatepointBB into checks as it's only used from an NDEBUG or ASSERT
context fixing an unused variable warning.
2020-07-25 18:36:53 -07:00
Philip Reames 55dae9c20c [Statepoints] Style cleanup after 3da1a963 [NFC]
Just fixing a few minor stylistic issues.
2020-07-25 16:40:39 -07:00
Philip Reames 3da1a9634e [Statepoints] Support lowering gc relocations to virtual registers
(Disabled under flag for the moment)

This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator.  Today, we force spill all operands in SelectionDAG.  The code to do so is distinctly non-optimal.  The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to.  In terms of performance, the later part is actually more important as it avoids redundant shuffling of values between stack slots.

This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above.  In particular, the first N are lowered to variadic tied def/use pairs.  So new statepoint looks like this:
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...

N is limited by the maximal number of tied registers machine instruction can have (15 at the moment).

The current patch is restricted to handling relocations within a single basic block.  Cross block relocations (e.g. invokes) are handled via the legacy mechanism.  This restriction will be relaxed in future patches.

Patch By: dantrushin
Differential Revision: https://reviews.llvm.org/D81648
2020-07-25 14:26:05 -07:00
Craig Topper 8131e19064 [LegalizeTypes] Teach DAGTypeLegalizer::GenWidenVectorLoads to pad with undef if needed when concatenating small or loads to match a larger load
In the included test case the align 16 allowed the v23f32 load to handled as load v16f32, load v4f32, and load v4f32(one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need to two v4f32 undefs to pad it.

It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function. Originally in r147964 by padding all loads narrower than previous loads to the same size. Later modifed to only the last load in r293088. This patch removes that earlier code and just handles it on demand where we know we need it.

Fixes PR46820

Differential Revision: https://reviews.llvm.org/D84463
2020-07-23 19:02:03 -07:00
Nikita Popov deb4bb2b3a [IR] Add min/max/abs intrinsics
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.

Differential Revision: https://reviews.llvm.org/D84125
2020-07-23 20:56:19 +02:00
Florian Hahn 6c9da995fc [ScheduleDAGRRList] Pacify overload mismatch in std::min.
On systems where size() doesn't return unsigned long, this leads to an
overloading mismatch. Convert the constant to whatever type is used for
Q.size() on the system.
2020-07-23 11:56:50 +01:00
Florian Hahn 2f8e6b5f3c [ScheduleDAGRRList] Limit number of candidates to explore.
Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.

define void @test(i4000000* %ptr) {
entry:
  store i4000000 0, i4000000* %ptr, align 4
  ret void
}

This patch limits the number of candidates to check to 1000. This limit
ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86
and AArch64 with -O3, while still drastically limiting the compile-time
in case of very large queues.

It would be even better to use a binary heap to manage to queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.

The patch includes a slightly smaller version of the motivating example
as test case, to guard against the issue.

Reviewers: efriedma, paquette, niravd

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84328
2020-07-23 11:35:33 +01:00
Simon Pilgrim fa95688237 SelectionDAGBuilder.cpp - remove duplicate includes that already exist in SelectionDAGBuilder.h. NFC. 2020-07-22 14:19:41 +01:00
Matt Arsenault f659c44016 CodeGen: Add support for lowering byref attribute 2020-07-21 17:38:15 -04:00
Matt Arsenault 2fe0ea8261 DAG: Handle expanding strict_fsub into fneg and strict_fadd
The AMDGPU handling of f16 vectors is terrible still since it gets
scalarized even when the vector operation is legal.

The code is is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
2020-07-21 16:17:10 -04:00
Eli Friedman b8f765a1e1 [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
Florian Hahn e297006d6f [ScheduleDAG] Move DBG_VALUEs after first term forward.
MBBs are not allowed to have non-terminator instructions after the first
terminator. Currently in some cases (see the modified test),
EmitSchedule can add DBG_VALUEs after the last terminator, for example
when referring a debug value that gets folded into a TCRETURN
instruction on ARM.

This patch updates EmitSchedule to move inserted DBG_VALUEs just before
the first terminator. I am not sure if there are terminators produce
values that can in turn be used by a DBG_VALUE. In that case, moving the
DBG_VALUE might result in referencing an undefined register. But in any
case, it seems like currently there is no way to insert a proper DBG_VALUEs
for such registers anyways.

Alternatively it might make sense to just remove those extra DBG_VALUES.

I am not too familiar with the details of debug info in the backend and
would appreciate any suggestions on how to address the issue in the best
possible way.

Reviewers: vsk, aprantl, jpaquette, efriedma, paquette

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83561
2020-07-17 10:27:43 +01:00
Matt Arsenault 9d3e56e2ee DAG: Try scalarizing when expanding saturating add/sub
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.

Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
2020-07-16 14:05:16 -04:00
Matt Arsenault 023883a834 IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.

The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
2020-07-16 13:50:49 -04:00
Kerry McLaughlin 2762da0a16 [SVE][CodeGen] Legalisation of masked loads and stores
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.

Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma, david-arm

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83137
2020-07-16 10:55:45 +01:00
Hiroshi Yamauchi f233b92f92 [PGO][PGSO] Add profile guided size optimization to LegalizeDAG.
Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83333
2020-07-15 10:03:38 -07:00
Cameron McInally ae51a70030 [Legalize] Hoist invariant condition in ExpandVectorBuildThroughStack(...)
The operands of a BUILD_VECTOR must all have the same type, so we can hoist this invariant condition out of the loop.

Differential Revision: https://reviews.llvm.org/D83882
2020-07-15 11:05:20 -05:00
Tim Northover 5165b2b5fd AArch64+ARM: make LLVM consider system registers volatile.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.

This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
2020-07-15 09:47:36 +01:00
Roger Ferrer Ibanez 14bc5e149d [DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1)
The existing code already considered this case. Unfortunately a typo in
the condition prevents it from triggering. Also the existing code, had
it run, forgot to do the folding.

This fixes PR42876.

Differential Revision: https://reviews.llvm.org/D65802
2020-07-15 07:34:22 +00:00
Paul Walker 6e198aae1d [SelectionDAG] Prevent warnings when extracting fixed length vector from scalable.
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector.  This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.

Differential Revision: https://reviews.llvm.org/D83642
2020-07-14 11:12:56 +00:00
David Sherwood 3b8eaf26db [SVE][CodeGen] Fix implicit TypeSize->uint64_t conversion in TransformFPLoadStorePair
In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable
property of TypeSize when trying to create an integer type of equivalent
size. In fact, this optimisation makes no sense for scalable types
since we don't know the size at compile time. I have changed the code
to bail out when encountering scalable type sizes.

I've added a test to

  llvm/test/CodeGen/AArch64/sve-fp.ll

that exercises this code path. The test already emits an error if it
encounters warnings due to implicit TypeSize->uint64_t conversions.

Differential Revision: https://reviews.llvm.org/D83572
2020-07-14 08:07:30 +01:00
Sanjay Patel 8779b11410 [DAGCombiner] rot i16 X, 8 --> bswap X
We have this generic transform in IR (instcombine),
but as shown in PR41098:
http://bugs.llvm.org/PR41098
...the pattern may emerge in codegen too.

x86 has a potential refinement/reversal opportunity here,
but that should come later or needs a target hook to
avoid the transform. Converting to bswap is the more
specific form, so we should use it if it is available.
2020-07-13 12:01:53 -04:00
Sanjay Patel 2df46a5743 [DAGCombiner] allow load/store merging if pairs can be rotated into place
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.

This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895

I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".

Differential Revision: https://reviews.llvm.org/D83567
2020-07-13 08:57:00 -04:00
Sanjay Patel f1bbf3acb4 Revert "[DAGCombiner] allow load/store merging if pairs can be rotated into place"
This reverts commit 591a3af5c7.
The commit message was cut off and failed to include the review citation.
2020-07-13 08:55:29 -04:00
Sanjay Patel 591a3af5c7 [DAGCombiner] allow load/store merging if pairs can be rotated into place
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.

This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:i
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895

I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".

Differential Revision:
2020-07-13 08:53:06 -04:00
Kerry McLaughlin afcc9a81d2 [SVE][Codegen] Add a helper function for pointer increment logic
Summary:
Helper used when splitting load & store operations to calculate
the pointer + offset for the high half of the split

Reviewers: efriedma, sdesmalen, david-arm

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83577
2020-07-13 10:53:40 +01:00
Sanjay Patel 39009a8245 [DAGCombiner] tighten fast-math constraints for fma fold
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)

This is only allowed when "reassoc" is present on the fadd.

As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.

I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.

Differential Revision: https://reviews.llvm.org/D82499
2020-07-12 08:51:49 -04:00
Sanjay Patel 02fec9d2a5 [DAGCombiner] move/rename variables for readability; NFC 2020-07-10 11:28:51 -04:00
David Sherwood da731894a2 [CodeGen] Replace calls to getVectorNumElements() in DAGTypeLegalizer::SetSplitVector
In DAGTypeLegalizer::SetSplitVector I have changed calls in the assert
from getVectorNumElements() to getVectorElementCount(), since this
code path works for both fixed and scalable vectors.

This fixes up one warning in the test:

  sve-sext-zext.ll

Differential Revision: https://reviews.llvm.org/D83196
2020-07-10 08:29:17 +01:00
David Sherwood 229dfb4728 [CodeGen] Replace calls to getVectorNumElements() in SelectionDAG::SplitVector
This patch replaces some invalid calls to getVectorNumElements() with calls
to getVectorMinNumElements() instead, since the code paths changed in this
patch work for both fixed and scalable vector types.

Fixes warnings in this test:

  sve-sext-zext.ll

Differential Revision: https://reviews.llvm.org/D83203
2020-07-10 08:11:30 +01:00
Sanjay Patel a46cf40240 [DAGCombiner] convert if-chain in store merging to switch; NFC 2020-07-09 17:20:04 -04:00
Sanjay Patel b476e6a642 [DAGCombiner] add helper function for store merging of loaded values; NFC 2020-07-09 17:20:04 -04:00
Sanjay Patel f98a602c2e [DAGCombiner] add helper function for store merging of extracts; NFC 2020-07-09 17:20:03 -04:00
Sanjay Patel 8d74cb01b7 [DAGCombiner] add helper function for store merging of constants; NFC 2020-07-09 17:20:03 -04:00
Sanjay Patel 6890e2a17b [DAGCombiner] add helper function to manage list of consecutive stores; NFC 2020-07-09 17:20:03 -04:00
Christopher Tetreault ff5b9a7b3b [SVE] Remove calls to VectorType::getNumElements from CodeGen
Reviewers: efriedma, fpetrogalli, sdesmalen, RKSimon, arsenm

Reviewed By: RKSimon

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82210
2020-07-09 12:43:36 -07:00
Lucas Prates fc39a9ca0e [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand
Summary:
When legalizing a biscast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32-bits, an assertion can fail when building the new node due to a
mismatch between the the operation's result size and the type specified to
the node.

This patches fix the issue by making sure the bit width of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANYEXTEND operation.

Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82552
2020-07-09 09:46:17 +01:00
Qiu Chaofan 4254ed5c32 [Legalizer] Fix wrong operand in split vector helper
This should be a typo introduced in D69275, which may cause an unknown
segment fault in getNode.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D83376
2020-07-09 09:57:29 +08:00
Matt Arsenault 18bd821f02 DAG: Remove redundant finalizeLowering call
9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate
this, and move all isel pseudo expansion to FinalizeISel. This was a
bad rebase or something, and failed to actually delete this call.

GlobalISel also has a redundant call of finalizeLowering. However, it
requires more work to remove it since it currently triggers a lot of
verifier errors in tests.
2020-07-08 18:48:20 -04:00
Matt Arsenault 2ec5fc0c61 DAG: Remove redundant handling of reg fixups
It looks like 9cac4e6d14 accidentally
added a second copy of this from a bad rebase or something. This
second copy was added, and the finalizeLowering call was not deleted
as intended.
2020-07-08 18:32:43 -04:00
Sanjay Patel 1265eb2d5f [DAGCombiner] clean up in mergeConsecutiveStores(); NFC 2020-07-08 14:48:05 -04:00