Commit Graph

11192 Commits

Author SHA1 Message Date
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Leonard Chan a97f62837f [llvm][IR] Add dso_local_equivalent Constant
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.

When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.

In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
  advantage of `dso_local_equivalent` for relative references

This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.

See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.

Differential Revision: https://reviews.llvm.org/D77248
2020-11-19 10:26:17 -08:00
Florian Hahn 1983acce7c
[SelDAGBuilder] Do not require simple VTs for constraints.
In some cases, the values passed to `asm sideeffect` calls cannot be
mapped directly to simple MVTs. Currently, we crash in the backend if
that happens. An example can be found in the @test_vector_too_large_r_m
test case, where we pass <9 x float> vectors. In practice, this can
happen in cases like the simple C example below.

using vec = float __attribute__((ext_vector_type(9)));
void f1 (vec m) {
  asm volatile("" : "+r,m"(m) : : "memory");
}

One case that use "+r,m" constraints for arbitrary data types in
practice is google-benchmark's DoNotOptimize.

This patch updates visitInlineAsm so that it use MVT::Other for
constraints with complex VTs. It looks like the rest of the backend
correctly deals with that and properly legalizes the type.

And we still report an error if there are no registers to satisfy the
constraint.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D91710
2020-11-19 09:31:54 +00:00
Kerry McLaughlin 306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Simon Pilgrim 8996742741 [KnownBits] Add KnownBits::makeConstant helper. NFCI.
Helper for cases where we need to create a KnownBits from a (fully known) constant value.
2020-11-12 16:16:04 +00:00
Pavel Iliin fdb979cfbb [NFC] [Legalize] Fix spaces and style. 2020-11-11 19:24:15 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Kerry McLaughlin ffbbfc76ca [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which adds support for scalable masked scatters

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Chen Zheng 09e34048bf [SelectionDAG] fminnum should be a binary operator
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91163
2020-11-11 03:41:40 -05:00
David Zarzycki a41ea782c8 [SelectionDAG] Enable CTPOP optimization fine tuning
Add a TLI hook to allow SelectionDAG to fine tune the conversion of CTPOP to a chain of "x & (x - 1)" when CTPOP isn't legal.

A subsequent patch will attempt to fine tune the X86 code gen.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89952
2020-11-09 13:49:01 -05:00
Paul Robinson 920befb337 [FastISel] Reduce spills around mem-intrinsic calls
FastISel generates instructions to materialize "local values" at the
top of a block, in the hope that these values could be reused within
the block.  To reduce spills and restores, FastISel treats calls as
sub-block boundaries, flushing the "local value map" at each call.

This patch treats the mem* intrinsics as if they were calls, because
at O0 generally they are calls.  Eliminating these spills/restores is
actually better for debugging (especially a "continue at this line"
command), code size, stack frame size, and maybe even performance.

Differential Revision: https://reviews.llvm.org/D90877
2020-11-09 09:45:14 -08:00
David Zarzycki 57e46e7123 [SelectionDAG] NFC: Hoist is legal check
This was requested during the code review of D89952.
2020-11-09 10:55:15 -05:00
Francesco Petrogalli fc2fe6817e [llvm][AArch64] Simplify (and (sign_extend..) #bitmask).
Fold

    VT = (and (sign_extend NarrowVT to VT) #bitmask)

into

    VT = (zero_extend NarrowVT)

With this combine, the test replaces a sign extended load + an
unsigned extention with a zero extended load to render one of the
operands of the last multiplication.

  BEFORE                       |  AFTER
    f_i16_i32:                 |    f_i16_i32:
         .fnstart              |           .fnstart
         ldrsh   r0, [r0]      |           ldrh    r1, [r1]
         ldrsh   r1, [r1]      |           ldrsh   r0, [r0]
         smulbb  r0, r1, r0    |           smulbb  r0, r0, r1
         uxth    r1, r1        |           mul     r0, r0, r1
         mul     r0, r0, r1    |           bx      lr
         bx      lr            |

Reviewed By: resistor

Differential Revision: https://reviews.llvm.org/D90605
2020-11-09 12:53:36 +00:00
Craig Topper 98d7e583db [LegalizeTypes] Remove unnecessary if around switch in ScalarizeVectorOperand and SplitVectorOperand. NFC
The if was checking !Res.getNode() but that's always true since
Res was initialized to SDValue() and not touched before the if.

This appears to be a leftover from a previous implementation of
Custom legalization where Res was updated instead of returning
immediately.
2020-11-05 11:00:51 -08:00
Simon Pilgrim bf04e34383 [DAG] computeKnownBits - Replace ISD::SREM handling with KnownBits::srem to reduce code duplication 2020-11-05 17:12:58 +00:00
Simon Pilgrim e237d56b43 [KnownBits] Move ValueTracking/SelectionDAG UREM KnownBits handling to KnownBits::urem. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
2020-11-05 14:30:59 +00:00
Simon Pilgrim 32bee18b84 [KnownBits] Move ValueTracking/SelectionDAG UDIV KnownBits handling to KnownBits::udiv. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
2020-11-05 13:42:42 +00:00
Cameron McInally c126eb7529 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Fraser Cormack f99580c1e5 [DAGCombine] Fix bug in load scalarization
Summary:
For vector element types which are not byte-sized, we would generate
incorrect scalar offsets and produce incorrect codegen.

This optimization could potentially be supported in the future, e.g. by
loading in bytes, then shifting and masking out the remaining bits of
the vector element. However, without an upstream target to test against
it's best to avoid the bad codegen in the simplest possible way.

Related to this bug:

  https://bugs.llvm.org/show_bug.cgi?id=27600

Reviewed by: foad

Differential Revision: https://reviews.llvm.org/D78568
2020-11-04 19:02:40 +00:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Simon Pilgrim 825e517e34 [DAG] computeKnownBits - Replace ISD::MUL handling with the common KnownBits::computeForMul implementation 2020-11-04 11:32:08 +00:00
Simon Pilgrim e9b88c754a [DAG] computeKnownBits - Move ISD::SRA handling into KnownBits::ashr
As discussed on D90527, we should be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
2020-11-03 18:09:33 +00:00
Simon Pilgrim cb798f040a [DAG] computeKnownBits - Move (most) ISD::SRL handling into KnownBits::lshr
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.

The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.

We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
2020-11-03 17:30:36 +00:00
Simon Pilgrim cab21d4fa8 [DAG] computeKnownBits - Move (most) ISD::SHL handling into KnownBits::shl
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.

The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.

We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
2020-11-03 14:22:28 +00:00
Qiu Chaofan 1f852ba853 [PowerPC] Avoid unnecessary fadd for unsigned to ppcf128
Unsigned 32-bit or shorter integer to ppcf128 conversion are currently
expanded as signed-to-double with an extra fadd to 'complement'. But on
PowerPC we have native instruction to directly convert unsigned to
double since ISA v2.06. This patch exploits it.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D89786
2020-11-01 23:22:47 +08:00
Cameron McInally dda1e74b58 [Legalize] Add legalizations for VECREDUCE_SEQ_FADD
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.

Differential Revision: https://reviews.llvm.org/D90247
2020-10-30 16:02:55 -05:00
Nikita Popov 6c2ad4cf87 [SDAG] Extract helper to determine neutral element (NFC)
Make the existing VECREDUCE based code more generic, but expressing
it in terms of the neutral value of the base opcode instead.
2020-10-29 22:05:06 +01:00
Nikita Popov a5f172927d [SDAG] Fix neutral value for vecreduce_fadd
The neutral value for FADD is -0.0, not 0.0, so this is what we
need to pad vectors with.
2020-10-29 21:27:59 +01:00
Nikita Popov 91bf172088 [SDAG] Extract helper to get vecreduce base opcode (NFC) 2020-10-29 20:22:22 +01:00
Simon Pilgrim f53d7f55f1 [DAG] Move canFoldInAddressingMode before foldBinOpIntoSelect. NFC.
Reduces the diff in D90113.
2020-10-28 12:16:05 +00:00
Sven van Haastregt 5d03080092 [TargetLowering] Add i1 condition for bit comparison fold
For i1 types, boolean false is represented identically regardless of
the boolean content, so we can allow optimizations that otherwise
would not be correct for booleans with false represented as a negative
one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D90145
2020-10-27 12:22:20 +00:00
Peter Waller 5b742a0c10 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in sve-redundant-store.ll that the warning is not triggered.

Differential Revision: https://reviews.llvm.org/D89701
2020-10-26 16:37:48 +00:00
Peter Waller 6536d6040f Revert "[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination"
This reverts commit 4604441386.

Reverting because it was not the intended version of the patch, which
follows this patch.
2020-10-26 16:37:00 +00:00
Peter Waller 4604441386 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in Redundantstores.ll that the warning is not triggered.
2020-10-26 16:23:42 +00:00
Simon Pilgrim ce356e1546 [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.

I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.

Differential Revision: https://reviews.llvm.org/D87930
2020-10-24 12:23:09 +01:00
Simon Pilgrim 62b17a7697 [LegalizeTypes] Legalize vector rotate operations
Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47320

Patch By: @rsanthir.quic (Ryan Santhirarajan)

Differential Revision: https://reviews.llvm.org/D89497
2020-10-24 11:30:32 +01:00
Cameron McInally a1cc274cb3 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Denis Antrushin 4f7ee55971 Revert "[Statepoints] Allow deopt GC pointer on VReg if gc-live bundle is empty."
Downstream testing revealed some problems with this patch.
Reverting while investigating.
This reverts commit 2b96dcebfa.
2020-10-23 21:55:06 +07:00
Craig Topper 9e884169a2 [FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.

There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.

Fixes part of PR47393

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87115
2020-10-21 18:12:54 -07:00
Gaurav Jain 4634ad6c0b [NFC] Set return type of getStackPointerRegisterToSaveRestore to Register
Differential Revision: https://reviews.llvm.org/D89858
2020-10-21 16:19:38 -07:00
Simon Pilgrim 88523f6f4b [DAG] getNode(ISD::EXTRACT_SUBVECTOR) Drop unnecessary N2C null check - we assert that this isn't null and have already used the pointer. NFCI.
Fixes cppcheck + null dereference warning.
2020-10-21 11:53:44 +01:00
Sven van Haastregt bfc961aeb2 [TargetLowering] Check boolean content when folding bit compare
Updates an optimization that relies on boolean contents being either 0
or 1 to properly check for this before triggering.

The following:
  (X & 8) != 0 --> (X & 8) >> 3
Produces unexpected results when a boolean 'true' value is represented
by negative one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D89390
2020-10-21 11:46:55 +01:00
David Sherwood 5b17b323a6 [SVE][CodeGen] Replace use of TypeSize comparison operator in CreateStackTemporary
We were previously relying upon the TypeSize comparison operators to
obtain the maximum size of two types, however use of such operators is
being deprecated in favour of making the caller aware that it could
be dealing with scalable vector types. I have changed the code to assert
that the two types have the same scalable property and thus we can
simply take the maximum of the known minimum sizes instead.

Differential Revision: https://reviews.llvm.org/D88563
2020-10-21 08:31:36 +01:00
Qiu Chaofan 1b2fe71ecf [DAGCombiner] Tighten reasscociation of visitFMA
From LangRef, FMF contract should not enable reassociating to form
arbitrary contractions. So it should not help rearrange nodes like
(fma (fmul x, c1), c2, y) into (fma x, c1*c2, y).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89527
2020-10-20 10:13:01 +08:00
Craig Topper edd0cb11bd [SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats
This enables these transforms for vectors:
(ctpop x) u< 2 -> (x & x-1) == 0
(ctpop x) u> 1 -> (x & x-1) != 0
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

All enabled if CTPOP isn't Legal. This differs from the scalar
behavior where the first two are done unconditionally and the
last two are done if CTPOP isn't Legal or Custom. The Legal
check produced better results for vectors based on X86's
custom handling. Might be worth re-visiting scalars here.

I disabled the looking through truncate for vectors. The
code that creates new setcc can use the same result VT as the
original setcc even if we truncated the input. That may work
work for most scalars, but definitely wouldn't work for vectors
unless it was a vector of i1.

Fixes or at least improves PR47825

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89346
2020-10-19 12:56:59 -07:00
Amy Kwan 6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
David Sherwood 3945b69e81 [SVE][CodeGen] Replace more TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. This patch changes a few
functions that were always expecting to work on scalar or fixed width
types:

1. DAGCombiner::mergeTruncStores - deals with scalar integers only.
2. DAGCombiner::ReduceLoadWidth - not valid for vectors.
3. DAGCombiner::createBuildVecShuffle - should only be used for
   fixed width vectors.
4. SelectionDAGLegalize::ExpandFCOPYSIGN and
   SelectionDAGLegalize::getSignAsIntValue - only work on scalars.

Differential Revision: https://reviews.llvm.org/D88562
2020-10-19 08:38:50 +01:00
David Sherwood 35a531fb45 [SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. I've changed some of these
places to use the equivalent scalar operator.

Differential Revision: https://reviews.llvm.org/D88482
2020-10-19 08:30:31 +01:00
David Sherwood f693f915a0 [SVE][CodeGen] Replace uses of TypeSize comparison operators
In certain places in the code we can never end up in a situation where
we're mixing fixed width and scalable vector types. For example,
we can't have truncations and extends that change the lane count. Also,
in other places such as GenWidenVectorStores and GenWidenVectorLoads we
know from the behaviour of FindMemType that we can never choose a vector
type with a different scalable property.

In various places I have used EVT::bitsXY functions instead of
TypeSize::isKnownXY, where it probably makes sense to keep an assert
that scalable properties match.

Differential Revision: https://reviews.llvm.org/D88654
2020-10-19 08:08:41 +01:00
Craig Topper 278bd06891 [TargetLowering] Extract simplifySetCCs ctpop into a separate function. NFCI
As requested in D89346. This allows us to add some early outs.

I reordered some checks a little bit to make the more common bail outs happen earlier. Like checking opcode before checking hasOneUse. And I moved the bit width check to make sure it was safe to look through a truncate to the spot where we look through truncates instead of after.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89494
2020-10-16 19:47:56 -07:00
Denis Antrushin 8f0ddd4a1a [Statepoints] Remove MI limit on number of tied operands.
After D87915 statepoint can have more than 15 tied operands.
Remove this restriction from statepoint lowering code.
2020-10-15 19:02:38 +07:00
Craig Topper 50c9f1e11d [TargetLowering] Replace Log2_32_Ceil with Log2_32 in SimplifySetCC ctpop combine.
This combine can look through (trunc (ctpop X)). When doing this
it tries to make sure the trunc doesn't lose any information
from the ctpop. It does this by checking that the truncated type
has more bits that Log2_32_Ceil of the ctpop type. The Ceil is
unnecessary and pessimizes non-power of 2 types.

For example, ctpop of i256 requires 9 bits to represent the max
value of 256. But ctpop of i255 only requires 8 bits to represent
the max result of 255. Log2_32_Ceil of 256 and 255 both return 8
while Log2_32 returns 8 for 256 and 7 for 255

The code with popcnt enabled is a regression for this test case,
but it does match what already happens with i256 truncated to i9.
Since power of 2 is more likely, I don't think it should block
this change.

Differential Revision: https://reviews.llvm.org/D89412
2020-10-15 01:05:07 -07:00
Jeremy Morse c4e7857d4e [DebugInstrRef] Create DBG_INSTR_REFs in SelectionDAG
When given the -experimental-debug-variable-locations option (via -Xclang
or to llc), have SelectionDAG generate DBG_INSTR_REF instructions instead
of DBG_VALUE. For now, this only happens in a limited circumstance: when
the value referred to is not a PHI and is defined in the current block.
Other situations introduce interesting problems, addresed in later patches.

Practically, this patch hooks into InstrEmitter and if it can find a
defining instruction for a value, gives it an instruction number, and
points the DBG_INSTR_REF at that <instr, operand> pair.

Differential Revision: https://reviews.llvm.org/D85747
2020-10-14 14:24:08 +01:00
Craig Topper 1687a8d83b [X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization.
This passes existing X86 test but I'm not sure if it handles all type
legalization cases it needs to.

Alternative to D89200

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D89222
2020-10-12 23:18:29 -07:00
Simon Pilgrim c252200e4d [DAG][ARM][MIPS][RISCV] Improve funnel shift promotion to use 'double shift' patterns
Based on a discussion on D88783, if we're promoting a funnel shift to a width at least twice the size as the original type, then we can use the 'double shift' patterns (shifting the concatenated sources).

Differential Revision: https://reviews.llvm.org/D89139
2020-10-12 14:11:02 +01:00
David Sherwood c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of it's own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
Krzysztof Parzyszek 61eaa2e14a [SDAG] Remember to set UndefElts in isSplatValue for SPLAT_VECTOR 2020-10-10 19:42:24 -05:00
Denis Antrushin 2b96dcebfa [Statepoints] Allow deopt GC pointer on VReg if gc-live bundle is empty.
Currently we allow passing pointers from deopt bundle on VReg only if
they were seen in list of gc-live pointers passed on VRegs.
This means that for the case of empty gc-live bundle we spill deopt
bundle's pointers. This change allows lowering deopt pointers to VRegs
in case of empty gc-live bundle. In case of non-empty gc-live bundle,
behavior does not change.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D88999
2020-10-10 14:58:08 +07:00
Esme-Yi e9fd8823ba [DAGCombiner] Add decomposition patterns for Mul-by-Imm.
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
  mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
  mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.

```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88201
2020-10-09 08:51:40 +00:00
Fangrui Song c3de9a9e69 Fix incorect Register -> MCRegister conversion
getReg returns a Register which may represent a virtual register.
2020-10-08 21:40:48 -07:00
Amara Emerson e72cfd938f Rename the VECREDUCE_STRICT_{FADD,FMUL} SDNodes to VECREDUCE_SEQ_{FADD,FMUL}.
The STRICT was causing unnecessary confusion. I think SEQ is a more accurate
name for what they actually do, and the other obvious option of "ORDERED"
has the issue of already having a meaning in FP contexts.

Differential Revision: https://reviews.llvm.org/D88791
2020-10-07 10:45:09 -07:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Jay Foad 1aa8e6a51a [SDag] SimplifyDemandedBits: simplify to FP constant if all bits known
We were already doing this for integer constants. This patch implements
the same thing for floating point constants.

Differential Revision: https://reviews.llvm.org/D88570
2020-10-07 09:24:38 +01:00
Denis Antrushin c08d48fc2d [Statepoints] Change statepoint machine instr format to better suit VReg lowering.
Current Statepoint MI format is this:

   STATEPOINT
   <id>, <num patch bytes >, <num call arguments>, <call target>,
   [call arguments...],
   <StackMaps::ConstantOp>, <calling convention>,
   <StackMaps::ConstantOp>, <statepoint flags>,
   <StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
   <gc base/derived pairs...> <gc allocas...>

Note that GC pointers are listed in pairs <base,derived>.
This causes base pointers to appear many times (at least twice) in
instruction, which is bad for us when VReg lowering is ON.
The problem is that machine operand tiedness is 1-1 relation, so
it might look like this:

  %vr2 = STATEPOINT ... %vr1, %vr1(tied-def0)

Since only one instance of %vr1 is tied, that may lead to incorrect
codegen (see PR46917 for more details), so we have to always spill
base pointers. This mostly defeats new VReg lowering scheme.

This patch changes statepoint instruction format so that every
gc pointer appears only once in operand list. That way they all can
be tied. Additional set of operands is added to preserve base-derived
relation required to build stackmap.
New statepoint has following format:

  STATEPOINT
  <id>, <num patch bytes>, <num call arguments>, <call target>,
  [call arguments...],
  <StackMaps::ConstantOp>, <calling convention>,
  <StackMaps::ConstantOp>, <statepoint flags>,
  <StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
  <StackMaps::ConstantOp>, <num gc pointers>, [gc pointers...],
  <StackMaps::ConstantOp>, <num gc allocas>,  [gc allocas...]
  <StackMaps::ConstantOp>, <num entries in gc map>, [base/derived indices...]

Changes are:
  - every gc pointer is listed only once in a flat length-prefixed list;
  - alloca list is prefixed with its length too;
  - following alloca list is length-prefixed list of base-derived
    indices of pointers from gc pointer list. Note that indices are
    logical (number of pointer), not absolute (index of machine operand).

Differential Revision: https://reviews.llvm.org/D87154
2020-10-06 17:40:29 +07:00
David Sherwood 4ed47d50ea [SVE][CodeGen] Fix DAGCombiner::ForwardStoreValueToDirectLoad for scalable vectors
In DAGCombiner::ForwardStoreValueToDirectLoad I have fixed up some
implicit casts from TypeSize -> uint64_t and replaced calls to
getVectorNumElements() with getVectorElementCount(). There are some
simple cases of forwarding that we can definitely support for
scalable vectors, i.e. when the store and load are both scalable
vectors and have the same size. I have added tests for the new
code paths here:

  CodeGen/AArch64/sve-forward-st-to-ld.ll

Differential Revision: https://reviews.llvm.org/D87098
2020-10-06 08:04:03 +01:00
Craig Topper 1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
David Sherwood fa0293081d [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-store.ll
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MSTORE I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.

I have added a CHECK line to the following test:

 CodeGen/AArch64/sve-split-store.ll

that ensures no new warnings are added.

Differential Revision: https://reviews.llvm.org/D86928
2020-10-05 19:27:00 +01:00
Qiu Chaofan b326d4ff94 [SelectionDAG] Don't remove unused negated constant immediately
This reverts partial of a2fb5446 (actually, 2508ef01) about removing
negated FP constant immediately if it has no uses. However, as discussed
in bug 47517, there're cases when NegX is folded into constant from
other places while NegY is removed by that line of code and NegX is
equal to NegY. In these cases, NegX is deleted before used and crash
happens. So revert the code and add necessary test case.
2020-10-06 01:16:45 +08:00
Sanjay Patel 2ccbf3dbd5 [SDAG] fold x * 0.0 at node creation time
In the motivating case from https://llvm.org/PR47517
we create a node that does not get constant folded
before getNegatedExpression is attempted from some
other node, and we crash.

By moving the fold into SelectionDAG::simplifyFPBinop(),
we get the constant fold sooner and avoid the problem.
2020-10-04 11:31:57 -04:00
Denis Antrushin 7b19cd06d7 [Statepoints][ISEL] visitGCRelocate: chain to current DAG root.
This is similar to D87251, but for CopyFromRegs nodes.
Even for local statepoint uses we generate CopyToRegs/CopyFromRegs
nodes.  When generating CopyFromRegs in visitGCRelocate, we must chain
to current DAG root, not EntryNode, to ensure proper ordering of copy
w.r.t. statepoint node producing result for it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88639
2020-10-02 21:41:22 +07:00
David Sherwood b8ce6a6756 [SVE][CodeGen] Add new EVT/MVT getFixedSizeInBits() functions
When we know that a particular type is always going to be fixed
width we have so far been writing code like this:

  getSizeInBits().getFixedSize()

Since we are doing this in quite a few places now it seems to make
sense to add a new helper function that allows us to replace
these calls with a single getFixedSizeInBits() call.

Differential Revision: https://reviews.llvm.org/D88649
2020-10-02 07:47:31 +01:00
Kerry McLaughlin fcf70e1e3b [SVE][CodeGen] Lower scalable fp_extend & fp_round operations
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.

This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88321
2020-10-01 12:17:37 +01:00
Krzysztof Parzyszek db04bec5f1 [SDAG] Do not convert undef to 0 when folding CONCAT/BUILD_VECTOR
Differential Revision: https://reviews.llvm.org/D88273
2020-09-29 09:12:26 -05:00
Jay Foad 781edd501c [SDag] Verify DAG divergence after dumping. NFC.
When debugging, it's useful to be able to see the DAG that has just
failed divergence verification.
2020-09-29 14:05:07 +01:00
Jay Foad d6b04f3937 [SDag] Refactor and simplify divergence calculation and checking. NFC. 2020-09-29 14:05:07 +01:00
David Sherwood bafdd11326 [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700
2020-09-28 08:03:00 +01:00
Nikita Popov f229bf2e12 [Legalize][X86] Improve nnan fmin/fmax vector reduction
Use +/-Inf or +/-Largest as neutral element for nnan fmin/fmax
reductions. This avoids dropping any FMF flags. Preserving the
nnan flag in particular is important to get a good lowering on X86.

Differential Revision: https://reviews.llvm.org/D87586
2020-09-27 10:47:35 +02:00
Simon Pilgrim a61272a900 [DAG] Fold vector mul(x,0)/mul(x,1) to a clearing mask
If we're multiplying all elements of a vector by '0' or '1' then we can more efficiently perform this as a clearing mask (that is likely to further simplify to a shuffle blend).

This was noticed when reviewing D87502 but seems to help idiv/irem by constant cases even more as '0'/'1' values are often used for 'passthrough' cases.

Differential Revision: https://reviews.llvm.org/D88225
2020-09-26 14:31:57 +01:00
Qiu Chaofan c0f8e4c06c [SelectionDAG] Add guard to automatically insert flags
This is like FastMathFlagGuard in IR. Since we use SDAG instance to get
values, it's with SelectionDAG. By creating a FlagInserter in current
scope, all values created by getNode will get the flags if no Flags
argument provided.

In this patch, I applied it to floating point operations folding part in
DAG combiner, and removed Flags passing to getNode to show its effect.
Other places in DAG combiner and other helper methods similar to getNode
also need this. They can be done in follow-up patches.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87361
2020-09-26 13:57:52 +08:00
Dávid Bolvanský 179e15d53a [SystemZ] Optimize bcmp calls (PR47420)
Solves https://bugs.llvm.org/show_bug.cgi?id=47420

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87988
2020-09-25 17:55:39 +02:00
Bill Wendling 0c0c57f7b2 Revert "[CodeGen] Postprocess PHI nodes for callbr"
Accidental commit.

This reverts commit 7f4c940bd0.
2020-09-24 14:35:23 -07:00
Bill Wendling 7f4c940bd0 [CodeGen] Postprocess PHI nodes for callbr
When processing PHI nodes after a callbr, we need to make sure that the
PHI nodes on the default branch are resolved after the callbr
(inserted after INLINEASM_BR). The PHI node values on the indirect
branches are processed before the INLINEASM_BR.

Differential Revision: https://reviews.llvm.org/D86260
2020-09-24 14:34:28 -07:00
Eli Friedman 3f739f736b [SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops.
Previously, if a floating-point type was legal, but FNEG wasn't legal,
we would use FSUB.  Instead, we should use integer ops, to preserve the
semantics.  (Alternatively, there's a compiler-rt call we could use, but
there isn't much reason to use that.)

It turns out we actually are still using this obscure codepath in a few
cases: on some targets, we have "legal" floating-point types that don't
actually support any floating-point operations.  In particular, ARM and
AArch64 are using this path.

The implementation for SelectionDAG is pretty simple because we can
reuse the infrastructure from FCOPYSIGN.

See also 9a3dc3e, the corresponding change to type legalization.

Also includes a "bonus" change to STRICT_FSUB legalization, so we can
lower a STRICT_FSUB to a float libcall.

Includes the changes to both LegalizeDAG and GlobalISel so we don't have
inconsistent results in the future.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46792 .

Differential Revision: https://reviews.llvm.org/D84287
2020-09-23 14:10:33 -07:00
David Sherwood e077367a28 [SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.

Differential revision: https://reviews.llvm.org/D87889
2020-09-23 09:20:08 +01:00
Simon Pilgrim 4dada8d617 [DAG] Remove DAGTypeLegalizer::GenWidenVectorTruncStores (PR42046)
Just scalarize trunc stores - GenWidenVectorTruncStores does the same thing but is flawed (PR42046) and unused.

Differential Revision: https://reviews.llvm.org/D87708
2020-09-22 17:24:45 +01:00
Denis Antrushin ee86688b81 [Statepoints][ISEL] gc.relocate uniquification should be based on SDValue, not IR Value.
When exporting statepoint results to virtual registers we try to avoid
generating exports for duplicated inputs. But we erroneously use
IR Value* to check if inputs are duplicated. Instead, we should use
SDValue, because even different IR values can get lowered to the same
SDValue.
I'm adding a (degenerate) test case which emphasizes importance of this
feature for invoke statepoints.
If we fail to export only unique values we will end up with something
like that:

  %0 = STATEPOINT
  %1 = COPY %0

landing_pad:
  <use of %1>

And when exceptional path is taken, %1 is left uninitialized (COPY is never
execute).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87695
2020-09-21 19:44:46 +07:00
Alexander Belyaev 17dc729bd4 Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de6.

Google internal backend uses EntrySU, we are looking into removing
dependency on it.

Differential Revision: https://reviews.llvm.org/D88018
2020-09-21 13:33:05 +02:00
Lucas Prates 53d238a961 [CodeGen] Fixing inconsistent ABI mangling of vlaues in SelectionDAGBuilder
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type convertions for
function arguments. The checking for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
inspite of, the regular Calling Convention Lowering.

The issue could be observed in a scenario such as:

```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper convertion had taken place when lowering the return
; value of the first call.
```

This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contanined in the Calling Convention Lowering.

This fixes Bugzilla #47454.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87844
2020-09-21 10:05:34 +01:00
Qiu Chaofan 1d782c2987 [PowerPC] Pass nofpexcept flag to custom lowered constrained ops
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.

This is also seen on X86 target.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87390
2020-09-21 10:44:25 +08:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Francis Visoiu Mistrih 0345d88de6 [NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.

Differential Revision: https://reviews.llvm.org/D87867
2020-09-18 09:50:47 -07:00
Simon Pilgrim d967aaa8fa [DAG] BuildVectorSDNode::getSplatValue - pull out repeated getNumOperands() calls. NFCI. 2020-09-18 16:10:23 +01:00
Qiu Chaofan a2fb5446be [SelectionDAG] Check any use of negation result before removal
2508ef01 fixed a bug about constant removal in negation. But after
sanitizing check I found there's still some issue about it so it's
reverted.

Temporary nodes will be removed if useless in negation. Before the
removal, they'd be checked if any other nodes used it. So the removal
was moved after getNode. However in rare cases the node to be removed is
the same as result of getNode. We missed that and will be fixed by this
patch.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
2020-09-17 16:00:54 +08:00
Craig Topper e30371d99d [DAGCombiner] Teach visitMSTORE to replace an all ones mask with an unmasked store.
Similar to what done in D87788 for MLOAD.

Again I've skipped indexed, truncating, and compressing stores.
2020-09-16 16:42:22 -07:00
Craig Topper 89ee4c0314 [DAGCombiner] Teach visitMLOAD to replace an all ones mask with an unmasked load
If we have an all ones mask, we can just a regular masked load. InstCombine already gets this in IR. But the all ones mask can appear after type legalization.

Only avx512 test cases are affected because X86 backend already looks for element 0 and the last element being 1. It replaces this with an unmasked load and blend. The all ones mask is a special case of that where the blend will be removed. That transform is only enabled on avx2 targets. I believe that's because a non-zero passthru on avx2 already requires a separate blend so its more profitable to handle mixed constant masks.

This patch adds a dedicated all ones handling to the target independent DAG combiner. I've skipped extending, expanding, and index loads for now. X86 doesn't use index so I don't know much about it. Extending made me nervous because I wasn't sure I could trust the memory VT had the right element count due to some weirdness in vector splitting. For expanding I wasn't sure if we needed different undef handling.

Differential Revision: https://reviews.llvm.org/D87788
2020-09-16 13:21:16 -07:00
Sebastian Neubauer 833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Simon Pilgrim 3f682611ab [DAG] Remover getOperand() call. NFCI. 2020-09-16 11:18:58 +01:00
Qiu Chaofan e1669843f2 Revert "[SelectionDAG] Remove unused FP constant in getNegatedExpression"
2508ef01 doesn't totally fix the issue since we did not handle the case
when unused temporary negated result is the same with the result, which
is found by address sanitizer.
2020-09-15 22:03:50 +08:00
Simon Pilgrim 1abb4461ea StatepointLowering.cpp - remove unnecessary includes. NFCI.
These are all directly included in StatepointLowering.h
2020-09-15 12:18:23 +01:00
Simon Pilgrim bee79cdcc6 SelectionDAGBuilder.h - remove unnecessary includes. NFCI.
Reduce to forward declarations and move implicit dependencies down to the cpp files.
2020-09-15 12:18:22 +01:00
Qiu Chaofan 2508ef014e [SelectionDAG] Remove unused FP constant in getNegatedExpression
960cbc53 immediately removes nodes that won't be used to avoid
compilation time explosion. This patch adds the removal to constants to
fix PR47517.

Reviewed By: RKSimon, steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
2020-09-15 17:59:10 +08:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Craig Topper 4208ea3e19 [FastISel] Bail out of selectGetElementPtr for vector GEPs.
The code that decomposes the GEP into ADD/MUL doesn't work properly
for vector GEPs. It can create bad COPY instructions or possibly
assert.

For now just bail out to SelectionDAG.

Fixes PR45906
2020-09-14 12:53:06 -07:00
Nikita Popov 53f36f06af [Legalize][ARM][X86] Add float legalization for VECREDUCE
This adds SoftenFloatRes, PromoteFloatRes and SoftPromoteHalfRes
legalizations for VECREDUCE, to fill the remaining hole in the SDAG
legalization. These legalizations simply expand the reduction and
let it be recursively legalized. For the PromoteFloatRes case at
least it is possible to do better than that, but it's pretty tricky
(because we need to consider the interaction of three different
vector legalizations and the type promotion) and probably not
really worthwhile.

I haven't added ExpandFloatRes support, as I am not familiar with
ppc_fp128.

Differential Revision: https://reviews.llvm.org/D87569
2020-09-14 20:42:09 +02:00
Nikita Popov 8e69c3cde8 [DAGCombiner] Fold fmin/fmax with INF / FLT_MAX
Similar to D87415, this folds the various float min/max opcodes
with a constant INF or -INF operand, or FLT_MAX / -FLT_MAX operand
if the ninf flag is set. Some of the folds are only possible under
nnan.

The fminnum(X, INF) with nnan and fmaxnum(X, -INF) with nnan cases
are needed to improve the VECREDUCE_FMIN/FMAX lowerings on X86,
the rest is here for the sake of completeness.

Differential Revision: https://reviews.llvm.org/D87571
2020-09-14 19:59:33 +02:00
Simon Pilgrim 00e5676cf6 [LegalizeDAG] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. 2020-09-14 11:09:43 +01:00
David Sherwood 15bff4dec4 [CodeGen] Fix bug in IncrementPointer
In an earlier patch I meant to add the correct flags to the ADD
node when incrementing the pointer, but forgot to pass them to
SelectionDAG::getNode.

Differential Revision: https://reviews.llvm.org/D87496
2020-09-14 08:03:55 +01:00
Craig Topper 56b33391d3 [SelectionDAG] Move ISD:PARITY formation from DAGCombine to SimplifyDemandedBits.
Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1)
but the AND might be separated from the ctpop. For example if the
parity result is multiplied by 2, we'll pull the AND through the
shift.

So to handle more cases, move to SimplifyDemandedBits where we
can handle more cases that result in only the LSB of the CTPOP
being used.
2020-09-13 21:04:13 -07:00
Qiu Chaofan a4c5351986 [DAGCombiner] Propagate FMF flags in FMA folding
DAG combiner folds (fma a 1.0 b) into (fadd a b) but the flag isn't
propagated into new fadd. This patch fixes that.

Some code in visitFMA is redundant and such support for vector constants
is missing. Need follow-up patch to clean.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87037
2020-09-14 00:19:06 +08:00
Craig Topper 61d29e0dff [LegalizeTypes] Remove a few cases from SplitVectorOperand that should never happen. NFC
CTTZ, CTLZ, CTPOP, and FCANONICALIZE all have the same input and
output types so the operand should have already been legalized when the
result type was legalized.
2020-09-12 20:59:14 -07:00
Craig Topper ad3d6f993d [SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors.
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop
isn't natively supported by the target, this leads to poor codegen
due to the expansion of ctpop being more complex than what is needed
for parity.

This adds a DAG combine to convert the pattern to ISD::PARITY
before operation legalization. Type legalization is updated
to handled Expanding and Promoting this operation. If after type
legalization, CTPOP is supported for this type, LegalizeDAG will
turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a
series of shifts and xors followed by an AND with 1.

I've avoided vectors in this patch to avoid more legalization
complexity for this patch.

X86 previously had a custom DAG combiner for this. This is now
moved to Custom lowering for the new opcode. There is a minor
regression in vector-reduce-xor-bool.ll, but a follow up patch
can easily fix that.

Fixes PR47433

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87209
2020-09-12 11:42:18 -07:00
Sanjay Patel 3a8ea8609b [Intrinsics] define semantics for experimental fmax/fmin vector reductions
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.

No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.

There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.

Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.

We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.

(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)

x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.

Differential Revision: https://reviews.llvm.org/D87391
2020-09-12 09:10:28 -04:00
Jay Foad 517202c720 [TargetLowering] Fix comments describing XOR -> OR/AND transformations 2020-09-10 13:56:34 +01:00
Kerry McLaughlin cd89f5c91b [SVE][CodeGen] Legalisation of truncate for scalable vectors
Truncating from an illegal SVE type to a legal type, e.g.
`trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>`
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.

This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86548
2020-09-10 11:35:33 +01:00
Nikita Popov 0a5dc7effb [DAGCombiner] Fold fmin/fmax of NaN
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.

This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.

Differential Revision: https://reviews.llvm.org/D87415
2020-09-09 23:53:32 +02:00
Ulrich Weigand 1a25133bcd [DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist.  Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.

This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
2020-09-09 19:13:46 +02:00
Denis Antrushin 4358fa782e [Statepoints] Update DAG root after emitting statepoint.
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87251
2020-09-09 20:22:10 +07:00
Simon Pilgrim d816499f95 [KnownBits] Move SelectionDAG::computeKnownBits ISD::ABS handling to KnownBits::abs
Move the ISD::ABS handling to a KnownBits::abs handler, to simplify future implementations in ValueTracking/GlobalISel.
2020-09-09 13:22:58 +01:00
Denis Antrushin 2a52c3301a [Statepoints] Properly handle const base pointer.
Current code in InstEmitter assumes all GC pointers are either
VRegs or stack slots - hence, taking only one operand.
But it is possible to have constant base, in which case it
occupies two machine operands.

Add a convinience function to StackMaps to get index of next
meta argument and use it in InsrEmitter to properly advance to
the next statepoint meta operand.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87252
2020-09-09 14:07:00 +07:00
Craig Topper 844e94a502 [SelectionDAGBuilder] Remove Unnecessary FastMathFlags temporary. Use SDNodeFlags instead. NFCI
This was a missed simplication in D87200
2020-09-08 15:50:12 -07:00
Craig Topper b1e68f885b [SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.:
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.

This required adding a SDNodeFlags to SelectionDAG::getSetCC.

Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.

Differential Revision: https://reviews.llvm.org/D87200
2020-09-08 15:27:21 -07:00
Jonas Paulsson 6dc3e22b57 [DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert.
On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert()
always ended up being scalarized, because the type action of the input is
promotion which was previously an unhandled case in this method.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47132.

Differential Revision: https://reviews.llvm.org/D86268

Patch by Eli Friedman.
Review: Ulrich Weigand
2020-09-08 16:49:51 +02:00
Craig Topper da79b1eecc [SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported.
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.

It's also the same as the Custom sequence uses for i64 on 32-bit
and i128 on 64-bit. So we can remove the X86 customization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87215
2020-09-07 13:15:26 -07:00
Sanjay Patel 7a06b166b1 [DAGCombiner] allow more store merging for non-i8 truncated ops
This is a follow-up suggested in D86420 - if we have a pair of stores
in inverted order for the target endian, we can rotate the source
bits into place.
The "be_i64_to_i16_order" test shows a limitation of the current
function (which might be avoided if we integrate this function with
the other cases in mergeConsecutiveStores). In the earlier
"be_i64_to_i16" test, we skip the first 2 stores because we do not
match the full set as consecutive or rotate-able, but then we reach
the last 2 stores and see that they are an inverted pair of 16-bit
stores. The "be_i64_to_i16_order" test alters the program order of
the stores, so we miss matching the sub-pattern.

Differential Revision: https://reviews.llvm.org/D87112
2020-09-07 14:12:36 -04:00
Simon Wallis 79ea83e104 [SelectionDAG] memcpy expansion of const volatile struct ignores const zero
In getMemcpyLoadsAndStores(), a memcpy where the source is a zero constant is expanded to a MemOp::Set instead of a MemOp::Copy, even when the memcpy is volatile.
This is incorrect.

The fix is to add a check for volatile, and expand to MemOp::Copy in the volatile case.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D87134
2020-09-07 13:22:09 +01:00
Simon Pilgrim e57cbcbdc1 LegalizeTypes.h - remove orphan SplitVSETCC declaration. NFCI.
The implementation no longer exists
2020-09-07 13:11:49 +01:00
Jay Foad 5350e1b509 [KnownBits] Implement accurate unsigned and signed max and min
Use the new implementation in ValueTracking, SelectionDAG and
GlobalISel.

Differential Revision: https://reviews.llvm.org/D87034
2020-09-07 09:09:01 +01:00
Jonas Paulsson 714ceefad9 [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.

This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.

This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.

There is more that needs to be done in this area as discussed here:

Differential Revision: https://reviews.llvm.org/D86871

Review: Ulrich Weigand, Sanjay Patel
2020-09-05 10:30:38 +02:00
David Sherwood 73a3d350a4 [SVE][CodeGen] Fix up warnings in sve-split-insert/extract tests
I have fixed up some more ElementCount/TypeSize related warnings in
the following tests:

  CodeGen/AArch64/sve-split-extract-elt.ll
  CodeGen/AArch64/sve-split-insert-elt.ll

In SelectionDAG::CreateStackTemporary we were relying upon the implicit
cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject.
I've fixed this by passing in the known minimum size instead, which I
believe is fine because the associated stack id indicates whether this
is a scalable object or not.

I've also fixed up a case in TargetLowering::SimplifyDemandedBits when
extracting a vector element from a scalable vector. The result is a scalar,
hence it wasn't caught at the start of the function. If the vector is
scalable we just bail out for now.

Differential Revision: https://reviews.llvm.org/D86431
2020-09-04 09:51:31 +01:00
Simon Pilgrim 1673a08044 SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI.
Use forward declarations and move the include down to dependent files that actually use it.

This also exposes a number of implicit dependencies on KnownBits.h
2020-09-03 18:33:25 +01:00
Jay Foad 099c089d4b [APInt] New member function setBitVal
Differential Revision: https://reviews.llvm.org/D87033
2020-09-02 21:40:31 +01:00
Paul Walker f72121254d [SVE] Don't reorder subvector/binop sequences when the resulting binop is not legal.
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:

  extract_subvec(binop(insert_subvec(a), insert_subvec(b)))

are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them the result is an
infinite legalise->combine->legalise loop.

This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used it's common for
the affected code paths to be visited pre and post legalisation,
so the extra parameter keeps the code tidy.

Differential Revision: https://reviews.llvm.org/D86450
2020-09-02 11:01:33 +01:00
Sander de Smalen f13beac51b [AArch64][SVE] Preserve full vector regs over EH edge.
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.

For  e.g.

    svint32_t bar(svint32_t);
    svint32_t foo(svint32_t x, bool *err) {
      try { bar(x); } catch (...) { *err = true; }
      return x;
    }

`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84737
2020-09-02 10:54:18 +01:00
Cameron McInally cfe2b81710 [SVE] Update INSERT_SUBVECTOR DAGCombine to use getVectorElementCount().
A small piece of the project to replace getVectorNumElements() with getVectorElementCount().

Differential Revision: https://reviews.llvm.org/D86894
2020-09-01 16:51:44 -05:00
Matt Arsenault 759482ddaa GlobalISel: Implement computeKnownBits for G_BSWAP and G_BITREVERSE 2020-09-01 12:49:57 -04:00
Sam Tebbs 15e880a04f [DAGCombiner] Fold an AND of a masked load into a zext_masked_load
This patch folds an AND of a masked load and build vector into a zero
extended masked load.

Differential Revision: https://reviews.llvm.org/D86789
2020-09-01 17:02:07 +01:00
David Sherwood 9fbb113247 [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.

I have added a CHECK line to the following test:

  CodeGen/AArch64/sve-split-load.ll

that ensures no new warnings are added.

Differential Revision: https://reviews.llvm.org/D86697
2020-09-01 07:47:59 +01:00
Qiu Chaofan 5475154865 [NFC] [DAGCombiner] Refactor bitcast folding within fabs/fneg
fabs and fneg share a common transformation:

(fneg (bitconvert x)) -> (bitconvert (xor x sign))
(fabs (bitconvert x)) -> (bitconvert (and x ~sign))

This patch separate the code into a single method.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86862
2020-09-01 00:48:12 +08:00
Qiu Chaofan eb2a405c18 [NFC] [DAGCombiner] Remove unnecessary negation in visitFNEG
In visitFNEG of DAGCombiner, the folding of (fneg (fsub c, x)) is
redundant since getNegatedExpression already handles it.
2020-09-01 00:35:01 +08:00
Sanjay Patel 1c9a09f42e [DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x), better
I tried to fix this in:
rG716e35a0cf53
...but that patch depends on the order that we encounter the
magic "x/sqrt(x)" expression in the combiner's worklist.

This patch should improve that by waiting until we walk the
user list to decide if there's a use to skip.

The AArch64 test reveals another (existing) ordering problem
though - we may try to create an estimate for plain sqrt(x)
before we see that it is part of a 1/sqrt(x) expression.
2020-08-31 09:35:59 -04:00
Sanjay Patel 716e35a0cf [DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x)
In general, we probably want to try the multi-use reciprocal
transform before sqrt transforms, but x/sqrt(x) is a special-case
because that will always reduce to plain sqrt(x) or an estimate.

The AArch64 tests show that the transform is limited by TLI
hook to patterns where there are 3 or more uses of the divisor.
So this change can result in an extra division compared to
what we had, but that's the intended behvior based on the
current setting of that hook.
2020-08-30 10:55:45 -04:00
Nikita Popov 51d34c0c53 [TargetLowering] Strip tailing whitespace (NFC) 2020-08-29 18:09:08 +02:00
Kai Luo b904324788 [DAGCombiner] Enhance (zext(setcc))
Current `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86687
2020-08-29 03:37:41 +00:00
Denis Antrushin fabd4c1ae1 [Statepoint] Always spill base pointer.
There is a subtle problem with new statepoint lowering scheme
when base and pointers are the same (see PR46917 for more context):

%1 = STATEPOINT ... %0, %0(tied-def 0)...

if, for some reason, register allocator desides to put two instances
of %0 into two different objects (registers or spill slots), we may
end up with

$reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)...

and nothing will prevent later passes to sink uses of $reg2 below
statepoint, which is incorrect.

As a short term solution, always put base pointers on stack during
lowering.
A longer term solution may be to rework MIR statepoint format to
avoid GC pointer duplication in statepoint argument list.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D86712
2020-08-28 23:22:07 +07:00
QingShan Zhang deb4b25807 [DAGCombine] Don't delete the node if it has uses immediately
This is the follow up patch for https://reviews.llvm.org/D86183 as we miss to delete the node if NegX == NegY, which has use after we create the node.
```
    if (NegX && (CostX <= CostY)) {
      Cost = std::min(CostX, CostZ);
      RemoveDeadNode(NegY);
      return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags);  #<-- NegY is used here if NegY == NegX.
    }
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86689
2020-08-28 16:13:43 +00:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Lucas Prates 3d943bcd22 [CodeGen] Properly propagating Calling Convention information when lowering vector arguments
When joining the legal parts of vector arguments into its original value
during the lower of Formal Arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual parts. The same did not happen when lowering calls, causing a
mismatch.

This patch fixes the issue by properly propagating the Calling
Convention details.

This fixes Bugzilla #47001.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86715
2020-08-27 17:01:10 +01:00
Drew Wock 0ec098e22b [FPEnv] Allow fneg + strict_fadd -> strict_fsub in DAGCombiner
This is the first of a set of DAGCombiner changes enabling strictfp
optimizations. I want to test to waters with this to make sure changes
like these are acceptable for the strictfp case- this particular change
should preserve exception ordering and result precision perfectly, and
many other possible changes appear to be able to as well.

Copied from regular fadd combines but modified to preserve ordering via
the chain, this change allows strict_fadd x, (fneg y) to become
struct_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x.

Differential Revision: https://reviews.llvm.org/D85548
2020-08-27 08:17:01 -04:00