Commit Graph

11688 Commits

Author SHA1 Message Date
Dylan Fleming 8ae9ab43dd [SVE] Fixed cast<FixedVectorType> on scalable vector in SelectionDAGBuilder::getUniformBase
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105350
2021-07-07 10:48:17 +01:00
David Green 4ce26deac2 [DAG] Reassociate Add with Or
We already have reassociation code for Adds and Ors separately in DAG
combiner, this adds it for the combination of the two where Ors act like
Adds. It reassociates (add (or (x, c), y) -> (add (add (x, y), c)) where
we know that the Ors operands have no common bits set, and the Or has
one use.

Differential Revision: https://reviews.llvm.org/D104765
2021-07-07 10:21:07 +01:00
Jeremy Morse 2b2ffb7bdc [DebugInfo][InstrRef][3/4] Produce DBG_INSTR_REFs for all variable locations
This patch emits DBG_INSTR_REFs for two remaining flavours of variable
locations that weren't supported: copies, and inter-block VRegs. There are
still some locations that must be represented by DBG_VALUE such as
constants, but they're mostly independent of optimisations.

For variable locations that refer to values defined in different blocks,
vregs are allocated before isel begins, but the defining instruction
might not exist until late in isel. To get around this, emit
DBG_INSTR_REFs in a "half done" state, where the first operand refers to a
VReg. Then at the end of isel, patch these back up to refer to
instructions, using the finalizeDebugInstrRefs method.

Copies are something that I complained about the original RFC, and I
really don't want to have to put instruction numbers on copies. They don't
define a value: they move them. To address this isel, salvageCopySSA
interprets:
 * COPYs,
 * SUBREG_TO_REG,
 * Anything that isCopyInstr thinks is a copy.
And follows chains of copies back to the defining instruction that they
read from. This relies on any physical registers that COPYs read being
defined in the same block, or being entry-block arguments. For the former
we can put an instruction number on the defining instruction; for the
latter we can drop a DBG_PHI that reads the incoming value.

Differential Revision: https://reviews.llvm.org/D88896
2021-07-06 18:31:38 +01:00
David Stuttard 83cb9632a1 [DAGCombiner] Add support for mulhi const folding in DAGCombiner
Differential Revision: https://reviews.llvm.org/D103323

Change-Id: I4ffaaa32301795ba8a339567a68e77fe0862b869
2021-07-05 12:01:26 +01:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Simon Pilgrim 80dd591610 [SelectionDAG] Replace APInt.lshr().trunc() with APInt.extractBits() where possible. NFC.
This also allows us to use KnownBits::extractBits in one case.
2021-07-03 16:33:00 +01:00
Simon Pilgrim e2e44c3da9 [SelectionDAG] Use KnownBits::insertBits instead of separate APInt::insertBits calls. NFC. 2021-07-03 16:32:59 +01:00
Craig Topper af331e8284 [SelectionDAG] Rename memory VT argument for getMaskedGather/getMaskedScatter from VT to MemVT.
Use getMemoryVT() in MGATHER/MSCATTER DAG combines instead of
using the passthru or store value VT for this argument.
2021-07-02 17:37:40 -07:00
Roman Lebedev c2c0d3ea89
Revert "[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 4facbf213c.

```
********************
FAIL: LLVM :: CodeGen/WebAssembly/funcref-call.ll (44466 of 44468)
******************** TEST 'LLVM :: CodeGen/WebAssembly/funcref-call.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /builddirs/llvm-project/build-Clang12/bin/llc < /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll --mtriple=wasm32-unknown-unknown -asm-verbose=false -mattr=+reference-types | /builddirs/llvm-project/build-Clang12/bin/FileCheck /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: /repositories/llvm-project/llvm/include/llvm/Support/LowLevelTypeImpl.h:44: static llvm::LLT llvm::LLT::scalar(unsigned int): Assertion `SizeInBits > 0 && "invalid scalar size"' failed.

```
2021-07-02 11:49:51 +03:00
Paulo Matos 4facbf213c [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Differential Revision: https://reviews.llvm.org/D104797
2021-07-02 09:46:28 +02:00
Craig Topper 066524ea54 [ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0.
Previously we used the vector type, but we're loading/storing
invididual elements so I think only element alignment should matter.

Noticed while looking at the code for something else so I don't
have a test case.

Differential Revision: https://reviews.llvm.org/D105220
2021-07-01 19:08:47 -07:00
Bradley Smith 2668727929 [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR
Inserting into a smaller-than-legal scalable vector would result in an
internal compiler error. For example, inserting a <vscale x 4 x i8> into
a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a
crash.

This crash was happening because there was no code to promote (legalise)
the result of an INSERT_SUBVECTOR node.

This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises
the ISD node. This is currently done by going through memory. This is
necessary because of the requirement that the SubVec parameter of the
INSERT_SUBVECTOR node must be smaller than the Vec parameter, which
means that INSERT_SUBVECTOR cannot always have a legal result/operand
types.

Co-Authored-by: Joe Ellis <joe.ellis@arm.com>

Differential Revision: https://reviews.llvm.org/D102766
2021-07-01 17:05:53 +01:00
Bradley Smith 01b846674d [AArch64][SVE] Add support for fixed length MSCATTER/MGATHER
Since gather lowering can now lower to nodes that may need expansion via
the vector legalizer, do MGATHER lowering via vector legalizer.

Additionally, as part of adding passthru support for fixed typed
gathers, fix passthru support for scalable types.

Depends on D104910

Differential Revision: https://reviews.llvm.org/D104217
2021-07-01 12:13:59 +01:00
Bradley Smith 002911503f [TargetLowering][AArch64][SVE] Take into account accessed type when clamping address
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.

Differential Revision: https://reviews.llvm.org/D105016
2021-06-30 13:30:18 +01:00
Craig Topper 9132299836 [LegalizeTypes][VE] Don't Expand BITREVERSE/BSWAP during type legalization promotion if they will be promoted for NVT in op legalization.
We were trying to expand these if they were going to be expanded
in op legalization so that we generated the minimum number of
operations. We failed to take into account that NVT could be
promoted to another legal type in op legalization.

Hoping this fixes the issue on the VE target reported as a follow
up to D96681. The check line changes were taken from before
1e46b6f401 so this patch does
appear to improve some cases that had previously regressed.
2021-06-29 11:00:11 -07:00
Melanie Blower 931e95687d [llvm][clang][fpenv] Create new intrinsic llvm.arith.fence to control FP optimization at expression level
This intrinsic blocks floating point transformations by the optimizer.

Author: Pengfei

Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn

Differential Revision: https://reviews.llvm.org/D99675
2021-06-28 12:26:52 -04:00
David Green 2887f14639 [ISel] Port AArch64 SABD and UABD to DAGCombine
This ports the AArch64 SABD and USBD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
a ubds/abdu which is zexted to the original type.

Differential Revision: https://reviews.llvm.org/D91937
2021-06-26 19:34:16 +01:00
David Green b8c8bb0769 [DAG] Fold neg(splat(neg(x)) -> splat(x)
This add as a fold of sub(0, splat(sub(0, x))) -> splat(x). This can
come up in the lowering of right shifts under AArch64, where we generate
a shift left of a negated number.

Differential Revision: https://reviews.llvm.org/D103755
2021-06-25 19:53:29 +01:00
Martin Storsjö 42f74e8249 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
2021-06-25 00:22:01 +03:00
Craig Topper 03f9e04bc3 [TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant.
We don't constant fold based on demanded bits elsewhere in
SimplifyDemandedBits, so I don't think we should shrink them either.

The affected ARM test changes because a constant become non-opaque
and eventually enabled some constant folding. This no longer happens.
I checked and InstCombine is able to simplify this test. I'm not sure exactly
what it was trying to test.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D104832
2021-06-24 10:09:36 -07:00
Stephen Tozer c72705678c Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.

Differential Revision: https://reviews.llvm.org/D91722

This reverts commit 386b66b2fc.
2021-06-24 09:46:38 +01:00
Jinsong Ji c125af82a5 [DAGCombine] Check reassoc flags in aggressive fsub fusion
The is from discussion in https://reviews.llvm.org/D104247#inline-993387

The contract and reassoc flags shouldn't imply each other .

All the aggressive fsub fusion reassociate operations,
we should guard them with reassoc flag check.

Reviewed By: mcberg2017

Differential Revision: https://reviews.llvm.org/D104723
2021-06-23 13:59:40 +00:00
Jinsong Ji 3996311ee1 [DAGCombine] reassoc flag shouldn't enable contract
According to IR LangRef, the FMF flag:

contract
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add).

reassoc
Allow reassociation transformations for floating-point instructions.
This may dramatically change results in floating-point.

My understanding is that these two flags shouldn't imply each other,
as we might have a SDNode that can be reassociated with others, but
not contractble.

eg: We may want following fmul/fad/fsub to freely reassoc, but don't
want fma being generated here.

   %F = fmul reassoc double %A, %B         ; <double> [#uses=1]
   %G = fmul reassoc double %C, %D         ; <double> [#uses=1]
   %H = fadd reassoc double %F, %G         ; <double> [#uses=1]
   %I = fsub reassoc double %H, %E         ; <double> [#uses=1]

Before https://reviews.llvm.org/D45710, `reassoc` flag actually
did not imply isContratable either.

The current implementation also only check the flag in fadd node,
ignoring fmul node, this patch update that as well.

Reviewed By: spatel, qiucf

Differential Revision: https://reviews.llvm.org/D104247
2021-06-21 21:15:43 +00:00
Simon Pilgrim 7353beda4a [DAG] SelectionDAG::computeKnownBits - use APInt::insertBits to merge subvector knownbits. NFCI.
As noticed on D104472 we can use APInt::insertBits which will avoid a lot of temporary APInt creations
2021-06-18 14:59:01 +01:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
David Spickett e4ecd83fe9 [llvm][AArch64] Handle arrays of struct properly (from IR)
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123
2021-06-16 13:56:01 +00:00
Dylan Fleming dab05335a6 [SVE] Fix PromoteIntRes_TRUNCATE not to call getVectorNumElements
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104115
2021-06-16 13:09:43 +01:00
Roman Lebedev 585e65d330
[TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0))
Iff we have `SCALAR_TO_VECTOR` (and we demand it's only defined 0'th element),
and said scalar was produced by `EXTRACT_VECTOR_ELT` from the 0'th element
of some vector, then we can just continue traversal into said source vector.

This comes up in X86 vector uniform shift lowering.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104250
2021-06-14 23:52:53 +03:00
Saleem Abdulrasool 5b5833b9e0 SelectionDAG: repair the Windows build
6e5628354e regressed the Windows build as
the return type no longer matched in both branches for the return value
type deduction.  This uses a bit more compiler magic to deal with that.
2021-06-14 08:25:36 -07:00
Roman Lebedev 0f94c3c80d
[NFC][DAGCombine] Extract getFirstIndexOf() lambda back into a function
Not all supported compilers like such lambdas, at least one buildbot is unhappy.
2021-06-14 16:25:59 +03:00
Roman Lebedev 6e5628354e
[DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size
The sorting, obviously, must be stable, else we will have random assembly fluctuations.

Apparently there was no test coverage that would benefit from that,
so i've added one test.

The sorting consists of two parts - just sort the input vectors,
and recompute the shuffle mask -> input vector mapping.
I don't believe we need to do anything else.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104187
2021-06-14 16:18:37 +03:00
Jeroen Dobbelaere bb8ce25e88 Intrinsic::getName: require a Module argument
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.

This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.

Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99173
2021-06-14 14:52:29 +02:00
Carl Ritson cfbb92441f [SDAG] Fix pow2 assumption when splitting vectors
When reducing vector builds to shuffles it possible that
the DAG combiner may try to extract invalid subvectors.

This happens as the existing code assumes vectors will be power
of 2 sizes, which is already untrue, but becomes more noticable
with v6 and v7 types.
Specifically the existing code assumes that half PowerOf2Ceil of
a given vector index will fit twice into a given vector.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103880
2021-06-11 08:58:16 +09:00
David Spickett 64de8763aa Revert "Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 31859f896c.

Causing SVE and RISCV-V test failures on bots.
2021-06-10 10:11:17 +00:00
Paulo Matos 31859f896c Implementation of global.get/set for reftypes in LLVM IR
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D95425
2021-06-10 10:07:45 +02:00
Sanjay Patel dd763ac791 [SDAG] fix miscompile from merging stores of different sizes
As shown in:
https://llvm.org/PR50623
...and the similar tests here, we were not accounting for
store merging of different sizes that do not cover the
entire range of the wide value to be stored.

This is the easy fix: just make sure that all of the
original stores are the same size, so when we calculate
the wide width, it's a simple N * M check.

This still allows all of the motivating optimizations from:
D86420 / 54a5dd485c
D87112 / 7a06b166b1

We could enhance this code to track individual bytes and
allow merging multiple sizes.
2021-06-09 09:51:39 -04:00
Simon Pilgrim 114e712c34 InstrEmitter.cpp - don't dereference a dyn_cast<>.
dyn_cast<> can return nullptr which we would then dereference - use cast<> which will assert that the type is correct.
2021-06-08 17:59:04 +01:00
Simon Pilgrim 61a2d6bfe4 [DAG] foldShuffleOfConcatUndefs - ensure shuffles of upper (undef) subvector elements is undef (PR50609)
shuffle(concat(x,undef),concat(y,undef)) -> concat(shuffle(x,y),shuffle(x,y))

If the original shuffle references any of the upper (undef) subvector elements, ensure the split shuffle masks uses undef instead of an out-of-bounds value.

Fixes PR50609
2021-06-08 15:49:41 +01:00
Hans Wennborg 386b66b2fc Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.

This reverts commit 36ec97f76a.

The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.

Reverting to unbreak people's builds until that can be addressed.

This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d8.
2021-06-08 14:54:08 +02:00
David Green b889c6ee99 [DAG] Allow isNullOrNullSplat to see truncated zeroes
This sets the AllowTruncation flag on isConstOrConstSplat in
isNullOrNullSplat, allowing it to see truncated constant zeroes on
architectures such as AArch64, where only a i32.i64 are legal. As a
truncation of 0 is always 0, this should always be valid, allowing some
extra folding to happen including some of the cases from D103755.

Differential Revision: https://reviews.llvm.org/D103756
2021-06-08 10:18:58 +01:00
Arthur Eubanks 47211fa889 Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry"
Needs to be discussed more.

This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
2021-06-07 16:07:44 -07:00
Guillaume Chatelet 1da2c7d25c [NFC] Fix semantic discrepancy for MVT::LAST_VALUETYPE
Differential Revision: https://reviews.llvm.org/D103251
2021-06-07 10:04:16 +00:00
Fraser Cormack aec9cbbeb8 [SelectionDAG] Extend FoldConstantVectorArithmetic to SPLAT_VECTOR
This patch extends the SelectionDAG's ability to constant-fold vector
arithmetic to include support for SPLAT_VECTOR. This is not only for
scalable-vector types but also for fixed-length vector types, which
helps Hexagon in a couple of cases.

The original RISC-V test case was in fact an infinite DAGCombine loop.
The pattern `and (truncate v1), (truncate v2)` can be combined to
`truncate (and v1, v2)` but the truncate can similarly be combined back
to `truncate (and v1, v2)` (but, crucially, only when one of `v1` or
`v2` is a constant vector).

It wasn't exposed in on fixed-length types because a TRUNCATE of a
constant BUILD_VECTOR was folded into the BUILD_VECTOR itself, whereas
this did not happen for the equivalent (scalable-vector) SPLAT_VECTOR.

Reviewed By: RKSimon, craig.topper

Differential Revision: https://reviews.llvm.org/D103246
2021-06-04 09:53:15 +01:00
Arthur Eubanks 9255a5c1ba [TargetLowering] Only inspect attributes in the arguments for ArgListEntry
Parameter attributes are considered part of the function [1], and like
mismatched calling conventions [2], we can't have the verifier check for
mismatched parameter attributes.

Issues can be diagnosed with D103412.

[1] https://llvm.org/docs/LangRef.html#parameter-attributes
[2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101806
2021-06-03 15:52:01 -07:00
Fraser Cormack 1de1887f5f [CodeGen] Fix a scalable-vector crash in VSELECT legalization
The `DAGTypeLegalizer::WidenVSELECTMask` function is not (yet) ready for
scalable vector types, and has numerous places in which it tries to grab
either the fixed size or number of elements of its types.

I believe that it should be possible to update this method to properly
account for scalable-vector types, but we don't have test cases for
that; RISC-V bails out early on as it has legal i1 vector masks. As
such, this patch just prevents it from crashing.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103536
2021-06-03 10:24:55 +01:00
Sanjay Patel 0718ac706d [SDAG] allow cast folding for vector sext-of-setcc with signed compare
This extends 434c8e013a and ede3982792 to handle signed
predicates by sign-extending the setcc operands.

This is not shown directly in https://llvm.org/PR50055 ,
but the pattern is visible by changing the unsigned convert
to signed in the source code.
2021-06-02 15:05:02 -04:00
Sanjay Patel ede3982792 [SDAG] allow more cast folding for vector sext-of-setcc
This is a follow-up to D103280 that eases the use restrictions,
so we can handle the motivating case from:
https://llvm.org/PR50055

The loop code is adapted from similar use checks in
ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an
easier way to filter out non-chain uses of load values.

Differential Revision: https://reviews.llvm.org/D103462
2021-06-02 13:14:49 -04:00
Bjorn Pettersson 536e02a23c [CodeGen] Refactor libcall lookups for RTLIB::POWI_*
Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_*
libcall for a given value type.

This includes a small refactoring of ExpandFPLibCall and
ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code,
plus adding an ExpandFPLibCall version that can be called directly
when expanding FPOWI/STRICT_FPOWI to ensure that we actually use
the same RTLIB::Libcall when expanding the libcall as we used when
checking the legality of such a call by doing a getLibcallName check.

Differential Revision: https://reviews.llvm.org/D103050
2021-06-02 11:40:34 +02:00
Bjorn Pettersson d1273d39d3 [LegalizeTypes] Avoid promotion of exponent in FPOWI
The FPOWI DAG node is normally lowered to a libcall to one of the
RTLIB::POWI* runtime functions and the exponent should normally
have a type matching sizeof(int) when making the call. Thus,
type promotion of the exponent could lead to an FPOWI with a type
for the second operand that would be incorrect when doing the
libcall (a situation which would be hard to detect post-legalization
if we allow such FPOWI nodes).

This patch is changing DAGTypeLegalizer::PromoteIntOp_FPOWI to
do the rewrite into a libcall directly instead of promoting the
operand. This way we can check that the exponent is smaller than
sizeof(int) and we can let TargetLowering handle promotion as
part of making the libcall. It could be noticed here that makeLibCall
has some knowledge about targets such as 64-bit RISCV, for which the
libcall argument should be extended to a type larger than sizeof(int).

Differential Revision: https://reviews.llvm.org/D102950
2021-06-02 11:40:34 +02:00
Sanjay Patel 1b14f3951a [SDAG] add helper function for sext-of-setcc folds; NFC
Try to make this easier to read as noted in D103280
2021-06-01 08:07:17 -04:00
Sanjay Patel 63fe4cb082 [SDAG] add check to sext-of-setcc fold to bypass changing a legal op
I accidentaly pushed a draft of D103280 that was discussed
during the review, but it was not supposed to be the final
version.

Rather than revert and recommit, I'm updating the existing
code. This way we have a record of the codegen diff that
would result if we decide to remove this predicate in the
future.
2021-05-31 08:58:11 -04:00
Sanjay Patel 434c8e013a [SDAG] try harder to fold casts into vector compare
sext (vsetcc X, Y) --> vsetcc (zext X), (zext Y) --
(when the zexts are free and a bunch of other conditions)

We have a couple of similar folds to this already for vector selects,
but this pattern slips through because it is only a setcc.

The tests are based on the motivating case from:
https://llvm.org/PR50055
...but we need extra logic to get that example, so I've left that as
a TODO for now.

Differential Revision: https://reviews.llvm.org/D103280
2021-05-31 07:14:01 -04:00
Florian Hahn 126f90b252
[DAGCombine] Poison-prove scalarizeExtractedVectorLoad.
extractelement is poison if the index is out-of-bounds, so just
scalarizing the load may introduce an out-of-bounds load, which is UB.

To avoid introducing new UB, we can mask the index so it only contains
valid indices.

Fixes PR50382.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103077
2021-05-30 11:40:55 +01:00
Arthur Eubanks 71cca4f728 Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry"
This reverts commit 1c7f32334d.

Some code still needs to properly set parameter ABI attributes, see
D101806.
2021-05-29 23:08:15 -07:00
Arthur Eubanks 3a6f12f915 Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering"
This reverts commit bc7d15c61d.

Dependent change is to be reverted.
2021-05-29 22:40:33 -07:00
Eli Friedman 0b3b0a727a [AArch64][RISCV] Make sure isel correctly honors failure orderings.
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.

At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standad and LLVM now accept arbitrary combinations of
success/failure orderings.

This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.

Differential Revision: https://reviews.llvm.org/D103284
2021-05-28 12:47:40 -07:00
Craig Topper 2830d924b0 [VP] Make getMaskParamPos/getVectorLengthParamPos return unsigned. Lowercase function names.
Parameter positions seem like they should be unsigned.

While there, make function names lowercase per coding standards.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D103224
2021-05-28 11:28:47 -07:00
Craig Topper d24d2447cd [SelectionDAG] Fix typo in assert. NFC 2021-05-28 10:37:11 -07:00
Tim Northover 9ff2eb1ea5 SwiftTailCC: teach verifier musttail rules applicable to this CC.
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.

Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
2021-05-28 11:12:00 +01:00
Fraser Cormack 5a80dc4988 [VP][SelectionDAG] Add a target-configurable EVL operand type
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.

This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D103027
2021-05-27 15:27:36 +01:00
Fraser Cormack b7101e218c [DAGCombine][RISCV] Don't try to trunc-store combined vector stores
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the caller functions
to the function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
2021-05-27 14:16:32 +01:00
Fraser Cormack 772b58a641 [SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-ones or all-ones value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.

This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103147
2021-05-27 10:08:57 +01:00
Jonas Paulsson d058262b14 [SystemZ] Support i128 inline asm operands.
Support virtual, physical and tied i128 register operands in inline assembly.

i128 is on SystemZ not really supported and is not a legal type and generally
such a value will be split into two i64 parts. There are however some
instructions that require a pair of two GPR64 registers contained in the GR128
bit reg class, which is untyped.

For inline assmebly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBUilder to only create one
GR128 register part to begin with. In particular:

- getNumRegisters() now has an optional parameter "RegisterVT" which is
  passed by AddInlineAsmOperands() and GetRegistersForValue().

- The bitcasting in GetRegistersForValue is not performed if RegVT is
  Untyped.

- The RC for a tied use in AddInlineAsmOperands() is now computed either from
  the tied def (virtual register), or by getMinimalPhysRegClass() (physical
  register).

- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
  class (DstRC) can also be computed for an illegal type.

In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.

Differential Revision: https://reviews.llvm.org/D100788

Review: Ulrich Weigand
2021-05-26 10:08:32 -05:00
Michael Liao c9dd29925f [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics.
- When memory intrinsics, such as memcpy, the attached scoped AA
  metadata is not passed down to the backend. As a result, the backend
  cannot schedule relevant memory operations around them following that
  hint. In this patch, SelectionDAG is enhanced to propagate that
  metadata (scoped AA only) when they are lowered into loads and stores.

Differential Revision: https://reviews.llvm.org/D102215
2021-05-25 14:42:26 -04:00
LemonBoy fd5cc41818 [SelectionDAG] Fix argument copy elision with irregular types
D29668 enabled to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as it often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced the upper bits are filled with garbage and produce code misbehaving at runtime.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102153
2021-05-22 09:43:37 +02:00
Stephen Tozer 36ec97f76a 3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.

This reverts commit 4397b7095d.
2021-05-21 11:06:20 +01:00
Jessica Clarke e10958c807 [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.

Re-landed again after D102819 fixed PowerPC to correctly zero-extend all
of its atomics as it claimed to do, since the combination of that bug
and this optimisation caused buildbot regressions.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-20 20:34:23 +01:00
Fraser Cormack 26bd2250c1 [RISCV] Ensure shuffle splat operands are type-legal
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.

This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.

I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102687
2021-05-20 18:00:03 +01:00
Stephen Tozer cf725dde9c [DebugInfo] Handle DIArgList in FastISel or GlobalIsel
Currently, variadic dbg.values (i.e. those using a DIArgList as part of
their location) are not handled properly by FastISel or GlobalISel, and
will produce invalid DBG_VALUE instructions if they encounter them. This
patch fixes this issue by emitting undef DBG_VALUE instructions for
variadic dbg.values, so that no incorrect instruction is produced and
any prior variable location is terminated.

This is simply a quick-fix to prevent errors; a correct implementation
should come later for these ISel pipelines to ensure that we do not drop
debug information unnecessarily.

Differential Revision: https://reviews.llvm.org/D102500
2021-05-20 17:37:28 +01:00
David Sherwood a21bff0673 [CodeGen] Add support for widening the result of EXTRACT_SUBVECTOR
When trying to return a type such as <vscale x 1 x i32> from a
function we crash in DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR
when attempting to get the fixed number of elements in the vector.

For the simple case we are dealing with, i.e. extracting
<vscale x 1 x i32> from index 0 of input vector <vscale x 4 x i32>
we can simply rely upon existing code that just returns the input.

Differential Revision: https://reviews.llvm.org/D102605
2021-05-20 12:27:08 +01:00
David Sherwood d07d5c1b06 [CodeGen] Add support for widening INSERT_SUBVECTOR operands
When attempting to return something like a <vscale x 1 x i32>
type from a function we end up trying to widen the vector by
inserting a <vscale x 1 x i32> subvector into an undefined
<vscale x 4 x i32> vector. However, during legalisation we
then attempt to widen the INSERT_SUBVECTOR operands and hit
an error in WidenVectorOperand.

This patch adds a new WidenVecOp_INSERT_SUBVECTOR function
that currently only supports inserting subvectors into undefined
vectors.

Differential Revision: https://reviews.llvm.org/D102501
2021-05-20 10:37:03 +01:00
Sanjay Patel 6025663578 [SDAG] propagate FMF from target-specific IR intrinsics
This is a step towards relying more on node-level FMF rather than function-wide
or target settings.
I think it was just an oversight that we didn't get this path in D87361
or follow-on patches.

The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF.

We can't do much more than this currently because we also fail to propagate flags
from x86-specific node to generic FMA node. That would be another patch, so the
test just verifies that we can transfer from IR to initial SDAG node.

Differential Revision: https://reviews.llvm.org/D102725
2021-05-19 07:50:50 -04:00
Arthur Eubanks bc7d15c61d [NFC] Use ArgListEntry indirect types more in ISel lowering
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().

A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.

The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.

This is a reland after an MSan fix in D102667.

Differential Revision: https://reviews.llvm.org/D101713
2021-05-18 14:30:22 -07:00
Arthur Eubanks 1c7f32334d [TargetLowering] Only inspect attributes in the arguments for ArgListEntry
Parameter attributes are considered part of the function [1], and like
mismatched calling conventions [2], we can't have the verifier check for
mismatched parameter attributes.

This is a reland after fixing MSan issues in D102667.

[1] https://llvm.org/docs/LangRef.html#parameter-attributes
[2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101806
2021-05-18 14:30:22 -07:00
Ten Tzen 797ad70152 [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.

Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
  faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
  must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
  outside of _try region must be updated in memory (not just in register)
  before the subsequent exception occurs.

The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.

One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.

Implementation:
Part-1: Clang implementation described below.

Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.

Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.

Part-2 (will be in Part-2 patch): LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D80344/new/
2021-05-17 22:42:17 -07:00
Serguei Katkov 57c660f374 [Statepoint Lowering] Cleanup: remove unused option statepoint-always-spill-base. 2021-05-18 12:15:15 +07:00
Simon Pilgrim c29522d648 [TargetLowering] prepareUREMEqFold/prepareSREMEqFold - account for non legal shift types
Ensure we tell getShiftAmountTy that we're working with pre-legalized types to prevent cases where the (legalized) shift type can no longer handle the (non-legalized) type width.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34366
2021-05-17 11:03:27 +01:00
Fraser Cormack 85e31eddf2 [DAGCombiner] Relax an assertion to an early return
The select-of-constants transform was asserting that its constant vector
inputs did not implicitly truncate their input without that as an
explicit precondition to the function. This patch relaxes that assertion
into an early return to skip the optimization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102393
2021-05-17 09:15:55 +01:00
Arthur Eubanks 341902672c Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry"
This reverts commit 16748bd2fb.

Causes https://crbug.com/1209013
2021-05-16 22:02:10 -07:00
Arthur Eubanks 7647cb14dc Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering"
This reverts commit 85af8a8c1b.
2021-05-16 22:00:54 -07:00
Pan, Tao 976a3e5f61 [SelectionDAG] Make fast and linearize visible by clang -pre-RA-sched
ScheduleDAGFast.cpp is compiled to object file, but the ScheduleDAGFast
object file isn't linked into clang executable file as no symbol is
referred by outside. Add calling to createXxx of ScheduleDAGFast.cpp,
then the ScheduleDAGFast object file will be linked into clang
executable file. The static RegisterScheduler will register scheduler
fast and linearize at clang boot time.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D101601
2021-05-17 11:25:15 +08:00
Nikita Popov fb9ed1979a [IR] Add BasicBlock::isEntryBlock() (NFC)
This is a recurring and somewhat awkward pattern. Add a helper
method for it.
2021-05-15 12:41:58 +02:00
Sanjay Patel 9dfd7f9b67 [SDAG] reduce code duplication for extend_vec_inreg combines; NFC
These are identical so far, and I was looking at adding a fold
for a pattern with scalar_to_vector which would also nd up duplicated.
2021-05-14 08:29:57 -04:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
cynecx 8ec9fd4839 Support unwinding from inline assembly
I've taken the following steps to add unwinding support from inline assembly:

1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:

```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
    to label %exit unwind label %uexit
```

2.) Add Bitcode writing/reading support + LLVM-IR parsing.

3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.

4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.

5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.

6.) Don't allow unwinding callbr.

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D95745
2021-05-13 19:13:03 +01:00
Max Kazantsev d8b37de8a4 [GC][NFC] Move GCStrategy from CodeGen to IR
We want it to be available in analyzes so that we could use the
CodeGen notion in middle-end passes (for example, to check if
a GC may free some particular pointer).

This is a preparatory patch that simply moves the files around.

Note: if this causes some build issues, this patch must just be reverted.

Differential Revision: https://reviews.llvm.org/D100557
Reviewed By: reames
2021-05-13 12:31:59 +07:00
Fraser Cormack c5ec00e62b [TargetLowering] Improve legalization of scalable vector types
This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.

Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.

These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.

Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.

There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102073
2021-05-12 16:33:07 +01:00
Stefan Pintilie 8d37411e48 Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"
This reverts commit 6c80361b84.
Breaks PowerPC Big Endian buildbots.
2021-05-12 09:46:18 -05:00
Hendrik Greving 762ac725bf [DAGCombiner] Fix DAG combine store elimination, different address space.
Fixes a bug in the DAG combiner that eliminates the stores because it missed
to inspect the address space of the pointers.

%v = load %ptr_as1
// no chain side effect
store %v, %ptr_as2

As well as

store %v, %ptr_as1
store %v, %ptr_as2

Fixes a test for above in X86.

Differential Revision: https://reviews.llvm.org/D102096
2021-05-12 07:14:22 -07:00
Arthur Eubanks 85af8a8c1b [NFC] Use ArgListEntry indirect types more in ISel lowering
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().

A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.

The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.

Differential Revision: https://reviews.llvm.org/D101713
2021-05-10 13:05:15 -07:00
Arthur Eubanks 16748bd2fb [TargetLowering] Only inspect attributes in the arguments for ArgListEntry
Parameter attributes are considered part of the function [1], and like
mismatched calling conventions [2], we can't have the verifier check for
mismatched parameter attributes.

[1] https://llvm.org/docs/LangRef.html#parameter-attributes
[2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101806
2021-05-10 12:35:11 -07:00
Bradley Smith 635164b95a [AArch64][SVE] Improve SVE codegen for fixed length BITCAST
Expanding a fixed length operation involves wrapping the operation in an
insert/extract subvector pair, as such, when this is done to bitcast we
end up with an extract_subvector of a bitcast. DAGCombine tries to
convert this into a bitcast of an extract_subvector which restores the
initial fixed length bitcast, causing an infinite loop of legalization.

As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.

Differential Revision: https://reviews.llvm.org/D101990
2021-05-10 14:43:53 +01:00
Fraser Cormack 6db0cedd23 [LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063
2021-05-10 08:22:35 +01:00
Simon Pilgrim dd21c6b843 [DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-07 14:48:23 +01:00
Stephen Tozer ce0c1f3ced [DebugInfo] Fix crash when emitting an invalidated SDDbgValue
This patch fixes a crash in the compiler that occurs when certain
invalidated SDDbgValues are emitted. The cause of this was that we would
attempt to check the liveness of the debug value's operands, which
triggers an assert if any of those operands are invalid. This patch
changes this check such that it only occurs if the SDDbgValue is valid;
if not, the check is irrelevant anyway, so can be safely ignored.

Differential Revision: https://reviews.llvm.org/D101540
2021-05-07 13:13:56 +01:00
Simon Pilgrim 280aa3415e [DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.

Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.

I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).

NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.

Differential Revision: https://reviews.llvm.org/D101987
2021-05-07 13:12:30 +01:00
Jessica Clarke 6c80361b84 [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-06 04:01:20 +01:00
Jessica Clarke 897d7bceb9 Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"
This seems to have broken sanitizers, giving lots of

  Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed.

failures across multiple targets (currently X86 and PowerPC). Reverting
until I have a chance to reproduce and debug.

This reverts commit 6e876f9ded.
2021-05-05 17:02:05 +01:00
Jessica Clarke 6e876f9ded [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-05 16:34:45 +01:00
Christudasan Devadasan 80c79035ef DAG: Cleanup assertion in EmitFuncArgumentDbgValue
Removing an assertion introduced with D68945. The
patch was later reverted with 6531a78ac4, but failed
to remove this assertion. It causes a problem while
trying to split a 64-bit argument into sub registers.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D101594
2021-05-04 21:48:58 +05:30
Tomas Matheson 9d86095ff8 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 753185031d.
2021-05-03 21:48:20 +01:00
Tomas Matheson 753185031d [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164

Originally submitted as 3338290c18.
Reverted in c7df6b1223.
2021-05-03 20:25:15 +01:00
Craig Topper 6430430958 [TableGen] Use sign rotated VBR for OPC_EmitInteger.
This allows for a much more efficient encoding for small negative
numbers by storing the sign bit first and negating the rest of
the bits. This was already being used for OPC_CheckInteger.

For every in tree target this affects, the table got smaller.
R600GenDAGISel.inc saw the largest reduction of 7K.

I did have to add a new opcode for StringIntegers used for
register class ids and subregister indices since we don't have the
integer value to encode. The enum name is emitted directly into
the table. Previously assumed the enum would expand to a positive
7-bit number. We might be able to just shift that right by 1 and
assume it is a positive 6 bit number, but that will need more
investigation.
2021-05-02 12:40:44 -07:00
Nathan Chancellor 4397b7095d
Revert "Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
This reverts commit 791930d740, as per
https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy.

I observed breakage with the Linux kernel, as reported at
https://reviews.llvm.org/D91722#2724321

Fixes exist at
https://reviews.llvm.org/D101523
https://reviews.llvm.org/D101540

but they have not landed so to unbreak the tree for the weekend, revert
this commit.

Commit b11e4c9907 ("Revert "[DebugInfo] Drop DBG_VALUE_LISTs with an
excessive number of debug operands"") only reverted one follow-up fix,
not the original patch that broke the kernel.

e
2021-04-30 20:23:21 -07:00
Tomas Matheson c7df6b1223 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 3338290c18.

Broke expensive checks on debian.
2021-04-30 16:53:14 +01:00
Tomas Matheson 3338290c18 [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164
2021-04-30 16:40:33 +01:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 3067520bf4 [SelectionDAG] Use a VTSDNode to store the saturation width for FP_TO_SINT_SAT/FP_TO_UINT_SAT
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.

This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101262
2021-04-27 14:38:42 -07:00
Dávid Bolvanský ef2dc7ed9f [Analysis] Attribute alignment should not prevent tail call optimization
Fixes tail folding issue mentioned in D100879.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D101230
2021-04-24 19:57:42 +02:00
Stephen Tozer 791930d740 Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.

This reverts commit dad5caa59e.
2021-04-23 10:54:01 +01:00
Craig Topper 5185b52988 [RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases.
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.

Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.

Fixes PR50083
2021-04-22 15:18:15 -07:00
Jun Ma 978eb3f168 [DAGCombiner] Allow operand of step_vector to be negative.
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X,  step_vector(-C))

Differential Revision: https://reviews.llvm.org/D100812
2021-04-22 20:58:03 +08:00
Simon Pilgrim d860bf2d0e [DAG] TargetLowering.cpp - breakup if-else chains where each block returns. NFCI.
Match style guide that requests that if+return blocks are separate.
2021-04-21 11:17:27 +01:00
Fraser Cormack c141bd3cf9 [DAGCombiner] Support all-ones/all-zeros SPLAT_VECTOR in more combines
This patch adds incrementally-better support for SPLAT_VECTOR in a
handful of vector combines by changing a few more
isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent
isConstantSplatVectorAllOnes/Zeros calls.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100851
2021-04-21 11:05:37 +01:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst a IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Simon Pilgrim 2a419a0b99 [X86][SSE] combineX86ShuffleChain - check if we're blending with zero into already zero elements
Add a SelectionDAG::MaskedElementsAreZero helper that wraps SelectionDAG::MaskedValueIsZero testing for entirely zero vector elements
2021-04-20 17:09:49 +01:00
Simon Pilgrim e156f2515c [DAG] SelectionDAG.cpp - breakup if-else chains where each block returns. NFCI.
Match style guide that requests that if+return blocks are separate.
2021-04-20 12:37:00 +01:00
Jun Ma 1ef5699d1a [DAGCombiner] Support fold zero scalar vector.
This patch changes ISD::isBuildVectorAllZeros to
ISD::isConstantSplatVectorAllZeros which handles zero sclar vector.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100813
2021-04-20 16:28:43 +08:00
Fraser Cormack 457da7f298 [SelectionDAG] Relax constraints on STEP_VECTOR step operand
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. This
does not permit its use on targets which have legal vector element types
larger than the largest legal scalar type, such as i64 vectors on RV32.

As such, the requirement has been loosened so that the step operand must
be any scalar type so long as the constant immediate is non-negative and
the value fits inside the vector element type.

This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100660
2021-04-20 08:41:42 +01:00
David Sherwood 83f5fa519e [CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.

Differential Revision: https://reviews.llvm.org/D100639
2021-04-19 08:34:17 +01:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Simon Pilgrim 37a4621fb6 [DAG] SelectionDAG::isSplatValue - early out if binop is not splat. NFCI.
Just return false if we fail to match splats - the remainder of the code is for (fixed)vector operations - shuffles/insertions etc.
2021-04-16 18:26:33 +01:00
Momchil Velikov f9d932e673 [clang][AArch64] Correctly align HFA arguments when passed on the stack
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example

    struct S {
      __attribute__ ((__aligned__(16))) double v[4];
    };

Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)

Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.

This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.

The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.

For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.

On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.

Patch by Momchil Velikov and Lucas Prates.

Differential Revision: https://reviews.llvm.org/D98794
2021-04-15 22:58:14 +01:00
Jun Ma 7e1422c1e4 [DAGCombiner] Fold step_vector with add/mul/shl
This patch implements some DAG combines for STEP_VECTOR:
add step_vector(C1), step_vector(C2) -> step_vector(C1+C2)
add (add X step_vector(C1)), step_vector(C2) -> add X step_vector(C1+C2)
mul step_vector(C1), C2 -> step_vector(C1*C2)
shl step_vector(C1), C2 -> step_vector(C1<<C2)

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100088
2021-04-15 18:06:35 +08:00
Tim Northover 6401b78ab3 SDAG: constant fold bf16 -> i16 casts
This direction is particularly useful because i16 constants are much more
likely to be legal than bf16.
2021-04-14 11:27:46 +01:00
Tim Northover 5e3d9fcc3a StackProtector: ensure protection does not interfere with tail call frame.
The IR stack protector pass must insert stack checks before the call instead of
between it and the return.

Similarly, SDAG one should recognize that ADJCALLFRAME instructions could be
part of the terminal sequence of a tail call. In this case because such call
frames cannot be nested in LLVM the stack protection code must skip over the
whole sequence (or risk clobbering argument registers).
2021-04-13 15:14:57 +01:00
Amy Huang dad5caa59e Revert "Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
This change causes an assert / segmentation fault in LTO builds.

This reverts commit f2e4f3eff3.
2021-04-12 20:10:17 -07:00
Stephen Tozer f2e4f3eff3 Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
The causes of the previous build errors have been fixed in revisions
aa3e78a59f, and
140757bfaa

This reverts commit f40976bd01.
2021-04-12 16:57:29 +01:00
Stephen Tozer aa3e78a59f Reapply "[DebugInfo] Correctly track SDNode dependencies for list debug values"
Fixed memory leak error by using BumpAllocator for SDDbgValue arrays.

This reverts commit 1b589172bd.
2021-04-12 12:51:29 +01:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Stephen Tozer 1b589172bd Revert "[DebugInfo] Correctly track SDNode dependencies for list debug values"
Reverted due to failure on the sanitizer-x86_64-linux-fast bot.

This reverts commit e10493eb50.
2021-04-08 17:55:45 +01:00
Stephen Tozer e10493eb50 [DebugInfo] Correctly track SDNode dependencies for list debug values
During SelectionDAG, we must track the SDNodes that each SDDbgValue depends on
to compute its value. These are ultimately derived from the location operands to
the SDDbgValue, but were stored in a separate vector prior to this patch. This
resulted in cases where one of the lists was updated incorrectly, resulting in
crashes during compilation. This patch fixes the issue by directly recomputing
the dependency list from the SDDbgOperands in getDependencies().

Differential Revision: https://reviews.llvm.org/D99423
2021-04-08 17:01:45 +01:00
Craig Topper 67953311e2 [SelectionDAG] Teach SelectionDAG::FoldConstantArithmetic to handle SPLAT_VECTOR
This allows FoldConstantArithmetic to handle SPLAT_VECTOR in
addition to BUILD_VECTOR. This allows it to support scalable
vectors. I'm also allowing fixed length SPLAT_VECTOR which is
used by some targets, but I'm not familiar enough to write tests
for those targets.

I had to block this function from running on CONCAT_VECTORS to
avoid calling getNode for a CONCAT_VECTORS of 2 scalars.
This can happen because the 2 operand getNode calls this
function for any opcode. Previously we were protected because
CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR
before that call. But it's not always possible to fold a CONCAT_VECTORS
of SPLAT_VECTORs, and we don't even try.

This fixes PR49781 where DAG combine thought constant folding
should be possible, but FoldConstantArithmetic couldn't do it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D99682
2021-04-07 10:03:33 -07:00
Yevgeny Rouban 3e738afae4 [Statepoint Lowering] Allow other than N byte sized types in deopt bundle
I do not see any bit-width restriction from the point of the
LLVM Lang Ref - Operand Bundles on the types of the deopt bundle
operands. Statepoint Lowering seems to be able to work with any
types.
This patch relaxes the two related assertions and adds a new test
for this change.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100006
2021-04-07 17:48:31 +07:00
Philip Reames fb41cae039 More precisely type code used for gc.relocate assertions [nfc] 2021-04-06 11:27:36 -07:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Nikita Popov 665065821e [FastISel] Remove kill tracking
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.

As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.

Differential Revision: https://reviews.llvm.org/D98294
2021-04-03 15:50:13 +02:00
Simon Pilgrim 4ea5475a3f [KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI.
Include exhaustive test coverage.
2021-04-02 21:44:33 +01:00
Jun Ma 274ac9d40e [AArch64][SVE] Lowering sve.dot to DOT node
Differential Revision: https://reviews.llvm.org/D99699
2021-04-02 20:05:17 +08:00
Sander de Smalen 0f7bbbc481 Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.

This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.

The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.

This allows us to do away with the following pattern in many of the SVE tests:
  RUN: .... 2>%t
  RUN: cat %t | FileCheck --check-prefix=WARN
  WARN-NOT: warning: ...

The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.

This patch also has fixes the following tests:
 unittests:
   ScalableVectorMVTsTest.SizeQueries
   SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
   AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG

 regression tests:
   Transforms/InstCombine/vscale_gep.ll

Reviewed By: paulwalker-arm, ctetreau

Differential Revision: https://reviews.llvm.org/D98856
2021-04-02 10:55:22 +01:00
Simon Pilgrim 77d625f8d8 [DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements
If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well.

This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.
2021-04-01 14:33:00 +01:00
Simonas Kazlauskas 777a58e05b Support {S,U}REMEqFold before legalization
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.

Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88785
2021-04-01 01:35:41 +03:00
Craig Topper 9e00b6660d [SelectionDAG] Remove unneeded vector resize from the end of FoldConstantArithmetic. NFC
There's an assert right before that makes sure the size already matches.
Earlier in this function's life, scalars and vectors shared more
code.
2021-03-31 12:33:10 -07:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Florian Hahn eb3d9f2eb6
[SelDag] Add isIntOrFPConstant helper function.
This patch adds a new isIntOrFPConstant  helper function to check if a
SDValue is a integer of FP constant. This pattern is used in various
places.

There also are places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D99428
2021-03-28 12:48:58 +01:00
David Sherwood 748ae5281d [IR][SVE] Add new llvm.experimental.stepvector intrinsic
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as

  mul step_vector(1), 2 -> step_vector(2)
  add step_vector(1), step_vector(1) -> step_vector(2)
  shl step_vector(1), 1 -> step_vector(2)
  etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

  llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
  llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

  llvm/test/CodeGen/AArch64/neon-stepvector.ll
  llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
2021-03-23 10:43:35 +00:00
Craig Topper 2f13e63f9e [LegalizeDAG] Add asserts to verify the types of custom legalized operation matches the original node.
We've messed this up a few times recently on RISCV. Experiments
with these asserts found a couple issues on other targets as well.
They've all been cleaned up now so we can put in these asserts to
catch future issues

I had to waive Glue because ADDC/ADDE/etc legalization replaces
Glue with i32 on at least AArch64. X86 used to do the same before
we switched to ADDCARRY. So I guess that's just how that works.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98979
2021-03-22 10:28:51 -07:00
Craig Topper 30080b003e [DAGCombiner] Minor compile time improvement to (sext_in_reg (sign_extend_vector_inreg x)) optimization.
Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No
matter what answer we get back this will be true:
(N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits)

So we might as well save the computation. This makes the code more
consistent with the similar (sext_in_reg (sext x)) handling above.
2021-03-21 11:16:41 -07:00
Simon Pilgrim 64c2641c89 [DAG] Limit (sext_in_reg (zero_extend_vector_inreg x)) to exact sign extension
As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just sign bit. So, revert to the old code for zero_extend_vector_inreg cases.
2021-03-21 14:01:37 +00:00
Simon Pilgrim 9d2df96407 [DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.

Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.

Differential Revision: https://reviews.llvm.org/D98857
2021-03-19 16:02:31 +00:00
Simon Pilgrim ffb2887103 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w))
Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles.

Differential Revision: https://reviews.llvm.org/D98646
2021-03-19 14:14:56 +00:00
Craig Topper 182b831aeb [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR.
Previously only all zeros BUILD_VECTOR was recognized.
2021-03-18 15:34:14 -07:00
Simon Pilgrim 1ba5c550d4 [DAG] Improve folding (sext_in_reg (*_extend_vector_inreg x)) -> (sext_vector_inreg x)
Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit.

Noticed while investigating the poor codegen in D98587.
2021-03-18 15:34:53 +00:00
Simon Pilgrim b1afa187c8 [DAG] SelectionDAG::isSplatValue - add ISD::ABS handling
Add ISD::ABS to the existing unary instructions handling for splat detection

This is similar to D83605, but doesn't appear to need to touch any of the wasm refactoring.

Differential Revision: https://reviews.llvm.org/D98778
2021-03-18 10:28:29 +00:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
serge-sans-paille 6e040a19db [NFC] Wisely nest dyn_cast in FunctionLoweringInfo
Take advantage of the inheritance tree to avoid a few comparison.
2021-03-16 10:22:44 +01:00
Fraser Cormack 0035decae7 [CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs
This patch addresses a few issues when dealing with scalable-vector
INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes.

When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we
store the low and high halves to the stack separately. The offset for
the high half was calculated incorrectly.

Additionally, we can optimize this process when we can detect that the
subvector is contained entirely within the low/high split vector type.
While this optimization is valid on scalable vectors, when performing
the 'high' optimization, the subvector must also be a scalable vector.
Note that the 'low' optimization is still conservative: it may be
possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32,
but we can't guarantee it. It is always possible to insert v2i32 into
nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1.

Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value
type being a scalable vector, forgetting that we can also extract a
fixed-length vector from a scalable one.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98495
2021-03-15 17:04:21 +00:00
Craig Topper 5b825433d7 [DAGCombiner] Optimize 1-bit smulo to AND+SETNE.
A 1-bit smulo overflows is both inputs are -1 since the result
should be +1 which can't be represented in a signed 1 bit value.

We can detect this with an AND and a setcc. The multiply result
can also use the same AND.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97634
2021-03-13 09:39:36 -08:00
Craig Topper 2ea7014089 [DAGCombiner] Use isConstantSplatVectorAllZeros/Ones instead of isBuildVectorAllZeros/Ones in visitMSTORE and visitMLOAD.
This allows us to optimize when the mask is a splat_vector in
addition to build_vector.
2021-03-12 12:14:56 -08:00
LemonBoy cfe69c8efd [SelectionDAG] Improve scalarization of irregular vector types
Use a more general strategy when splitting a vector into scalar parts (and vice-versa) to correctly handle vector types whose element size is not a power of 2 (and a multiple of 8).

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D98273
2021-03-11 19:57:13 +01:00
Stephen Tozer f40976bd01 Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reverts commit c0f3dfb9f1.

Reverted due to an error on the clang-x64-windows-msvc buildbot.
2021-03-11 14:48:01 +00:00
gbtozers c0f3dfb9f1 [DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.

Differential Revision: https://reviews.llvm.org/D91722
2021-03-11 13:33:49 +00:00
Serguei Katkov 0480927712 [Statepoint Lowering] Handle the case with several gc.result
Recently gc.result has been marked with readnone instead of readonly and
this opens a door for different optimization to duplicate gc.result.
Statepoint lowering is not ready to see several gc.results.
The problem appears when there are gc.results with one located in the same
basic block and another located in other basic block.
In this case we need both export VR and fill local setValue.

Note that this case is not sufficient optimization done before CodeGen.
It is evident that local gc.result dominates all other gc.results and it is handled
by GVN and EarlyCSE.

But anyway, even if IR is not optimal Backend should not crash on a valid IR.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98393
2021-03-11 18:44:44 +07:00
Quentin Colombet 66dab2fa84 [NFC] Fix compiler warnings
Fix warnings caused by -Wrange-loop-analysis.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D98298
2021-03-10 11:03:50 -08:00
Craig Topper 9106d04554 [RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32.
On riscv32, i64 isn't a legal scalar type but we would like to
support scalable vectors of i64.

This patch introduces a new node that can represent a splat made
of multiple scalar values. I've used this new node to solve the current
crashes we experience when getConstant is used after type legalization.

For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS
when needed and then handling the SPLAT_VECTOR_PARTS later during
LegalizeOps. I've remove the special case I previously put in for
ABS for D97991 as the default expansion is now able to succesfully
use getConstant.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98004
2021-03-10 09:46:18 -08:00
Jinzheng Tu 481079e284 [NFC] Unify FIME with FIXME in comments
There are 5 occurrences FIME and 15333 FIXME. All of them should be FIXME.

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D98321
2021-03-10 14:00:51 +01:00
Serguei Katkov 2fccd1b00a [Statepoint Lowering] Fix the crash with gc.relocate in a separate block
If it was decided to relocate derived pointer using the spill its value is
not exported in general case.
When gc.relocate is located in an another block than a statepoint we cannot
get SD for derived value but for spill case it is not required at all.
However implementation of gc.relocate lowering unconditionally request SD value
causing the assert triggering.

The CL fixes this by handling spill case earlier than SD is really required.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98324
2021-03-10 19:51:04 +07:00
Nikita Popov 55ae279ba7 [FastISel] Don't trivially kill extractvalues (PR49467)
All extractvalues of the same value at the same index will map to
the same register, so even if one specific extractvalue only has
one use, we should not mark it as a trivial kill, as there may be
more extractvalues later.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49467.

Differential Revision: https://reviews.llvm.org/D98145
2021-03-09 18:46:38 +01:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
gbtozers 5491a86f59 [DebugInfo] Emit DBG_VALUE_LIST from ISel
This patch completes ISel support for DIArgList dbg.values by allowing
SDDbgValues with multiple location operands to be emitted as DBG_VALUE_LIST
instructions.

The primary change of this patch is refactoring EmitDbgValue by pulling location
operand emission out to the new function AddDbgValueLocationOps, which is used
for both DIArgList and single value dbg.values. Outside of that, the only
behaviour change is that the scheduler has a lambda added, HasUnknownVReg, to
prevent us from attempting to emit a DBG_VALUE_LIST before all of its used VRegs
have become available.

Differential Revision: https://reviews.llvm.org/D88592
2021-03-09 12:17:39 +00:00
Cullen Rhodes 2750f3ed31 [IR] Introduce llvm.experimental.vector.splice intrinsic
This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on a immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.

For example:

  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>  ; index
  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count

These intrinsics support both fixed and scalable vectors, where the
former is lowered to a shufflevector to maintain existing behaviour,
although while marked as experimental the recommended way to express
this operation for fixed-width vectors is to use shufflevector. For
scalable vectors where it is not possible to express a shufflevector
mask for this operation, a new ISD node has been implemented.

This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker and Cullen Rhodes.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D94708
2021-03-09 10:44:22 +00:00
gbtozers 93b170ea24 [DebugInfo] Handle dbg.values with multiple variable location operands in ISel
This patch adds partial support in Instruction Selection for dbg.values that use
a DIArgList. This patch does not add support for producing DBG_VALUE_LIST, but
adds the logic for processing DIArgLists within the ISel pass. This change is
largely focused on handleDebugValue and some of the functions that it calls.
Outside of this, salvageDebugInfo and transferDbgValues have been modified to
replace individual operands instead of the entire value; dangling debug info for
variadic debug values is not currently supported (but may be added later).

Differential Revision: https://reviews.llvm.org/D88589
2021-03-09 09:48:03 +00:00
Jessica Paquette f7d73a6b9e [SelectionDAG] Don't scalarize vector fpround sources that don't need it.
Similar to the workaround code in ScalarizeVecRes_UnaryOp, ScalarizeVecRes_SETCC
, ScalarizeVecRes_VSELECT, etc.

If we have a case like this:

```
define <1 x half> @func(<1 x float> %x) {
  %tmp = fptrunc <1 x float> %x to <1 x half>
  ret <1 x half> %tmp
}
```

On AArch64, the <1 x float> is legal. So, this will crash if we call
GetScalarizedVector on it.

Differential Revision: https://reviews.llvm.org/D98208
2021-03-08 14:37:33 -08:00
Stephen Tozer c0450af559 Fix: [DebugInfo] Support representation of multiple location operands in SDDbgValue
Removes a "default" label from a fully covered switch, causing errors on
-Wcovered-switch-default builds.
2021-03-08 19:14:12 +00:00
gbtozers 9525af7b91 [DebugInfo] Support representation of multiple location operands in SDDbgValue
This patch modifies the class that represents debug values during ISel,
SDDbgValue, to support multiple location operands (to represent a dbg.value that
uses a DIArgList). Part of this class's functionality has been split off into a
new class, SDDbgOperand.

The new class SDDbgOperand represents a single value, corresponding to an SSA
value or MachineOperand in the IR and MIR respectively. Members of SDDbgValue
that were previously related to that specific value (as opposed to the
variable or DIExpression), such as the Kind enum, have been moved to
SDDbgOperand. SDDbgValue now contains an array of SDDbgOperand instead, allowing
it to hold more than one of these values.

All changes outside SDDbgValue are simply updates to use the new interface.

Differential Revision: https://reviews.llvm.org/D88585
2021-03-08 18:45:17 +00:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
Craig Topper 0eb405c3b8 [SelectionDAG] Add computeKnownBits support for ISD::USUBSAT.
The result of ISD::USUBSAT will never be larger than the LHS. We
can use this to put a bound on the number of leading zeros.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98133
2021-03-07 09:48:42 -08:00
LemonBoy 2ec43e4167 [LegalizeDAG] Implement promotion rules for SELECT_CC
Implement the promotion rule for SELECT_CC nodes by upcasting all the parameters and downcasting the result.
The AArch64 target makes use of this rule and, since it was not implemented, in some cases the instruction selector would hit an assertion upon encountering the illegal node.

This patch requires D97840, the included test cases hit both problems.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97859
2021-03-05 18:22:55 +01:00
Craig Topper ad532be012 [SelectionDAG] Assert that operands to SelectionDAG::getNode are not DELETED_NODE to catch issues like PR49393 earlier.
I'm not sure this would catch all such issues, but it would catch some.

The problem for PR49393 was that we were holding a reference to a node that
wasn't connect edto the DAG across a function that could delete unused nodes. In
this particular case we managed to try to use the deleted node while it was in
the deleted state before its memory got recycled.

It could also happen that we delete the node, something allocates a new node
which recycles the memory. Then  we try to use the reference we were holding and
it is now a completely different node with different valid opcode. This patch
would not catch that.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97969
2021-03-04 23:05:32 -08:00
Craig Topper 74e6030bcb [TargetLowering] Use HandleSDNodes to prevent nodes from being deleted by recursive calls in getNegatedExpression.
For binary or ternary ops we call getNegatedExpression multiple
times and then compare costs. While we're doing this we need to
hold a node from the first call across the second call, but its
not yet attached to the DAG. Its possible the second call creates
an identical node and then decides it didn't need it so will try
to delete it if it has no uses. This can cause a reference to the
node we're holding further up the call stack to become invalidated.

To prevent this, we can use a HandleSDNode to artifically give
the node a use without connecting it to the DAG.

I've used a std::list of HandleSDNodes so we can create handles
only when we have a node to hold. HandleSDNode does not have
default constructor and cannot be copied or moved.

Fixes PR49393.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97914
2021-03-04 22:48:25 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Simon Pilgrim 7d3d9fe8cd [DAG] TargetLowering::BuildUDIV - use APInt as const ref. NFCI.
Fixes clang-tidy warning.
2021-03-04 12:15:08 +00:00
Craig Topper 90b7825598 [LegalizeVectorTypes] Remove a tautological compare. 2021-03-03 23:26:00 -08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Craig Topper 543b901e58 [LegalizeVectorTypes] Improve SplitVecRes_INSERT_SUBVECTOR to handle subvector being in the high half of the split or not at element 0 of the low half.
This function isn't exercised in lit tests today today according to
the code coverage report. But will be after the tests in D97543 and
D97559.

Posting this patch to help a crash that Fraser hit.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97582
2021-03-02 21:14:13 -08:00
Sanjay Patel 415c67ba4c [SDAG] allow partial undef vector constants with select->logic folds
This is an enhancement suggested in the original review/commit:
D97730 / 7fce3322a2
2021-03-02 14:29:15 -05:00
Sanjay Patel 7fce3322a2 [SDAG] allow vector types for select->logic folds
This prepares codegen for a change that will remove the identical
folds from IR because they are not poison-safe. See
D93065 / D97360
for details.

We already generically support scalar types, and there are various
target-specific transforms that overlap the vector folds. For example,
x86 recognizes the and patterns, but not or. We can end up with 1
extra instruction there, but I think that is still preferred over the
blendv alternative that loads a constant vector.

If this is not optimal, then it should be fixed with a later transform
(this change is not expected to result in any regressions because
InstCombine currently does the same thing).

Removing custom code and supporting undefs in constant-pattern-matching
can be follow-up changes.

Differential Revision: https://reviews.llvm.org/D97730
2021-03-02 09:25:10 -05:00
Simon Pilgrim c0d4b44e6a [DAG] DAGCombiner::tryStoreMergeOfLoads - remove unused StartAddress variable. NFCI.
Noticed in "initialization is never read" clang-tidy warning - the only StartAddress set/used is inside the load combine loop.
2021-03-02 13:29:31 +00:00
Sanjay Patel 154c47dc06 [SDAG] add helper for select->logic folds; NFC
This set of transforms should be extended to handle vector types.
2021-03-01 16:24:15 -05:00
Craig Topper e745f7c563 [LegalizeTypes] Improve ExpandIntRes_XMULO codegen.
The code previously used two BUILD_PAIRs to concatenate the two UMULO
results with 0s in the lower bits to match original VT. Then it created
an ADD and a UADDO with the original bit width. Each of those operations
need to be expanded since they have illegal types.

Since we put 0s in the lower bits before the ADD, the lower half of the
ADD result will be 0. So the lower half of the UADDO result is
solely determined by the other operand. Since the UADDO need to
be split in half, we don't really needd an operation for the lower
bits. Unfortunately, we don't see that in type legalization and end up
creating something more complicated and DAG combine or
lowering aren't always able to recover it.

This patch directly generates the narrower ADD and UADDO to avoid
needing to legalize them. Now only the MUL is done on the original
type.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97440
2021-03-01 09:54:32 -08:00
Simon Pilgrim 9dd83f5ee8 [DAG] visitVECTOR_SHUFFLE - attempt to match commuted shuffles with MergeInnerShuffle.
Try to match "shuffle(C, shuffle(A, B, M0), M1) -> shuffle(A, B, M2)" etc. by using MergeInnerShuffle's commuted inner shuffle mode.
2021-03-01 10:42:11 +00:00
Fraser Cormack 6718fda6ad [CodeGen] Fix issues with subvector intrinsic index types
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D97459
2021-03-01 10:28:21 +00:00
Serguei Katkov 65fb706231 [Statepoint Lowering] Consider dead deopt gc values together with other gc values
Currently dead gc value mentioned in the deopt section are not listed in gc section
and so are processed separately.
With this CL all deopt gc values are considered as base pointers and processed in the
same way as other gc values.

The fact that deopt gc pointer is a base pointer was used all the time but
it is explicitly documented here by putting the value in SI.Base.

The idea of the patch comes from Philip Reames.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97554
2021-03-01 17:23:02 +07:00
Simon Pilgrim 64c41301ce [DAG] visitVECTOR_SHUFFLE - move shuffle canonicalization/merges all under the same legality test. NFCI.
Minor cleanup to move related combines closer together to make it more coherent, without changing the ordering.
2021-03-01 09:42:00 +00:00
Serguei Katkov 06c5119c76 [Statepoint lowering] Require spill of deopt value in case its type is not legal
If the type of the deopt operand has an illegal type and we want to use
register for it then it needs to be legalized.
This is not supported currently by legalizer and it is not actually clear how to
legalize this type of values.

Instead we just spill such values and use spill slot location in statepoint.

Originally tests were created by Philip Reames.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97541
2021-03-01 10:23:53 +07:00
Craig Topper 5de09ef02e [DAGCombiner][X86] Don't peek through ANDs on the shift amount in matchRotateSub when called from MatchFunnelPosNeg.
Peeking through AND is only valid if the input to both shifts is
the same. If the inputs are different, then the original pattern
ORs the two values when the masked shift amount is 0. This is ok
if the values are the same since the OR would be a NOP which is
why its ok for rotate.

Fixes PR49365 and reverts PR34641

Differential Revision: https://reviews.llvm.org/D97637
2021-02-28 12:58:00 -08:00
Craig Topper ca5247bb17 [DAGCombiner] Don't skip no overflow check on UMULO if the first computeKnownBits call doesn't return any 0 bits.
Even if the first computeKnownBits call doesn't have any zero
bits it is possible the other operand has bitwidth-1 leading zero.
In that case overflow is still impossible. So always call computeKnownBits
for both operands.
2021-02-28 08:26:22 -08:00
Heejin Ahn aa097ef8d4 [WebAssembly] Fix reverse mapping in WasmEHFuncInfo
D97247 added the reverse mapping from unwind destination to their
source, but it had a critical bug; sources can be multiple, because
multiple BBs can have a single BB as their unwind destination.

This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes
it return a vector rather than a single BB. It does not return the const
reference to the existing vector but creates a new vector because
`WasmEHFuncInfo` stores not `BasicBlock*` or `MachineBasicBlock*` but
`PointerUnion` of them. Also I hoped to unify those methods for
`BasicBlock` and `MachineBasicBlock` into one using templates to reduce
duplication, but failed because various usages require `BasicBlock*` to
be `const` but it's hard to make it `const` for `MachineBasicBlock`
usages.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744)

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97583
2021-02-26 17:12:10 -08:00
Craig Topper eea53b142d [DAGCombiner] Optimize SMULO/UMULO if we can prove that overflow is impossible.
Using ComputeNumSignBits or computeKnownBits we might be able
to determine that overflow is impossible.

This especially helps after type legalization if the type was
promoted from a type with half the bits or more. Type legalization
conservatively creates a promoted smulo/umulo and an overflow
check for the promoted bits. The overflow from the promoted
smulo/umulo is ORed with the result of the promoted bits
overflow check. Proving that the promoted smulo/umulo can never
overflow will leave us with just the promoted bits overflow check.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97160
2021-02-26 14:50:03 -08:00
Simon Pilgrim aefe8f2f6c [DAG] Fold vXi1 multiplies -> and
This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math.

Mentioned in D97478 post review.
2021-02-26 11:46:12 +00:00
Simon Pilgrim 73adc26ac0 [DAG] expandAddSubSat - break if-else chain. NFCI.
Fix styleguide issue - each if() block always returns so we don't need to make them a if-else chain.
2021-02-26 11:02:08 +00:00
Simon Pilgrim 9490b9f14b [DAG] Move simplification of SADDSAT/SSUBSAT/UADDSAT/USUBSAT of vXi1 to getNode()
As discussed on D97276 we should be able to always do this in node creation, we don't need a combine.
2021-02-25 17:49:26 +00:00
David Sherwood 87dbcd8865 [CodeGen] Canonicalise adds/subs of i1 vectors using XOR
When calling SelectionDAG::getNode() to create an ADD or SUB
of two vectors with i1 element types we can canonicalise this
to use XOR instead, where 1+1 is treated as wrapping around
to 0 and 0-1 wraps to 1.

I've added the following tests for SVE targets:

  CodeGen/AArch64/sve-pred-arith.ll

and modified some X86 tests to reflect the much simpler codegen
required.

Differential Revision: https://reviews.llvm.org/D97276
2021-02-25 10:31:26 +00:00
Craig Topper fe50be12c8 [LegalizeIntegerTypes] Further improve ExpandIntRes_SADDSUBO for targets where SADDO/SSUBO aren't supported.
Rather than converting 3 signbits to bools and comparing them,
we can do bitwise logic on the whole vector and convert the
resulting sign bit to a bool at the end.

This is still a different algorithm than what we do in LegalizeDAG
through expandSADDOSSUBO. That algorithm needs to know that the
RHS of SSUBO is > 0, but that's costly when the type is split.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97325
2021-02-24 10:05:38 -08:00
Simon Pilgrim 8082bfe7e5 [DAG] Add basic mul-with-overflow constant folding support
As noticed on D97160
2021-02-24 11:09:02 +00:00
Craig Topper cb6fc4b0a3 [LegalizeIntegerTypes] Use GetExpandedInteger instead of SplitInteger in ExpandIntRes_XMULO.
We know the input is going to be expanded as well, so we should
just ask for the already expanded operands. Otherwise we create
nodes that are just going to need to be legalized.
2021-02-23 23:53:45 -08:00
Heejin Ahn ea8c6375e3 [WebAssembly] Fix incorrect grouping and sorting of exceptions
This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain basics
on what each pass/analysis does and why we need this change on top of
that. Please feel free to skip parts that are not necessary for your
understanding.

---

`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationship between them, but in `WasmEHFuncInfo`
the unwind destination BB is not necessarily a successor or the key EH
pad BB. Their relationship can be intuitively explained by this C++ code
snippet:
```
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.

---

`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
continuously together in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationship in this way: if
EHPad A domiantes EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.

This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
  try
  catch
    ...   <- %dest_bb is among here!
  end
delegate %dest_bb
```

So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure excptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.

But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.

---

CFGSort places BBs within the same `SortRegion` (loop or exception)
continuously together so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.

`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there discrepancies between those two data structures.
`WebAssemblyException` is what we control, and it is defined as an EH
pad as its header and BBs dominated by the header as its BBs (with a
newly added exception of unwind destinations explained in the previous
paragraph). But `MachineLoop` is an LLVM data structure and uses the
standard loop detection algorithm. So by the algorithm, BBs that are 1.
dominated by the loop header and 2. have a path back to its header.
Because of the second condition, many BBs that are dominated by the loop
header are not included in the loop. So BBs that contain `return` or
branches to outside of the loop are not technically included in
`MachineLoop`, but they can be sorted together with the loop with no
problem.

Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are by the current region header.
(This was written this way from the first version written by Dan, when
only loops existed.) But now, we have cases in exceptions when EHPad B
is the unwind destination for EHPad A, even if EHPad B is dominated by
EHPad A it should not be included in EHPad A's exception, and should not
be sorted within EHPad A.

One way to make things work, at least correctly, is change `dominates`
condition to `contains` condition for `SortRegion` when sorting BBs, but
this will change compilation results for existing non-EH code and I
can't be sure it will not degrade performance or code size. I think it
will degrade performance because it will force many BBs dominated by a
loop, which don't have the path back to the header, to be placed after
the loop and it will likely to create more branches and blocks.

So this does a little hacky check when adding BBs to `Preferred` list:
(`Preferred` list is a ready list. CFGSort maintains ready list in two
priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
was written that way from the beginning. BBs are first added to
`Preferred` list and then some of them are pushed to `Ready` list, so
here we only need to guard condition for `Preferred` list.)

When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.

---

This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of existing test case. This test
case didn't crash because it fortunately didn't contain `delegate` to
the incorrectly placed unwind destination BB.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97247
2021-02-23 14:54:55 -08:00
Craig Topper eb165090bb [LegalizeIntegerTypes] Improve ExpandIntRes_SADDSUBO codegen on targets without SADDO/SSUBO.
This code creates 3 setccs that need to be expanded. It was
creating a sign bit test as setge X, 0 which is non-canonical.
Canonical would be setgt X, -1. This misses the special case in
IntegerExpandSetCCOperands for sign bit tests that assumes
canonical form. If we don't hit this special case we end up
with a multipart setcc instead of just checking the sign of
the high part.

To fix this I've reversed the polarity of all of the setccs to
setlt X, 0 which is canonical. The rest of the logic should
still work. This seems to produce better code on RISCV which
lacks a setgt instruction.

This probably still isn't the best code sequence we could use here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97181
2021-02-23 09:40:32 -08:00
Heejin Ahn a08e609d2e [WebAssembly] Rename methods in WasmEHFuncInfo (NFC)
This renames variable and method names in `WasmEHFuncInfo` class to be
simpler and clearer. For example, unwind destinations are EH pads by
definition so it doesn't necessarily need to be included in every method
name. Also I am planning to add the reverse mapping in a later CL,
something like `UnwindDestToSrc`, so this renaming will make meanings
clearer.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D97173
2021-02-22 12:16:11 -08:00
Kazu Hirata ffba9e596d [CodeGen] Use range-based for loops (NFC) 2021-02-21 19:58:07 -08:00
Craig Topper 1a6c1ac686 [SelectionDAG][RISCV] Teach ComputeNumSignBits to handle SREM.
This also removes a pattern from RISCV that is no longer needed
since the sexti32 on the LHS of the srem in the pattern implies
the result is sign extended so the sign_extend_inreg should be
removed in DAG combine now.

Reviewed By: luismarques, RKSimon

Differential Revision: https://reviews.llvm.org/D97133
2021-02-21 11:13:36 -08:00
Simon Pilgrim 38ab47c813 [DAG] Match USUBSAT patterns through zext/trunc
This patch handles usubsat patterns hidden through zext/trunc and uses the getTruncatedUSUBSAT helper to determine if the USUBSAT can be correctly performed in the truncated form:

zext(x) >= y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))
zext(x) >  y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))

Based on original examples:

void foo(unsigned short *p, int max, int n) {
    int i;
    unsigned m;
    for (i = 0; i < n; i++) {
        m = *--p;
        *p = (unsigned short)(m >= max ? m-max : 0);
    }
}

Differential Revision: https://reviews.llvm.org/D25987
2021-02-21 15:26:54 +00:00
Kazu Hirata 0b417ba20f [CodeGen] Use range-based for loops (NFC) 2021-02-20 21:46:02 -08:00
Simon Pilgrim 761bbed264 [DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit)))
This moves the last custom x86 USUBSAT fold to generic DAGCombine.

Completes PR40111

Differential Revision: https://reviews.llvm.org/D96703
2021-02-20 12:02:07 +00:00
Simon Pilgrim 5d3930bb8f [DAG] visitTRUNCATE - attempt to truncate USUBSAT
Fold trunc(usubsat(zext(x),y)) -> usubsat(x,trunc(umin(y,satlimit)))
2021-02-19 14:26:05 +00:00
Simon Pilgrim 53e83afcaf [DAG] getTruncatedUSUBSAT - always truncate operands. NFCI.
As noticed on D96703, we're always truncating the operands so should use getNode(ISD::TRUNCATE) instead of getZExtOrTrunc.
2021-02-18 21:28:55 +00:00
Guozhi Wei 66f2d09ebf [DAGCombiner] Transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
If extload is legal, following transform
    (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
can save one ext instruction.

Differential Revision: https://reviews.llvm.org/D95086
2021-02-18 13:15:20 -08:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
Craig Topper 61d4d9a5d3 [TableGen][SelectionDAG] Improve efficiency of encoding negative immediates for isel's CheckInteger opcode.
CheckInteger uses an int64_t encoded using a variable width encoding
that is optimized for encoding a number with a lot of leading zeros.
Negative numbers have no leading zeros so use the largest encoding
requiring 9 bytes.

I believe its most like we want to check for positive and negative
numbers near 0. -1 is quite common due to its use in the 'not'
idiom.

To optimize for this, we can borrow an idea from the bitcode format
and move the sign bit to bit 0 with the magnitude stored in the
upper bits. This will drastically increase the number of leading
zeros for small magnitudes. Then we can run this value through
VBR encoding.

This gives a small reduction in the table size on all in tree
targets except VE where size increased by about 300 bytes due
to intrinsic ids now requiring 3 bytes instead of 2. Since the
intrinsic enum space is shared by all targets this an unfortunate
consquence of where VE is currently located in the range.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96317
2021-02-18 08:53:17 -08:00
Simon Pilgrim 87fbc06d06 [DAG] Pull out getTruncatedUSUBSAT helper from foldSubToUSubSat. NFCI.
This will simplify an incoming generic implementation of D25987.

I'll rebase D96703 shortly to support this.
2021-02-17 12:17:08 +00:00
Simon Pilgrim 05c64ea672 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED)
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-17 11:42:43 +00:00
Sterling Augustine 5aa8f4c084 Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))"
This reverts commit 5dfba562dd.

That commit causes an assertion failure with the following repro:

typedef long b __attribute__((__vector_size__(16)));
b *d;
b e;
b __attribute__((__always_inline__)) c(b h, b i) {
  return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i;
}
j() {
  b k, l, m, n, o[6], p, q;
  m = d[5];
  b r = m;
  b s = f(r, 8);
  q = s;
  l = d[1];
  p = l;
  t(q);
  n = c(m, l);
  o[1] = c(s, f(p, 8));
  k = __builtin_shufflevector(n, o[1], 0, 2);
  e = __builtin_ia32_psrlwi128(k, j);
}

./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c
2021-02-16 12:48:15 -08:00
Simon Pilgrim df45c18135 [DAG] PromoteIntRes_ADDSUBSHLSAT - promote ISD::UADDSAT as clamped add
Similar to D96622, we're better off just promoting uaddsat(x,y) -> umin(add(x,y),c) instead of trying to perform a shifted uaddsat.

I initially tried to just use shifted promotion in cases where we didn't have a legal/custom umin - but we don't appear to have any targets that have uaddsat but not umin, so imo we're better off always using the umin and avoid an untested shifted uaddsat code path.

Differential Revision: https://reviews.llvm.org/D96767
2021-02-16 17:37:44 +00:00
Craig Topper 064ada4ec6 [SelectionDAG][AArch64] Restrict matchUnaryPredicate to only handle SPLAT_VECTOR for scalable vectors.
fde2466171 added support for
scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in
addition to BUILD_VECTOR. This was used to enabled UDIV/SDIV/UREM/SREM
by constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp

The caller there expects to call getBuildVector from the match factors.
This leads to a crash right now if there is a SPLAT_VECTOR of
fixed vectors since the number of vectors won't match the number
of elements.

To fix this, this patch updates the callers to check the opcode
instead of whether the type is fixed or scalable. This assumes
that only 3 opcodes are handled by matchUnaryPredicate so
I've added an assertion to the final else to check that opcode.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96174
2021-02-16 09:22:46 -08:00
Simon Pilgrim 5dfba562dd [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-16 15:46:34 +00:00
Simon Pilgrim 420420de57 [DAG] Avoid APInt copies by directly using the APInt reference from getAPIntValue. NFCI. 2021-02-16 13:50:34 +00:00
Simon Pilgrim dd879f7dc9 [DAG] Use APInt::extractBits instead of lshr().trunc(). NFCI.
Avoids so many APInt instances by directly using the APInt reference from getAPIntValue.
2021-02-16 13:50:33 +00:00
Craig Topper eb75f250fe [RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway.
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.

Simlilar was done for BSWAP previously.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96681
2021-02-15 12:33:16 -08:00
Simon Pilgrim e47f21da61 [DAG] visitVSELECT - move OpLHS == LHS into inner if() in USUBSAT matching. NFCI.
This will be necessary for the update of D25987 where we'll need to match OpLHS against other ops.
2021-02-15 18:27:00 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds  a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
 reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Arlo Siemsen 080866470d Add ehcont section support
In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.

This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.

This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.

The change includes a test that when the module flag is enabled the section is correctly generated.

The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).

In order to collect catchret we:
1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and
3) Combines these targets with the other EHCont targets that were already being collected.

Change originally authored by Daniel Frampton <dframpto@microsoft.com>

For more details, see MSVC documentation for `/guard:ehcont`
  https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D94835
2021-02-15 14:27:12 +08:00
Simon Pilgrim 6f5a805bbb [DAG] Fold i1/vXi1 saddsat/uaddsat(x,y) -> or(x,y)
Alive2: https://alive2.llvm.org/ce/z/FzcrpH
2021-02-13 15:02:01 +00:00
Simon Pilgrim 0df15e5eff [DAG] Fold i1/vXi1 ssubsat/usubsat(x,y) -> and(x,~y)
Alive2: https://alive2.llvm.org/ce/z/4nkNGh
2021-02-13 13:21:15 +00:00
Simon Pilgrim 60ba5397df [DAG] PromoteIntRes_ADDSUBSHLSAT - use promoted ISD::USUBSAT directly
As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops.

I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later.

Also, create a ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary.

Differential Revision: https://reviews.llvm.org/D96622
2021-02-13 12:35:10 +00:00
Simon Pilgrim 7ad0c573bd [DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth
We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it.

Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162
2021-02-13 12:00:08 +00:00
Simon Pilgrim 4841a225b7 [DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine
Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets.

This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization.

We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation.

The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987.

Differential Revision: https://reviews.llvm.org/D96413
2021-02-12 18:22:57 +00:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Simon Pilgrim 2465541dc0 [DAG] DAGTypeLegalizer::PromoteIntRes_ADDSUBSHLSAT - break if-else chain. NFCI.
Style fixup - the if() block always returns so we can pull out the contents of the else() block.
2021-02-12 10:33:12 +00:00
Craig Topper 5744502a13 [TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL.
If we wait until the type is legalized, we'll lose information
about the orginal type and need to use larger magic constants.
This gets especially bad on RISCV64 where i64 is the only legal
type.

I've limited this to simple scalar types so it only works for
i8/i16/i32 which are most likely to occur. For more odd types
we might want to do a small promotion to a type where MULH is legal
instead.

Unfortunately, this does prevent some urem/srem+seteq matching since
that still require legal types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96210
2021-02-11 09:43:13 -08:00
Simon Pilgrim 5beebf9c58 [DAG] foldLogicOfSetCCs - Generalize and/or (setcc X, CMax, ne), (setcc X, CMin, ne/eq) fold. NFCI.
Prep work to add support for non-uniform vectors - replace APInt values with using the SDValue ops directly.
2021-02-11 17:09:01 +00:00
Thomas Preud'homme bad0290ce3 Improve STRICT_FSETCC codegen in absence of no NaN
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91972
2021-02-11 14:19:43 +00:00
Joe Ellis 67464dfe36 [DebugInfo] Only perform TypeSize -> unsigned cast when necessary
This commit moves a line in SelectionDAGBuilder::handleDebugValue to
avoid implicitly casting a TypeSize object to an unsigned earlier than
necessary. It was possible that we bail out of the loop before the value
is ever used, which means we could create a superfluous TypeSize
warning.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D96423
2021-02-11 13:54:09 +00:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Luís Marques acac29ca42 [DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts
Avoid doing the following combine for vector types:

```
copysign(x, fp_extend(y)) -> copysign(x, y)
copysign(x, fp_round(y)) -> copysign(x, y)
```

That combine seemed to impede the selection of vector instruction and cause
a mess in some circumstances.

Differential Revision: https://reviews.llvm.org/D96037
2021-02-10 14:25:24 +00:00
Kazu Hirata 7e75f6fc1d [SelectionDAG] Use range-based for loops (NFC) 2021-02-09 22:14:30 -08:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Nemanja Ivanovic a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00