Commit Graph

827 Commits

Author SHA1 Message Date
John Brawn 0bb9a27c98 [FPEnv][AArch64] Add lowering and instruction selection for strict conversions
Strict fp-to-int and int-to-fp conversions can be handled in the same way that
the non-strict versions are (by using the appropriate instruction or converting
to a function call when we have no instruction).

Differential Revision: https://reviews.llvm.org/D73625
2020-01-30 13:50:06 +00:00
John Brawn 258d8dd76a [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND
This gets selected to the appropriate fcvt instruction. Handling from there on
isn't fully correct yet, as we need to model fcvt reading and writing to fpsr
and fpcr.

Differential Revision: https://reviews.llvm.org/D73201
2020-01-30 12:51:25 +00:00
John Brawn 2224407ef5 Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the
corresponding FCMP and FCMPE instructions, though the handling from there on
isn't fully correct as we don't model reads and writes to FPCR and FPSR.

Differential Revision: https://reviews.llvm.org/D73368
2020-01-30 10:40:55 +00:00
Kerry McLaughlin aa0f37e14a [AArch64][SVE] Add first-faulting load intrinsic
Summary:
Implements the llvm.aarch64.sve.ldff1 intrinsic and DAG
combine rules for first-faulting loads with sign & zero extends

Reviewers: sdesmalen, efriedma, andwar, dancgr, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73025
2020-01-23 11:57:16 +00:00
Sander de Smalen 4cf16efe49 [AArch64][SVE] Add patterns for unpredicated load/store to frame-indices.
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).

Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71215
2020-01-22 14:32:27 +00:00
Kerry McLaughlin cdcc4f2a44 [AArch64][SVE] Add intrinsic for non-faulting loads
Summary:
This patch adds the llvm.aarch64.sve.ldnf1 intrinsic, plus
DAG combine rules for non-faulting loads and sign/zero extends

Reviewers: sdesmalen, efriedma, andwar, dancgr, mgudim, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71698
2020-01-22 11:15:20 +00:00
Sander de Smalen 67d4c9924c Add support for (expressing) vscale.
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:

  getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1

This can be used to propagate the value in CodeGenPrepare.

In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.

This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).

Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner

Reviewed by: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68203
2020-01-22 10:09:27 +00:00
Florian Hahn 535ed62c5f [AArch64] Add custom store lowering for 256 bit non-temporal stores.
Currently we fail to lower non-termporal stores for 256+ bit vectors
to STNPQ, because type legalization will split them up to 128 bit stores
and because there are no single non-temporal stores, creating STPNQ
in the Load/Store optimizer would be quite tricky.

This patch adds custom lowering for 256 bit non-temporal vector stores
to improve the generated code.

Reviewers: dmgreen, samparker, t.p.northover, ab

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D72919
2020-01-21 14:53:40 -08:00
Andrzej Warzynski 7e717b3990 [AArch64][SVE] Extend int_aarch64_sve_ld1_gather_imm
The ACLE distinguishes between the following addressing modes for gather
loads:
  * "scalar base, vector offset", and
  * "vector base, scalar offset".
For the "vector base, scalar offset" case, the
`int_aarch64_sve_ld1_gather_imm` intrinsic was added in 79f2422d.
Currently, that intrinsic assumes that the scalar offset is passed as an
immediate.  As a result, it does not cater for cases where scalar offset
is stored in a register.

In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all
cases are covered:
* `int_aarch64_sve_ld1_gather_imm` is renamed as
  `int_aarch64_sve_ld1_gather_scalar_offset`
* new DAG combine rules are added for GLD1_IMM for scenarios where the
  offset is a non-immediate scalar or an out-of-range immediate
* sve-intrinsics-gather-loads-vector-base.ll is renamed as
  sve-intrinsics-gather-loads-vector-base-imm-offset.ll
* sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test
  file for non-immediate offsets

Similar changes are made for scatter store intrinsics.

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D71773
2020-01-20 12:19:18 +00:00
Matt Arsenault 0d0fce42b0 GlobalISel: Preserve load/store metadata in IRTranslator
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.

Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
2020-01-16 13:49:43 -05:00
Benjamin Kramer 06cfcdcca7 [AArch64][SVE] Fold variable into assert to silence unused variable warnings in Release builds 2020-01-15 12:50:27 +01:00
Cullen Rhodes 93a4dede3a [AArch64][SVE] Add ptest intrinsics
Summary:
Implements the following intrinsics:

    * @llvm.aarch64.sve.ptest.any
    * @llvm.aarch64.sve.ptest.first
    * @llvm.aarch64.sve.ptest.last

Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72398
2020-01-15 11:15:01 +00:00
Danilo Carvalho Grael 2d7e757a83 [AArch64][SVE] Add patterns for some arith SVE instructions.
Summary: Add patterns for the following instructions:
- smax, smin, umax, umin

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: amehsan

Differential Revision: https://reviews.llvm.org/D71779
2020-01-13 11:39:42 -05:00
KAWASHIMA Takahiro 10c11e4e2d This option allows selecting the TLS size in the local exec TLS model,
which is the default TLS model for non-PIC objects. This allows large/
many thread local variables or a compact/fast code in an executable.

Specification is same as that of GCC. For example, the code model
option precedes the TLS size option.

TLS access models other than local-exec are not changed. It means
supoort of the large code model is only in the local exec TLS model.

Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>)
Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard
Reviewd By: peter.smith
Committed by: peter.smith

Differential Revision: https://reviews.llvm.org/D71688
2020-01-13 10:16:53 +00:00
Jessica Paquette ceb801612a [AArch64] Don't generate libcalls for wide shifts on Darwin
Similar to cff90f07cb.

Darwin doesn't always use compiler-rt, and so we can't assume that these
functions are available (at least on arm64).
2020-01-10 15:58:51 -08:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
QingShan Zhang 2133d3c558 [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal'
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000
2020-01-03 03:26:41 +00:00
Jay Foad 13a7a4ccbf Remove unneeded extra variable realArgIdx. NFC. 2020-01-02 14:27:32 +00:00
Andrzej Warzynski 404da13e1e [AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32
Summary:
Currently 32 bit unpacked offsets are passed as nxv2i64. However, as
pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead
would improve consistency with:
  * how other arguments are treated
  * how scatter stores are implemented
This patch makes sure that 32 bit unpacked offsets are passes as nxv2i32
instead of nxv2i64.

Reviewers: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71724
2020-01-02 13:01:28 +00:00
Craig Topper 4ae3120ed8 [LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to be promoted.
These operations are needed as building blocks for promoting so they
can't be promoted themselves.

This appeared to work because the fp_extend query type for operation
actions is the result type, not the input type so it never triggered
in the legalizer.

For fp_round, the vector op legalizer just ended up creating a
nop fp_extend that was elided by getNode, followed by a nop
fp_round that was also elided by getNode. This was followed by
a final fp_round from v4f32 back to vf416 which was CSEd to the
original node. Then legalize vector ops just believed that node
legalized to itself. LegalizeDAG took another crack at promoting
it, but didn't have a handler so just skipped it with a debug
message saying it wasn't promoted.

This patch just removes the operation actions to avoid this
non-sense. Found while trying to refactor LegalizeVectorOps to
handle multiple result nodes better.
2019-12-31 15:04:12 -08:00
Sanjay Patel 8cefc37be5 [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.

Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D63815
2019-12-23 10:11:45 -05:00
Martin Storsjö 5a751e747d [AArch64] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.

Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71721
2019-12-23 12:13:49 +02:00
Sanjay Patel 0b38af89e2 [AArch64] match splat of bitcasted extract subvector to DUPLANE
This is another potential regression exposed by D63815.

Here we peek through a bitcast to find an extract subvector and
scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'

Differential Revision: https://reviews.llvm.org/D71672
2019-12-22 08:37:03 -05:00
Cullen Rhodes 974f00a436 [AArch64][SVE] Fold constant multiply of element count
Summary:
E.g.

  %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31)
  %mul = mul i64 %0, <const>

Should emit:

  cntw    x0, all, mul #<const>

For <const> in the range 1-16.

Patch by Kerry McLaughlin

Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71014
2019-12-20 11:58:00 +00:00
Cullen Rhodes 3f9005eb89 Recommit "[AArch64][SVE] Add permutation and selection intrinsics"
Recommit 23c28c4043 (reverted in
dcb48f50bd) with a fix for an assert
"Request for a fixed size on a scalable object" being triggered in
`LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the
TypeSize object.
2019-12-20 10:45:17 +00:00
Cullen Rhodes dcb48f50bd Revert "[AArch64][SVE] Add permutation and selection intrinsics"
This reverts commit 23c28c4043.

It caused build failures in the following expensive checks builders:

    http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295
    http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700

Reverting for now whilst I figure what the issue is.
2019-12-19 14:26:14 +00:00
Cullen Rhodes 23c28c4043 [AArch64][SVE] Add permutation and selection intrinsics
Summary:
Adds the following intrinsics:

    * @llvm.aarch64.sve.clasta
    * @llvm.aarch64.sve.clasta_n
    * @llvm.aarch64.sve.clastb
    * @llvm.aarch64.sve.clastb_n
    * @llvm.aarch64.sve.compact
    * @llvm.aarch64.sve.ext
    * @llvm.aarch64.sve.lasta
    * @llvm.aarch64.sve.lastb
    * @llvm.aarch64.sve.rev
    * @llvm.aarch64.sve.splice
    * @llvm.aarch64.sve.tbl
    * @llvm.aarch64.sve.trn1
    * @llvm.aarch64.sve.trn2
    * @llvm.aarch64.sve.uzp1
    * @llvm.aarch64.sve.uzp2
    * @llvm.aarch64.sve.zip1
    * @llvm.aarch64.sve.zip2

Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin

Reviewed By: sdesmalen, efriedma

Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71401
2019-12-19 13:18:40 +00:00
Cullen Rhodes 49199465a3 [AArch64][SVE] Implement ptrue intrinsic
Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally,
huntergr, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71457
2019-12-19 11:02:05 +00:00
Victor Campos 364b8f5fbe [AArch64] Improve codegen of volatile load/store of i128
Summary:
Instead of generating two i64 instructions for each load or store of a
volatile i128 value (two LDRs or STRs), now emit a single LDP or STP.

Reviewers: labrinea, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69559
2019-12-18 10:03:12 +00:00
Andrzej Warzynski 7e20c3a71d [Aarch64][SVE] Add intrinsics for scatter stores
Summary:
This patch adds the following SVE intrinsics for scatter stores:
* 64-bit offsets:
  * @llvm.aarch64.sve.st1.scatter (unscaled)
  * @llvm.aarch64.sve.st1.scatter.index (scaled)
* 32-bit unscaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended-offset)
* 32-bit scaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset)
* vector base + immediate:
  * @llvm.aarch64.sve.st1.scatter.imm

Reviewers: rengolin, efriedma, sdesmalen

Reviewed By: efriedma, sdesmalen

Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71074
2019-12-16 11:52:53 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Kerry McLaughlin 4194ca8e5a Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.

Original commit message:

Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
2019-12-13 10:08:20 +00:00
Cullen Rhodes bbd16b6876 [AArch64][SVE] Remove nxv1f32 and nxv1f64 as legal types
Summary: Also cleans up ZPR register class definition.

Reviewers: sdesmalen, cameron.mcinally, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71351
2019-12-12 09:49:22 +00:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Kerry McLaughlin c0a3ab3655 Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
This reverts commit 3f5bf35f86 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
2019-12-11 13:58:39 +00:00
Andrzej Warzynski 65651f197a [AArch64][SVE] Add DAG combine rules for gather loads and sext/zext
Summary:
These changes allow us to support sign-extending gather loads with the
exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*).

Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential revision: https://reviews.llvm.org/D70812
2019-12-11 12:56:18 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
Cullen Rhodes 1b9a608c84 [AArch64][SVE] Add wide compare immediate patterns
Summary:
Recognize wide compares where the wide operand is a splat of a scalar
value in the appropriate range and convert to the immediate variant of
the instruction.

Patch by Graham Hunter

Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71009
2019-12-10 10:41:22 +00:00
Eli Friedman f1ddef34f1 [AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors.
The generated sequence with whilelo is unintuitive, but it's the best
I could come up with given the limited number of SVE instructions that
interact with scalar registers. The other sequence I was considering
was something like dup+cmpne, but an extra scalar instruction seems
better than an extra vector instruction.

Differential Revision: https://reviews.llvm.org/D71160
2019-12-09 15:09:33 -08:00
Danilo Carvalho Grael b29916cec3 [AArch64][SVE] Integer reduction instructions pattern/intrinsics.
Added pattern matching/intrinsics for the following SVE instructions:

-- saddv, uaddv
-- smaxv, sminv, umaxv, uminv
-- orv, eorv, andv
2019-12-05 09:59:19 -05:00
Sander de Smalen 79f2422d6a [Aarch64][SVE] Add intrinsics for gather loads (vector + imm)
This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index:
  * @llvm.aarch64.sve.ld1.gather.imm

This intrinsics maps 1-1 to the corresponding SVE instruction (example for half-words):
  * ld1h { z0.d }, p0/z, [z0.d, #16]

Committed on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma

Reviewed By: sdesmalen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70806
2019-12-03 15:19:16 +00:00
Sander de Smalen 8bf31e28d7 [Aarch64][SVE] Add intrinsics for gather loads with 32-bits offsets
This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are:
* unscaled
  * @llvm.aarch64.sve.ld1.gather.sxtw
  * @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
  * @llvm.arch64.sve.ld1.gather.sxtw.index
  * @llvm.arch64.sve.ld1.gather.uxtw.index
The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits.

These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]

Committed on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma

Reviewed By: sdesmalen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70782
2019-12-03 14:48:29 +00:00
Sander de Smalen 6e51ceba53 [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets
This patch adds the following intrinsics for gather loads with 64-bit offsets:
      * @llvm.aarch64.sve.ld1.gather (unscaled offset)
      * @llvm.aarch64.sve.ld1.gather.index (scaled offset)

These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
      * ld1h { z0.d }, p0/z, [x0, z0.d]
      * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]

Committing on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70542
2019-12-03 12:55:03 +00:00
Kerry McLaughlin 7483eb656f [AArch64][SVE] Implement shift intrinsics
Summary:
Adds the following intrinsics:
- asr & asrd
- insr
- lsl & lsr

This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic.

Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70437
2019-12-03 11:47:12 +00:00
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Sander de Smalen 84a0c8e3ae [AArch64][SVE] Spilling/filling of SVE callee-saves.
Implement the spills/fills of callee-saved SVE registers using STR and LDR
instructions.

Also adds the `aarch64_sve_vector_pcs` attribute to specify the
callee-saved registers to be used for functions that return SVE vectors or
take SVE vectors as arguments. The callee-saved registers are vector
registers z8-z23 and predicate registers p4-p15.

The overal frame-layout with SVE will be as follows:

   +-------------+
   | stack args  |
   +-------------+
   | Callee Saves|
   |   X29, X30  |
   |-------------| <- FP
   | SVE Callee  | < //////////////
   | saved regs  | < //////////////
   |    z23      | < //////////////
   |     :       | < // SCALABLE //
   |    z8       | < //////////////
   |    p15      | < /// STACK ////
   |     :       | < //////////////
   |    p4       | < //// AREA ////
   +-------------+ < //////////////
   |     :       | < //////////////
   |  SVE locals | < //////////////
   |     :       | < //////////////
   +-------------+
   |/////////////| alignment gap.
   |     :       |
   | Stack objs  |
   |     :       |
   +-------------+ <- SP after call and frame-setup

Reviewers: cameron.mcinally, efriedma, greened, thegameg, ostannard, rengolin

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D68996
2019-11-11 09:03:19 +00:00
Eli Friedman 5df3a87224 [AArch64][X86] Don't assume __powidf2 is available on Windows.
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.

(I think this started showing up recently because we added an
optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013
2019-11-08 12:43:21 -08:00
David Green 2179867ddc [AArch64] Select saturating Neon instructions
This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB
and UQSUB from the existing target independent sadd_sat, uadd_sat,
ssub_sat and usub_sat nodes.

It does not attempt to replace the existing int_aarch64_neon_uqadd
intrinsic nodes as they are apparently used for both scalar and vector,
and need to be legal on scalar types for some of the patterns to work.
The int_aarch64_neon_uqadd on scalar would move the two integers into
floating point registers, perform a Neon uqadd and move the value back.
I don't believe this is good idea for uadd_sat to do the same as the
scalar alternative is simpler (an adds with a csinv). For signed it may
be smaller, but I'm not sure about it being better.

So this just adds some extra patterns for the existing vector
instructions, matching on the _sat nodes.

Differential Revision: https://reviews.llvm.org/D69374
2019-10-31 17:28:36 +00:00
Ehsan Amiri ed7bcb2cb1 [AArch64][SVE] Add patterns for some integer vector instructions
Add pattern matching for SVE vector instructions:

-- add, sub, and, or, xor instructions
-- sqadd, uqadd, sqsub, uqsub target-independent intrinsics
-- bic intrinsics
-- predicated add, sub, subr intrinsics

Patch Review: https://reviews.llvm.org/D69128
Patch authored by: dancgr (Danilo Carvalho Grael)
2019-10-30 21:52:19 -04:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
Kerry McLaughlin da720a38b9 [AArch64][SVE] Implement masked load intrinsics
Summary:
Adds support for codegen of masked loads, with non-extending,
zero-extending and sign-extending variants.

Reviewers: huntergr, rovka, greened, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68877
2019-10-28 10:06:14 +00:00
Graham Hunter 84da2596f9 [AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.

Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.

At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.

I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.

Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy

Reviewed By: jmolloy

Differential Revision: https://reviews.llvm.org/D47775

llvm-svn: 375222
2019-10-18 11:48:35 +00:00
Kerry McLaughlin 0c7cc383e5 [AArch64][SVE] Implement unpack intrinsics
Summary:
Implements the following intrinsics:
  - int_aarch64_sve_sunpkhi
  - int_aarch64_sve_sunpklo
  - int_aarch64_sve_uunpkhi
  - int_aarch64_sve_uunpklo

This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.

This patch includes tests for the Subdivide2Argument type added by D67549

Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka

Reviewed By: greened

Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D67550

llvm-svn: 375210
2019-10-18 09:40:16 +00:00
Graham Hunter b302561b76 [SVE][IR] Scalable Vector size queries and IR instruction support
* Adds a TypeSize struct to represent the known minimum size of a type
  along with a flag to indicate that the runtime size is a integer multiple
  of that size
* Converts existing size query functions from Type.h and DataLayout.h to
  return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
  to uint64_t) so that most existing code 'just works' as if the return
  values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
  supported instructions used with scalable vectors can be constructed
  in IR.

Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen

Reviewed By: rovka, sdesmalen

Differential Revision: https://reviews.llvm.org/D53137

llvm-svn: 374042
2019-10-08 12:53:54 +00:00
Nikola Prica 02682498b8 [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033
2019-10-08 09:43:05 +00:00
Matt Arsenault f24ac13aaa TLI: Remove DAG argument from getRegisterByName
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.

The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.

llvm-svn: 373292
2019-10-01 01:44:39 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Hans Wennborg 3740ae3b8a Revert r372893 "[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets"
This caused severe compile-time regressions, see PR43455.

> Modern processors predict the targets of an indirect branch regardless of
> the size of any jump table used to glean its target address.  Moreover,
> branch predictors typically use resources limited by the number of actual
> targets that occur at run time.
>
> This patch changes the semantics of the option `-max-jump-table-size` to limit
> the number of different targets instead of the number of entries in a jump
> table.  Thus, it is now renamed to `-max-jump-table-targets`.
>
> Before, when `-max-jump-table-size` was specified, it could happen that
> cluster jump tables could have targets used repeatedly, but each one was
> counted and typically resulted in tables with the same number of entries.
> With this patch, when specifying `-max-jump-table-targets`, tables may have
> different lengths, since the number of unique targets is counted towards the
> limit, but the number of unique targets in tables is the same, but for the
> last one containing the balance of targets.
>
> Differential revision: https://reviews.llvm.org/D60295

llvm-svn: 373060
2019-09-27 09:54:26 +00:00
Evandro Menezes 3bd8ba156b [CodeGen] Replace -max-jump-table-size with -max-jump-table-targets
Modern processors predict the targets of an indirect branch regardless of
the size of any jump table used to glean its target address.  Moreover,
branch predictors typically use resources limited by the number of actual
targets that occur at run time.

This patch changes the semantics of the option `-max-jump-table-size` to limit
the number of different targets instead of the number of entries in a jump
table.  Thus, it is now renamed to `-max-jump-table-targets`.

Before, when `-max-jump-table-size` was specified, it could happen that
cluster jump tables could have targets used repeatedly, but each one was
counted and typically resulted in tables with the same number of entries.
With this patch, when specifying `-max-jump-table-targets`, tables may have
different lengths, since the number of unique targets is counted towards the
limit, but the number of unique targets in tables is the same, but for the
last one containing the balance of targets.

Differential revision: https://reviews.llvm.org/D60295

llvm-svn: 372893
2019-09-25 16:10:20 +00:00
Florian Hahn 364a23427b [AArch64] Convert neon_ushl and neon_sshl with positive constants to VSHL.
I think we should be able to use shl instead of sshl and ushl for
positive constant shift values, unless I am missing something.

We already have the machinery in place to ensure we only replace
nodes, if the shift value is positive and <= the element width.

This is a generalization of an earlier patch rL372565.

Reviewers: t.p.northover, samparker, dmgreen, anemet

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D67955

llvm-svn: 372824
2019-09-25 08:22:05 +00:00
Florian Hahn 3e2fdbee80 [AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine.
Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl,
if their first operand is extended and the second operand is a constant

Also adds a few tests marked with FIXME, where we can further increase
codegen.

Reviewers: t.p.northover, samparker, dmgreen, anemet

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D62308

llvm-svn: 372565
2019-09-23 09:38:53 +00:00
Benjamin Kramer df4b9a3f4f Hide implementation details in namespaces.
llvm-svn: 372113
2019-09-17 12:56:29 +00:00
Graham Hunter 1a9195d817 [SVE][MVT] Fixed-length vector MVT ranges
* Reordered MVT simple types to group scalable vector types
    together.
  * New range functions in MachineValueType.h to only iterate over
    the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from
    iterating over scalable types.

Reviewers: sdesmalen, greened

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D66339

llvm-svn: 372099
2019-09-17 10:19:23 +00:00
Kerry McLaughlin e55b3bf40e [SVE][Inline-Asm] Add constraints for SVE predicate registers
Summary:
Adds the following inline asm constraints for SVE:
  - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
  - Upa: SVE predicate register with full range, P0 to P15

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin

Reviewed By: rovka

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66524

llvm-svn: 371967
2019-09-16 09:45:27 +00:00
Tim Northover f1c2892912 AArch64: support arm64_32, an ILP32 slice for watchOS.
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.

llvm-svn: 371722
2019-09-12 10:22:23 +00:00
Guillaume Chatelet ad1cea0dda [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67267

llvm-svn: 371212
2019-09-06 15:03:49 +00:00
Guillaume Chatelet 9fcf066d0c [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67278

llvm-svn: 371210
2019-09-06 14:51:15 +00:00
Guillaume Chatelet 4fc3ad9e13 [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67229

llvm-svn: 371200
2019-09-06 12:48:34 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
 - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
Kerry McLaughlin 7b5c6b8d86 [SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp
Summary: Adds break to 'x' case in getRegForInlineAsmConstraint added by D66302, fixing the unintentional fallthrough.

Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu

Reviewed By: sdesmalen

Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe, psnobl, llvm-commits, cfe-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67095

llvm-svn: 370769
2019-09-03 15:45:42 +00:00
Kerry McLaughlin da4ef9b4c8 [SVE][Inline-Asm] Support for SVE asm operands
Summary:
Adds the following inline asm constraints for SVE:
  - w: SVE vector register with full range, Z0 to Z31
  - x: Restricted to registers Z0 to Z15 inclusive.
  - y: Restricted to registers Z0 to Z7 inclusive.

This change also adds the "z" modifier to interpret a register as an SVE register.

Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened

Reviewed By: sdesmalen

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66302

llvm-svn: 370673
2019-09-02 16:12:31 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Hans Wennborg cff90f07cb [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711)
Neither libgcc or compiler-rt are usually used on Windows, so these
functions can't be called.

Differential revision: https://reviews.llvm.org/D66880

llvm-svn: 370204
2019-08-28 13:55:10 +00:00
Tim Northover a7f226f9db AArch64: avoid creating cycle in DAG for post-increment NEON ops.
Inserting a value into Visited has the effect of terminating a search for
predecessors if that node is seen. This is legitimate for the base address, and
acts as a slight performance optimization, but the vector-building node can be
paert of a legitimate cycle so we shouldn't stop searching there.

PR43056.

llvm-svn: 370036
2019-08-27 10:21:11 +00:00
Simon Pilgrim c88408cf85 Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI.
llvm-svn: 369751
2019-08-23 12:37:09 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contain argument flags which will pass to makeLibCall function.
The patch should not has any functionality changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Eli Friedman b5eb3e1e82 [AArch64] Remove incorrect usage of MONonTemporal.
This has no effect at the moment, but might matter if we try to
implement non-temporal loads in the future.

llvm-svn: 368770
2019-08-13 23:12:14 +00:00
Daniel Sanders 5ae66e56cf [aarch64] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Manual fixups in:
AArch64InstrInfo.cpp - genFusedMultiply() now takes a Register* instead of unsigned*
AArch64LoadStoreOptimizer.cpp - Ternary operator was ambiguous between Register/MCRegister. Settled on Register

Depends on D65919

Reviewers: aemerson

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for full review was: https://reviews.llvm.org/D65962

llvm-svn: 368628
2019-08-12 22:40:53 +00:00
Evandro Menezes a005c1ac4f [AArch64] Expand bcmp() for small block lengths
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support the it.  Unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.

In a proprietary benchmark we see a performance drop of about 12% on PNG
compression before this patch, though it passes all tests.

This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate.  No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.

This problem also exists on ARM.  Once a consensus is reached for AArch64, we
can work to fix ARM as well.

Authors:
- Evandro Menezes (@evandro) <e.menezes@samsung.com>
- Brian Rzycki (@brzycki) <b.rzycki@samsung.com>

Differential revision: https://reviews.llvm.org/D64805

llvm-svn: 367898
2019-08-05 18:09:14 +00:00
Cullen Rhodes 2a48176373 [AArch64] Implement initial SVE calling convention support
Summary:

This patch adds initial support for the SVE calling convention such that
SVE types can be passed as arguments and return values to/from a
subroutine.

The SVE AAPCS states [1]:

    z0-z7 are used to pass scalable vector arguments to a subroutine,
    and to return scalable vector results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in such
    registers, it must ensure that the entire contents of z8-z23 are
    preserved across the call. In other cases it need only preserve the
    low 64 bits of z8-z15, as described in §5.1.2.

    p0-p3 are used to pass scalable predicate arguments to a subroutine
    and to return scalable predicate results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in these
    registers, it must ensure that p4-p15 are preserved across the call.
    In other cases it need not preserve any scalable predicate register
    contents.

SVE predicate and data registers are passed indirectly (i.e. spilled to the
stack and pass the address) if they exceed the registers used for argument
passing defined by the PCS referenced above.  Until SVE stack support is merged
we can't spill SVE registers to the stack, so currently an llvm_unreachable is
used where we will eventually handle this.

[1] https://static.docs.arm.com/100986/0000/100986_0000.pdf

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D65448

llvm-svn: 367859
2019-08-05 13:44:10 +00:00
Florian Hahn e3ea97b049 [AArch64] Skip isZIPMask check for masks with an odd number of elements.
We process 2 elements at a time and expect the number of elements to be
even. Similar to D60690.

Reviewers: dmgreen, samparker, t.p.northover

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D65400

llvm-svn: 367831
2019-08-05 11:12:23 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Peter Collingbourne 33773d5cfc SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned.
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().

Differential Revision: https://reviews.llvm.org/D65465

llvm-svn: 367474
2019-07-31 20:14:09 +00:00
Roman Lebedev 017e272c3a [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.

Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

llvm-svn: 366955
2019-07-24 22:57:22 +00:00
Amara Emerson 13af1ed8e3 [GlobalISel] Support for inlining memcpy, memset and memmove calls.
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.

The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.

Differential Revision: https://reviews.llvm.org/D65167

llvm-svn: 366951
2019-07-24 22:17:31 +00:00
Evgeniy Stepanov d752f5e953 Basic codegen for MTE stack tagging.
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.

Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.

Differential Revision: https://reviews.llvm.org/D64172

llvm-svn: 366360
2019-07-17 19:24:02 +00:00
Roman Lebedev c4b83a6054 [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.

Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.

Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.

Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel

Reviewed By: efriedma

Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64090

llvm-svn: 365010
2019-07-03 09:41:35 +00:00
Tom Tan 7ecb5145ba [COFF, ARM64] Fix encoding of debugtrap for Windows
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.

Differential Revision: https://reviews.llvm.org/D63635

llvm-svn: 364115
2019-06-21 23:38:05 +00:00
Simon Pilgrim 4e0648a541 [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.

This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.

If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.

Differential Revision: https://reviews.llvm.org/D63075

llvm-svn: 363179
2019-06-12 17:14:03 +00:00
Sam Parker a0bd6f8a1a [AArch64] Check for simple type in FPToUInt
DAGCombiner was hitting a SimpleType assertion when trying to combine
a v3f32 before type legalization.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916

Differential Revision: https://reviews.llvm.org/D62734

llvm-svn: 362365
2019-06-03 08:49:17 +00:00
Adhemerval Zanella 34d8daae53 [AArch64] Handle ISD::LRINT and ISD::LLRINT
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.

Reviewed By: SjoerdMeijer, mstorsjo

Differential Revision: https://reviews.llvm.org/D62018

llvm-svn: 361877
2019-05-28 21:04:29 +00:00
Reid Kleckner b7a78c7dff [AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.

Fixes PR41997

Reviewers: mgrang, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62344

llvm-svn: 361585
2019-05-24 01:27:20 +00:00
Florian Hahn 4a8835c655 [AArch64] Skip mask checks for masks with an odd number of elements.
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.

Fixes PR41951

Reviewers: t.p.northover, dmgreen, samparker

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D60690

llvm-svn: 361235
2019-05-21 10:05:26 +00:00
Adhemerval Zanella 2d28db6b9f [AArch64] Handle ISD::LROUND and ISD::LLROUND
This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas
instruction. It currently only handles the scalar version.

llvm-svn: 360894
2019-05-16 13:30:18 +00:00
Mandeep Singh Grang 5dc8aeb26d [COFF, ARM64] Fix ABI implementation of struct returns
Summary:
Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values

Related clang patch: D60349

Reviewers: rnk, efriedma, TomTan, ssijaric

Reviewed By: rnk, efriedma

Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60348

llvm-svn: 359934
2019-05-03 21:12:36 +00:00
Sjoerd Meijer 180f1ae57c [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
Craig Topper 35fe07916a [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00