Commit Graph

4816 Commits

Author SHA1 Message Date
Amara Emerson 89e84dec18 [AArch64][GlobalISel] Fix fallbacks introduced for G_SITOFP in 8f283cafdd
If we have an integer->fp convert that has differing sizes, e.g. s32 to s64,
then don't try to convert it to AArch64::G_SITOF since it won't select.
2021-01-15 01:10:49 -08:00
KAWASHIMA Takahiro b54337070b [AArch64] Add Fujitsu A64FX scheduling model
Basic support of A64FX was added in D75594 but its scheduling model
was missing. This commit adds the scheduling model. Also, this commit
amends/adds some subtarget parameters of A64FX.

The A64FX Microarchitecture Manual, which is source information of
this commit, is on GitHub.

https://github.com/fujitsu/A64FX/

Differential Revision: https://reviews.llvm.org/D93791
2021-01-15 17:14:04 +09:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Amara Emerson 8f283cafdd [AArch64][GlobalISel] Add selection support for fpr bank source variants of G_SITOFP and G_UITOFP.
In order to import patterns for these, we need to define new ops that can map to
the AArch64ISD::[SU]ITOF nodes. We then transform fpr->fpr variants of the generic
opcodes to these custom opcodes in preisel-lowering. We have to do it here and
not the PostLegalizer combiner because this has to run after regbankselect.

Differential Revision: https://reviews.llvm.org/D94702
2021-01-14 19:31:19 -08:00
Amara Emerson 036bc798f2 [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions.
G_[US]ITOFP users of loads on AArch64 can operate on both gpr and fpr banks for scalars.
Because of this, if their source is a load, then that load can be assigned to an fpr
bank and therefore avoid having to do a cross bank copy via a gpr->fpr conversion.

Differential Revision: https://reviews.llvm.org/D94701
2021-01-14 16:33:34 -08:00
Martin Storsjö dbaa6a1858 Revert "[AArch64] Attempt to sink mul operands"
This reverts commit dda60035e9.

This commit caused failures to compile some sources, erroring out
with "error in backend: Cannot select: t85: v2i32 = AArch64ISD::DUP t15",
see https://reviews.llvm.org/D91271 for the full reproduction case.
2021-01-14 17:28:18 +02:00
Lucas Prates 2b1e25befe [AArch64] Adding ACLE intrinsics for the LS64 extension
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.

Based on patches written by Simon Tatham.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D93232
2021-01-14 09:43:58 +00:00
Jordan Rupprecht 752fafda3d [NFC] Fix -Wsometimes-uninitialized
After 49142991a6, clang detects that MUL may be uninitialized. Set it to nullptr to suppress this check.

Adding an assert to check that it is ultimately set fails two test cases. Since this is not a new issue, leave the assertion commented out until a code owner can fix the bug. The two failing test cases are noted in the assertion comment.
2021-01-13 20:32:38 -08:00
Kazu Hirata 4c1617dac8 [llvm] Use std::any_of (NFC) 2021-01-13 19:14:44 -08:00
Muhammad Asif Manzoor 4e8e888905 [AArch64][GlobalISel] Add support for FCONSTANT of FP128 type
Add support for G_FCONSTANT of FP128 (Quadruple precision) type.
It replaces the constant by emitting a load with a constant pool entry.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D94437
2021-01-13 10:46:10 -05:00
Nicholas Guy dda60035e9 [AArch64] Attempt to sink mul operands
Following on from D91255, this patch is responsible for sinking relevant mul
operands to the same block so that umull/smull instructions can be correctly
generated by the mul combine implemented in the aforementioned patch.

Differential revision: https://reviews.llvm.org/D91271
2021-01-13 15:23:36 +00:00
Cullen Rhodes ad85e39670 [SVE] Add ISel pattern for addvl
Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D94504
2021-01-13 10:57:49 +00:00
Joe Ellis 3122c66aee [AArch64][SVE] Remove chains of unnecessary SVE reinterpret intrinsics
This commit extends SVEIntrinsicOpts::optimizeConvertFromSVBool to
identify and remove longer chains of redundant SVE reintepret
intrinsics. For example, the following chain of redundant SVE
reinterprets is now recognised as redundant:

    %a = <vscale x 2 x i1>
    %1 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 2 x i1> %a)
    %2 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %1)
    %3 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %2)
    %4 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %3)
    %5 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %4)
    %6 = <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %5)
    ret <vscale x 2 x i1> %6

and will be replaced with:

    ret <vscale x 2 x i1> %a

Eliminating these can sometimes mean emitting fewer unnecessary
loads/stores when lowering to assembly.

Differential Revision: https://reviews.llvm.org/D94074
2021-01-13 09:44:09 +00:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Martin Storsjö d1fa7afc7a [AArch64] [Windows] Properly add :lo12: reloc specifiers when generating assembly
This makes sure that assembly output actually can be assembled.

Set the correct MCExpr relocations specifier VK_PAGEOFF - and also
set VK_PAGE consistently even though it's not visible in the assembly
output.

Differential Revision: https://reviews.llvm.org/D94365
2021-01-12 23:56:03 +02:00
Bjorn Pettersson 32c073acb3 [GlobalISel] Map extractelt to G_EXTRACT_VECTOR_ELT
Before this patch there was generic mapping from vector_extract
to G_EXTRACT_VECTOR_ELT added in SelectionDAGCompat.td. That
mapping is now replaced by a mapping from extractelt instead.

The reasoning is that vector_extract is marked as deprecated,
so it is assumed that a majority of targets will use extractelt
and not vector_extract (and that the long term solution for all
targets would be to use extractelt).

Targets like AArch64 that still use vector_extract can add an
additional mapping from the deprecated vector_extract as target
specific tablegen definitions. Such a mapping is added for AArch64
in this patch to avoid breaking tests.

When adding the extractelt => G_EXTRACT_VECTOR_ELT mapping we
triggered some new code paths in GlobalISelEmitter, ending up in
an assert when trying to import a pattern containing EXTRACT_SUBREG
for ARM. Therefore this patch also adds a "failedImport" warning
for that situation (instead of hitting the assert).

Differential Revision: https://reviews.llvm.org/D93416
2021-01-11 21:53:56 +01:00
Kerry McLaughlin c37f68a888 [SVE][CodeGen] Fix legalisation of floating-point masked gathers
Changes in this patch:
- When lowering floating-point masked gathers, cast the result of the
  gather back to the original type with reinterpret_cast before returning.
- Added patterns for reinterpret_casts from integer to floating point, and
  concat_vector patterns for bfloat16.
- Tests for various legalisation scenarios with floating point types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D94171
2021-01-11 10:57:46 +00:00
Kazu Hirata e3d3dbd339 [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-01-10 09:24:57 -08:00
Mark Murray 7d4a8bc417 [AArch64] Add +flagm archictecture option, allowing the v8.4a flag modification extension.
Differential Revision: https://reviews.llvm.org/D94081
2021-01-08 13:21:12 +00:00
Mark Murray af7cce2fa4 [AArch64] Add +pauth archictecture option, allowing the v8.3a pointer authentication extension.
Differential Revision: https://reviews.llvm.org/D94083
2021-01-08 13:21:11 +00:00
Nicholas Guy ed23229a64 [AArch64] Fix crash caused by invalid vector element type
Fixes a crash caused by D91255, when LLVMTy is null when
calling changeExtendedVectorElementType.

Differential Revision: https://reviews.llvm.org/D94234
2021-01-08 12:02:54 +00:00
David Sherwood d1bf26fd94 [AArch64][SVE] Add lowering for llvm abs intrinsic
Add functionality to permit lowering of the abs and neg intrinsics
using the passthru variants.

Differential Revision: https://reviews.llvm.org/D94160
2021-01-08 08:55:25 +00:00
Kazu Hirata b934160aaa [Target] Use llvm::find_if (NFC) 2021-01-07 20:29:36 -08:00
Cameron McInally f4013359b3 [SVE] Add unpacked scalable floating point ZIP/UZP/TRN patterns
Differential Revision: https://reviews.llvm.org/D94193
2021-01-07 09:56:53 -06:00
Simon Pilgrim 037b058e41 [AArch64] SVEIntrinsicOpts - use range loop and cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.
Don't directly dereference a dyn_cast<> - use cast<> so we assert for the correct type.

Also, simplify the for loop to a range loop.

Fixes clang static analyzer warning.
2021-01-07 14:21:55 +00:00
Caroline Concatto 01c190e907 [AArch64][CostModel]Fix gather scatter cost model
This patch fixes a bug introduced in the patch:
https://reviews.llvm.org/D93030

This patch pulls the test for scalable vector to be the first instruction
to be checked. This avoids the Gather and Scatter cost model for AArch64 to
compute the number of vector elements for something that is not a vector and
therefore crashing.
2021-01-07 14:02:08 +00:00
Nicholas Guy 350247a93c [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))
Performing this rearrangement allows for existing patterns
to match cases where the vector may be built after an extend,
instead of before.

Differential Revision: https://reviews.llvm.org/D91255
2021-01-06 16:02:16 +00:00
Tomas Matheson 643e3c9076 [AArch64] Add BRB IALL and BRB INJ instructions
BRB IALL: Invalidate the Branch Record Buffer
BRB INJ: Branch Record Injection into the Branch Record Buffer

Parser changes based on work by Simon Tatham.

These are two-word mnemonics. The assembly parser works by special-casing
the mnemonic in order to parse the second word as a plain identifier token.

Reviewed by: MarkMurrayARM

Differential Revision: https://reviews.llvm.org/D93899
2021-01-06 12:10:22 +00:00
David Green a9b6440edd [AArch64] Handle any extend whilst lowering addw/addl/subw/subl
This adds an extra tablegen PatFrag, zanyext, which matches either any
extend or zext and uses that in the aarch64 backend to handle any
extends in addw/addl/subw/subl patterns.

Differential Revision: https://reviews.llvm.org/D93833
2021-01-06 10:35:23 +00:00
David Green 78d8a821e2 [AArch64] Handle any extend whilst lowering mull
Demanded bits may turn a sext or zext into an anyext if the top bits are
not needed. This currently prevents the lowering to instructions like
mull, addl and addw. This patch fixes the mull generation by keeping it
simple and treating them like zextends.

Differential Revision: https://reviews.llvm.org/D93832
2021-01-06 10:08:43 +00:00
Sander de Smalen a7e3339f3b [AArch64][SVE] Emit DWARF location expression for SVE stack objects.
Extend PEI to emit a DWARF expression for StackOffsets that have
a fixed and scalable component. This means the expression that needs
to be added is either:
  <base> + offset
or:
  <base> + offset + scalable_offset * scalereg

where for SVE, the scale reg is the Vector Granule Dwarf register, which
encodes the number of 64bit 'granules' in an SVE vector and which
the debugger can evaluate at runtime.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90020
2021-01-06 09:40:53 +00:00
Sander de Smalen a9f5e4375b [AArch64] Use faddp to implement fadd reductions.
Custom-expand legal VECREDUCE_FADD SDNodes
to benefit from pair-wise faddp instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D59259
2021-01-06 09:36:51 +00:00
Kazu Hirata cd088ba7e6 [llvm] Use llvm::lower_bound and llvm::upper_bound (NFC) 2021-01-05 21:15:59 -08:00
Christudasan Devadasan d68458bd56 [GlobalISel] Base implementation for sret demotion.
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.

Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92953
2021-01-06 10:30:50 +05:30
Bradley Smith c73ae747cb [AArch64][SVE] Add optimization to remove redundant ptest instructions
Co-Authored-by: Graham Hunter <graham.hunter@arm.com>
Co-Authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D93292
2021-01-05 15:28:36 +00:00
Paul Walker eba6deab22 [SVE] Lower vector CTLZ, CTPOP and CTTZ operations.
CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively.

CTTZ is not a native SVE operation but is instead lowered to:
  CTTZ(V) => CTLZ(BITREVERSE(V))

In the case of fixed-length support using SVE we also lower CTTZ
operating on NEON sized vectors because of its reliance on
BITREVERSE which is also lowered to SVE intructions at these lengths.

Differential Revision: https://reviews.llvm.org/D93607
2021-01-05 10:42:35 +00:00
Kazu Hirata eb198f4c3c [llvm] Use llvm::any_of (NFC) 2021-01-04 11:42:47 -08:00
Caroline Concatto 060cfd9795 [AArch64][SVE]Add cost model for masked gather and scatter for scalable vector.
A new TTI interface has been added 'Optional <unsigned>getMaxVScale' that
    returns the maximum vscale for a given target.
    When known getMaxVScale is used to compute the cost of masked gather scatter
    for scalable vector.

    Depends on D92094

    Differential Revision: https://reviews.llvm.org/D93030
2021-01-04 13:59:58 +00:00
Florian Hahn d38a0258a5
[AArch64] Add patterns for FMCLA*_indexed.
This patch adds patterns for the indexed variants of FCMLA. Mostly based
on a patch by Tim Northover.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D92947
2021-01-04 13:45:51 +00:00
Usman Nadeem 685c8b537a [AARCH64] Improve accumulator forwarding for Cortex-A57 model
The old CPU model only had MLA->MLA forwarding. I added some missing
MUL->MLA read advances and a missing absolute diff accumulator read
advance according to the Cortex A57 Software Optimization Guide.

The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), causes no
significant regressions (none in SPEC).

Differential Revision: https://reviews.llvm.org/D92296
2021-01-04 10:58:43 +00:00
David Sherwood a65092040a [SVE] Fix inline assembly parsing crash
This patch fixes a crash encountered when compiling this code:

  ...
  float16_t a;
  __asm__("fminv %h[a], %[b], %[c].h"
          : [a] "=r" (a)
          : [b] "Upl" (b), [c] "w" (c))

The issue here is when using the 'h' modifier for a register
constraint 'r'.

Differential Revision: https://reviews.llvm.org/D93537
2021-01-04 09:11:05 +00:00
Mark Murray 5abfeccf10 [ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM
This patch upstreams support for the Armv8-a Cortex-A78C
processor for AArch64 and ARM.

In detail:

Adding cortex-a78c as cpu option for aarch64 and arm targets in clang
Adding Cortex-A78C CPU name and ProcessorModel in llvm
Details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78c
2020-12-29 10:18:59 +00:00
Nikita Popov fb77d95022 [AArch64] Fix legalization of i128 ctpop without neon
If neon is disabled, LowerCTPOP will return SDValue() to indicate
that normal legalization should be used. However, ReplaceNodeResults
does not check for this and pushes the empty SDValue() onto the
result vector, which will subsequently result in a crash.

Differential Revision: https://reviews.llvm.org/D93825
2020-12-27 17:24:41 +01:00
Amara Emerson e0721a0992 [AArch64][GlobalISel] Notify observer of mutated instruction for shift custom legalization.
No test for this because it's a CSE verifier failure that's only exposed in a
WIP patch for enabling CSE throughout the AArch64 GISel pipeline.
2020-12-25 00:31:47 -08:00
Kazu Hirata d6ff5cf995 [Target] Use llvm::any_of (NFC) 2020-12-24 19:43:26 -08:00
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00
Paul Walker 8eec7294fe [SVE] Lower vector BITREVERSE and BSWAP operations.
These operations are lowered to RBIT and REVB instructions
respectively.  In the case of fixed-length support using SVE we
also lower BITREVERSE operating on NEON sized vectors as this
results in fewer instructions.

Differential Revision: https://reviews.llvm.org/D93606
2020-12-22 16:49:50 +00:00
Kazu Hirata 966f1431de [Target] Use llvm::erase_if (NFC) 2020-12-20 17:43:22 -08:00
Lucas Prates 1a9577bde1 [AArch64] Add support for ls64 to the .arch_extension asm directive
This adds support for the 'ls64' AArch64 extension to the `.arch_extension`
asm directive.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92574
2020-12-18 15:55:55 +00:00
Tomas Matheson fc712eb7aa [AArch64] Fix Copy Elemination for negative values
Redundant Copy Elimination was eliminating a MOVi32imm -1 when it
determined that the value of the destination register is already -1.
However, it didn't take into account that the MOVi32imm zeroes the upper
32 bits (which are FFFFFFFF) and therefore cannot be eliminated.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D93100
2020-12-18 13:30:46 +00:00