Commit Graph

4980 Commits

Author SHA1 Message Date
Fangrui Song 36e62b1ff7 [AArch64] Fix -Wunused-but-set-variable in GCC -DLLVM_ENABLE_ASSERTIONS=off build 2021-01-20 10:14:11 -08:00
Simon Pilgrim cad4275d69 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Amanieu d'Antras 21bfd068b3 [AArch64] Add support for the GNU ILP32 ABI
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.

The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.

There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.

This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D94143
2021-01-20 13:34:47 +00:00
Mark Murray cab20f6105 [AArch64] Add missing "flagm" feature to the .arch_extension directive.
Depends on D94970

Differential Revision: https://reviews.llvm.org/D94971
2021-01-20 11:57:39 +00:00
Mark Murray f344c028de [AArch64] Add missing "pauth" feature to the .arch_extension directive.
Differential Revision: https://reviews.llvm.org/D94970
2021-01-20 11:57:39 +00:00
Gabriel Hjort Åkerlund 2aeaaf841b [GlobalISel] Add missing operand update when copy is required
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D91244
2021-01-20 10:32:52 +01:00
Tim Northover 6259fbd8b6 AArch64: add apple-a14 as a CPU
This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to
Clang. A bit weird, but the best way for things like xnu to detect the new
features it cares about.
2021-01-19 14:04:53 +00:00
Caroline Concatto 172f1f8952 [AArch64][SVE] Add cost model for vector reduce for scalable vector
This patch computes the cost for vector.reduce<operand> for scalable vectors.
The cost is split into two parts:  the legalization cost and the horizontal
reduction.

Differential Revision: https://reviews.llvm.org/D93639
2021-01-19 11:54:16 +00:00
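As a hand-written illustration of the kind of operation the change above now costs (the reduction operation and element type here are arbitrary, not taken from the patch):

    declare i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32>)

    define i32 @reduce_add(<vscale x 4 x i32> %v) {
      ; Costed as legalization of the scalable type plus the horizontal reduction.
      %r = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> %v)
      ret i32 %r
    }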
Nicholas Guy 16bf02c3a1 Reland "[AArch64] Attempt to sink mul operands"
This relands dda60035e9,
which was reverted by dbaa6a1858
2021-01-18 16:00:22 +00:00
Nicholas Guy f5fcbe4e3c [AArch64] Further restricts when a dup(*ext) can be rearranged
In most cases, the dup(*ext) pattern can be rearranged to perform
the extension on the vector side, allowing for further vector-specific
optimisations to be made. However the initial checks for this conversion
were insufficient, allowing invalid encodings to be attempted (causing
compilation to fail).

Differential Revision: https://reviews.llvm.org/D94778
2021-01-18 16:00:21 +00:00
Amara Emerson 8456c3a789 AArch64: fix regression introduced by fcmp immediate selection.
Forgot to check whether it is safe to commute the operands for the given predicate.
2021-01-15 22:53:25 -08:00
Amara Emerson aa8a2d8a3d [AArch64][GlobalISel] Select immediate fcmp if the zero is on the LHS. 2021-01-15 14:31:39 -08:00
Stephan Herhut 061d152085 [SVE] Fix unused variable.
Introduced by [SVE] Restrict the usage of REINTERPRET_CAST.

Differential Revision: https://reviews.llvm.org/D94773
2021-01-15 15:10:33 +01:00
Paul Walker 2b8db40c92 [SVE] Restrict the usage of REINTERPRET_CAST.
In order to limit the number of combinations of REINTERPRET_CAST,
whilst at the same time preventing overlap with BITCAST, this patch
establishes the following rules:

1. The operand and result element types must be the same.
2. The operand and/or result type must be an unpacked type.

Differential Revision: https://reviews.llvm.org/D94593
2021-01-15 11:32:13 +00:00
Amara Emerson 89e84dec18 [AArch64][GlobalISel] Fix fallbacks introduced for G_SITOFP in 8f283cafdd
If we have an integer->fp convert that has differing sizes, e.g. s32 to s64,
then don't try to convert it to AArch64::G_SITOF since it won't select.
2021-01-15 01:10:49 -08:00
KAWASHIMA Takahiro b54337070b [AArch64] Add Fujitsu A64FX scheduling model
Basic support of A64FX was added in D75594 but its scheduling model
was missing. This commit adds the scheduling model. Also, this commit
amends/adds some subtarget parameters of A64FX.

The A64FX Microarchitecture Manual, which is source information of
this commit, is on GitHub.

https://github.com/fujitsu/A64FX/

Differential Revision: https://reviews.llvm.org/D93791
2021-01-15 17:14:04 +09:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Amara Emerson 8f283cafdd [AArch64][GlobalISel] Add selection support for fpr bank source variants of G_SITOFP and G_UITOFP.
In order to import patterns for these, we need to define new ops that can map to
the AArch64ISD::[SU]ITOF nodes. We then transform fpr->fpr variants of the generic
opcodes to these custom opcodes in pre-isel lowering. We have to do it here and
not in the PostLegalizer combiner because this has to run after regbankselect.

Differential Revision: https://reviews.llvm.org/D94702
2021-01-14 19:31:19 -08:00
Amara Emerson 036bc798f2 [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions.
G_[US]ITOFP users of loads on AArch64 can operate on both gpr and fpr banks for scalars.
Because of this, if their source is a load, then that load can be assigned to an fpr
bank and therefore avoid having to do a cross bank copy via a gpr->fpr conversion.

Differential Revision: https://reviews.llvm.org/D94701
2021-01-14 16:33:34 -08:00
Martin Storsjö dbaa6a1858 Revert "[AArch64] Attempt to sink mul operands"
This reverts commit dda60035e9.

This commit caused failures to compile some sources, erroring out
with "error in backend: Cannot select: t85: v2i32 = AArch64ISD::DUP t15",
see https://reviews.llvm.org/D91271 for the full reproduction case.
2021-01-14 17:28:18 +02:00
Lucas Prates 2b1e25befe [AArch64] Adding ACLE intrinsics for the LS64 extension
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64-byte
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.

Based on patches written by Simon Tatham.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D93232
2021-01-14 09:43:58 +00:00
Jordan Rupprecht 752fafda3d [NFC] Fix -Wsometimes-uninitialized
After 49142991a6, clang detects that MUL may be uninitialized. Set it to nullptr to suppress this check.

Adding an assert to check that it is ultimately set fails two test cases. Since this is not a new issue, leave the assertion commented out until a code owner can fix the bug. The two failing test cases are noted in the assertion comment.
2021-01-13 20:32:38 -08:00
Kazu Hirata 4c1617dac8 [llvm] Use std::any_of (NFC) 2021-01-13 19:14:44 -08:00
Muhammad Asif Manzoor 4e8e888905 [AArch64][GlobalISel] Add support for FCONSTANT of FP128 type
Add support for G_FCONSTANT of FP128 (Quadruple precision) type.
It replaces the constant by emitting a load with a constant pool entry.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D94437
2021-01-13 10:46:10 -05:00
Nicholas Guy dda60035e9 [AArch64] Attempt to sink mul operands
Following on from D91255, this patch is responsible for sinking relevant mul
operands to the same block so that umull/smull instructions can be correctly
generated by the mul combine implemented in the aforementioned patch.

Differential revision: https://reviews.llvm.org/D91271
2021-01-13 15:23:36 +00:00
Cullen Rhodes ad85e39670 [SVE] Add ISel pattern for addvl
Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D94504
2021-01-13 10:57:49 +00:00
Joe Ellis 3122c66aee [AArch64][SVE] Remove chains of unnecessary SVE reinterpret intrinsics
This commit extends SVEIntrinsicOpts::optimizeConvertFromSVBool to
identify and remove longer chains of redundant SVE reinterpret
intrinsics. For example, the following chain of redundant SVE
reinterprets is now recognised as redundant:

    %a = <vscale x 2 x i1>
    %1 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 2 x i1> %a)
    %2 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %1)
    %3 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %2)
    %4 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %3)
    %5 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %4)
    %6 = <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %5)
    ret <vscale x 2 x i1> %6

and will be replaced with:

    ret <vscale x 2 x i1> %a

Eliminating these can sometimes mean emitting fewer unnecessary
loads/stores when lowering to assembly.

Differential Revision: https://reviews.llvm.org/D94074
2021-01-13 09:44:09 +00:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Martin Storsjö d1fa7afc7a [AArch64] [Windows] Properly add :lo12: reloc specifiers when generating assembly
This makes sure that assembly output actually can be assembled.

Set the correct MCExpr relocation specifier VK_PAGEOFF - and also
set VK_PAGE consistently even though it's not visible in the assembly
output.

Differential Revision: https://reviews.llvm.org/D94365
2021-01-12 23:56:03 +02:00
Bjorn Pettersson 32c073acb3 [GlobalISel] Map extractelt to G_EXTRACT_VECTOR_ELT
Before this patch there was generic mapping from vector_extract
to G_EXTRACT_VECTOR_ELT added in SelectionDAGCompat.td. That
mapping is now replaced by a mapping from extractelt instead.

The reasoning is that vector_extract is marked as deprecated,
so it is assumed that a majority of targets will use extractelt
and not vector_extract (and that the long term solution for all
targets would be to use extractelt).

Targets like AArch64 that still use vector_extract can add an
additional mapping from the deprecated vector_extract as target
specific tablegen definitions. Such a mapping is added for AArch64
in this patch to avoid breaking tests.

When adding the extractelt => G_EXTRACT_VECTOR_ELT mapping we
triggered some new code paths in GlobalISelEmitter, ending up in
an assert when trying to import a pattern containing EXTRACT_SUBREG
for ARM. Therefore this patch also adds a "failedImport" warning
for that situation (instead of hitting the assert).

Differential Revision: https://reviews.llvm.org/D93416
2021-01-11 21:53:56 +01:00
Kerry McLaughlin c37f68a888 [SVE][CodeGen] Fix legalisation of floating-point masked gathers
Changes in this patch:
- When lowering floating-point masked gathers, cast the result of the
  gather back to the original type with reinterpret_cast before returning.
- Added patterns for reinterpret_casts from integer to floating point, and
  concat_vector patterns for bfloat16.
- Tests for various legalisation scenarios with floating point types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D94171
2021-01-11 10:57:46 +00:00
Kazu Hirata e3d3dbd339 [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-01-10 09:24:57 -08:00
Mark Murray 7d4a8bc417 [AArch64] Add +flagm architecture option, allowing the v8.4a flag modification extension.
Differential Revision: https://reviews.llvm.org/D94081
2021-01-08 13:21:12 +00:00
Mark Murray af7cce2fa4 [AArch64] Add +pauth architecture option, allowing the v8.3a pointer authentication extension.
Differential Revision: https://reviews.llvm.org/D94083
2021-01-08 13:21:11 +00:00
Nicholas Guy ed23229a64 [AArch64] Fix crash caused by invalid vector element type
Fixes a crash caused by D91255, when LLVMTy is null when
calling changeExtendedVectorElementType.

Differential Revision: https://reviews.llvm.org/D94234
2021-01-08 12:02:54 +00:00
David Sherwood d1bf26fd94 [AArch64][SVE] Add lowering for llvm abs intrinsic
Add functionality to permit lowering of the abs and neg intrinsics
using the passthru variants.

Differential Revision: https://reviews.llvm.org/D94160
2021-01-08 08:55:25 +00:00
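A minimal hand-written example of the abs intrinsic on a scalable type that the lowering above targets (the nxv4i32 type is illustrative):

    declare <vscale x 4 x i32> @llvm.abs.nxv4i32(<vscale x 4 x i32>, i1)

    define <vscale x 4 x i32> @abs_sve(<vscale x 4 x i32> %v) {
      ; The i1 argument states whether an INT_MIN input is poison; false here.
      %r = call <vscale x 4 x i32> @llvm.abs.nxv4i32(<vscale x 4 x i32> %v, i1 false)
      ret <vscale x 4 x i32> %r
    }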
Kazu Hirata b934160aaa [Target] Use llvm::find_if (NFC) 2021-01-07 20:29:36 -08:00
Cameron McInally f4013359b3 [SVE] Add unpacked scalable floating point ZIP/UZP/TRN patterns
Differential Revision: https://reviews.llvm.org/D94193
2021-01-07 09:56:53 -06:00
Simon Pilgrim 037b058e41 [AArch64] SVEIntrinsicOpts - use range loop and cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.
Don't directly dereference a dyn_cast<> - use cast<> so we assert for the correct type.

Also, simplify the for loop to a range loop.

Fixes clang static analyzer warning.
2021-01-07 14:21:55 +00:00
Caroline Concatto 01c190e907 [AArch64][CostModel] Fix gather scatter cost model
This patch fixes a bug introduced in the patch:
https://reviews.llvm.org/D93030

This patch moves the check for scalable vectors so that it is the first thing
to be checked. This prevents the AArch64 gather/scatter cost model from
computing the number of vector elements for something that is not a vector and
therefore crashing.
2021-01-07 14:02:08 +00:00
Nicholas Guy 350247a93c [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))
Performing this rearrangement allows for existing patterns
to match cases where the vector may be built after an extend,
instead of before.

Differential Revision: https://reviews.llvm.org/D91255
2021-01-06 16:02:16 +00:00
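A hand-written sketch of the shape the rearrangement above applies to (not taken from the patch's tests): the scalar is extended and then splatted, which hides the extend from the existing mul(ext, ext) patterns; splatting first and extending the whole vector lets them match.

    define <8 x i16> @mul_dup_sext(i8 %s, <8 x i8> %v) {
      %exts = sext i8 %s to i16
      %ins  = insertelement <8 x i16> undef, i16 %exts, i32 0
      %dup  = shufflevector <8 x i16> %ins, <8 x i16> undef, <8 x i32> zeroinitializer
      %extv = sext <8 x i8> %v to <8 x i16>
      %mul  = mul <8 x i16> %dup, %extv
      ret <8 x i16> %mul
    }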
Tomas Matheson 643e3c9076 [AArch64] Add BRB IALL and BRB INJ instructions
BRB IALL: Invalidate the Branch Record Buffer
BRB INJ: Branch Record Injection into the Branch Record Buffer

Parser changes based on work by Simon Tatham.

These are two-word mnemonics. The assembly parser works by special-casing
the mnemonic in order to parse the second word as a plain identifier token.

Reviewed by: MarkMurrayARM

Differential Revision: https://reviews.llvm.org/D93899
2021-01-06 12:10:22 +00:00
David Green a9b6440edd [AArch64] Handle any extend whilst lowering addw/addl/subw/subl
This adds an extra tablegen PatFrag, zanyext, which matches either any
extend or zext and uses that in the aarch64 backend to handle any
extends in addw/addl/subw/subl patterns.

Differential Revision: https://reviews.llvm.org/D93833
2021-01-06 10:35:23 +00:00
David Green 78d8a821e2 [AArch64] Handle any extend whilst lowering mull
Demanded bits may turn a sext or zext into an anyext if the top bits are
not needed. This currently prevents the lowering to instructions like
mull, addl and addw. This patch fixes the mull generation by keeping it
simple and treating them like zextends.

Differential Revision: https://reviews.llvm.org/D93832
2021-01-06 10:08:43 +00:00
Sander de Smalen a7e3339f3b [AArch64][SVE] Emit DWARF location expression for SVE stack objects.
Extend PEI to emit a DWARF expression for StackOffsets that have
a fixed and scalable component. This means the expression that needs
to be added is either:
  <base> + offset
or:
  <base> + offset + scalable_offset * scalereg

where for SVE, the scale reg is the Vector Granule Dwarf register, which
encodes the number of 64bit 'granules' in an SVE vector and which
the debugger can evaluate at runtime.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90020
2021-01-06 09:40:53 +00:00
Sander de Smalen a9f5e4375b [AArch64] Use faddp to implement fadd reductions.
Custom-expand legal VECREDUCE_FADD SDNodes
to benefit from pair-wise faddp instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D59259
2021-01-06 09:36:51 +00:00
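A minimal hand-written example of a reduction the custom expansion above applies to (the 'fast' flag, or at least reassoc, is what allows the unordered VECREDUCE_FADD form):

    declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)

    define float @reduce_fadd(<4 x float> %v) {
      ; -0.0 is the neutral start value for an fadd reduction.
      %r = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)
      ret float %r
    }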
Kazu Hirata cd088ba7e6 [llvm] Use llvm::lower_bound and llvm::upper_bound (NFC) 2021-01-05 21:15:59 -08:00
Christudasan Devadasan d68458bd56 [GlobalISel] Base implementation for sret demotion.
If the return values can't be lowered to registers,
SelectionDAG performs the sret demotion. This patch
adds the corresponding basic implementation to
the GlobalISel pipeline.

Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92953
2021-01-06 10:30:50 +05:30
Bradley Smith c73ae747cb [AArch64][SVE] Add optimization to remove redundant ptest instructions
Co-Authored-by: Graham Hunter <graham.hunter@arm.com>
Co-Authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D93292
2021-01-05 15:28:36 +00:00
Paul Walker eba6deab22 [SVE] Lower vector CTLZ, CTPOP and CTTZ operations.
CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively.

CTTZ is not a native SVE operation but is instead lowered to:
  CTTZ(V) => CTLZ(BITREVERSE(V))

In the case of fixed-length support using SVE we also lower CTTZ
operating on NEON sized vectors because of its reliance on
BITREVERSE, which is also lowered to SVE instructions at these lengths.

Differential Revision: https://reviews.llvm.org/D93607
2021-01-05 10:42:35 +00:00
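The CTTZ identity above, written out as hand-written IR (the nxv4i32 type is illustrative):

    declare <vscale x 4 x i32> @llvm.bitreverse.nxv4i32(<vscale x 4 x i32>)
    declare <vscale x 4 x i32> @llvm.ctlz.nxv4i32(<vscale x 4 x i32>, i1)

    ; cttz(v) is computed as ctlz(bitreverse(v))
    define <vscale x 4 x i32> @cttz_equiv(<vscale x 4 x i32> %v) {
      %rev = call <vscale x 4 x i32> @llvm.bitreverse.nxv4i32(<vscale x 4 x i32> %v)
      %clz = call <vscale x 4 x i32> @llvm.ctlz.nxv4i32(<vscale x 4 x i32> %rev, i1 false)
      ret <vscale x 4 x i32> %clz
    }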
Kazu Hirata eb198f4c3c [llvm] Use llvm::any_of (NFC) 2021-01-04 11:42:47 -08:00
Caroline Concatto 060cfd9795 [AArch64][SVE] Add cost model for masked gather and scatter for scalable vectors.
A new TTI interface, 'Optional<unsigned> getMaxVScale()', has been added; it
returns the maximum vscale for a given target.
When known, getMaxVScale is used to compute the cost of masked gathers and
scatters for scalable vectors.

Depends on D92094

Differential Revision: https://reviews.llvm.org/D93030
2021-01-04 13:59:58 +00:00
Florian Hahn d38a0258a5
[AArch64] Add patterns for FCMLA*_indexed.
This patch adds patterns for the indexed variants of FCMLA. Mostly based
on a patch by Tim Northover.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D92947
2021-01-04 13:45:51 +00:00
Usman Nadeem 685c8b537a [AARCH64] Improve accumulator forwarding for Cortex-A57 model
The old CPU model only had MLA->MLA forwarding. I added some missing
MUL->MLA read advances and a missing absolute diff accumulator read
advance according to the Cortex A57 Software Optimization Guide.

The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), causes no
significant regressions (none in SPEC).

Differential Revision: https://reviews.llvm.org/D92296
2021-01-04 10:58:43 +00:00
David Sherwood a65092040a [SVE] Fix inline assembly parsing crash
This patch fixes a crash encountered when compiling this code:

  ...
  float16_t a;
  __asm__("fminv %h[a], %[b], %[c].h"
          : [a] "=r" (a)
          : [b] "Upl" (b), [c] "w" (c))

The issue here is when using the 'h' modifier for a register
constraint 'r'.

Differential Revision: https://reviews.llvm.org/D93537
2021-01-04 09:11:05 +00:00
Mark Murray 5abfeccf10 [ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM
This patch upstreams support for the Armv8-a Cortex-A78C
processor for AArch64 and ARM.

In detail:

- Adding cortex-a78c as cpu option for aarch64 and arm targets in clang
- Adding Cortex-A78C CPU name and ProcessorModel in llvm

Details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78c
2020-12-29 10:18:59 +00:00
Nikita Popov fb77d95022 [AArch64] Fix legalization of i128 ctpop without neon
If neon is disabled, LowerCTPOP will return SDValue() to indicate
that normal legalization should be used. However, ReplaceNodeResults
does not check for this and pushes the empty SDValue() onto the
result vector, which will subsequently result in a crash.

Differential Revision: https://reviews.llvm.org/D93825
2020-12-27 17:24:41 +01:00
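A minimal hand-written reproducer shape for the crash fixed above (compile for AArch64 with NEON disabled, e.g. with -mattr=-neon):

    declare i128 @llvm.ctpop.i128(i128)

    define i128 @ctpop128(i128 %x) {
      %r = call i128 @llvm.ctpop.i128(i128 %x)
      ret i128 %r
    }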
Amara Emerson e0721a0992 [AArch64][GlobalISel] Notify observer of mutated instruction for shift custom legalization.
No test for this because it's a CSE verifier failure that's only exposed in a
WIP patch for enabling CSE throughout the AArch64 GISel pipeline.
2020-12-25 00:31:47 -08:00
Kazu Hirata d6ff5cf995 [Target] Use llvm::any_of (NFC) 2020-12-24 19:43:26 -08:00
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is that some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00
Paul Walker 8eec7294fe [SVE] Lower vector BITREVERSE and BSWAP operations.
These operations are lowered to RBIT and REVB instructions
respectively.  In the case of fixed-length support using SVE we
also lower BITREVERSE operating on NEON sized vectors as this
results in fewer instructions.

Differential Revision: https://reviews.llvm.org/D93606
2020-12-22 16:49:50 +00:00
Kazu Hirata 966f1431de [Target] Use llvm::erase_if (NFC) 2020-12-20 17:43:22 -08:00
Lucas Prates 1a9577bde1 [AArch64] Add support for ls64 to the .arch_extension asm directive
This adds support for the 'ls64' AArch64 extension to the `.arch_extension`
asm directive.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92574
2020-12-18 15:55:55 +00:00
Tomas Matheson fc712eb7aa [AArch64] Fix Copy Elimination for negative values
Redundant Copy Elimination was eliminating a MOVi32imm -1 when it
determined that the value of the destination register is already -1.
However, it didn't take into account that the MOVi32imm zeroes the upper
32 bits (which are FFFFFFFF) and therefore cannot be eliminated.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D93100
2020-12-18 13:30:46 +00:00
Paul Walker c0bc169cb1 [NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions.
During isel there's no need to protect illegal types. Patch also
adds a missing unit test for tbl2 intrinsic using bfloat types.

Differential Revision: https://reviews.llvm.org/D93404
2020-12-18 13:20:41 +00:00
Kerry McLaughlin 52e4084d9c [SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter
This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate
addressing modes for scalable masked gathers & scatters.

selectGatherScatterAddrMode checks if the base pointer is null, in which case
we can swap the base pointer and the index, e.g.
     getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices)
  -> getelementptr %offset, <vscale x N x T> %indices

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93132
2020-12-18 11:56:36 +00:00
Lucas Prates 51fe17b047 [AArch64] Add support for the SPE-EEF feature
This is an addition to the existing Statistical Profiling extension, which
introduces an extra system register that is enabled by the new 'spe-eef'
subtarget feature.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92391
2020-12-18 11:11:56 +00:00
Lucas Prates da21f7ec14 [AArch64] Add support for the Branch Record Buffer extension
This introduces asm support for the Branch Record Buffer extension, through
the new 'brbe' subtarget feature. It consists of a new set of system registers
that enable the handling of branch records.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92389
2020-12-18 11:11:06 +00:00
Cullen Rhodes 7c8796f9db [TTI] Add supportsScalableVectors target hook
This is split off from D91718 and adds a new target hook
supportsScalableVectors that can be queried to check if scalable vectors
are supported by the backend. For AArch64 this returns true if SVE is
enabled.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93060
2020-12-18 10:37:01 +00:00
Lucas Prates c4d851b079 [ARM][AArch64] Initial command-line support for v8.7-A
This introduces command-line support for the 'armv8.7-a' architecture name
(and an alias without the '-', as usual), and for the 'ls64' extension name.

Based on patches written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91776
2020-12-17 13:47:28 +00:00
Lucas Prates 313889191e [AArch64] Adding the v8.7-A LD64B/ST64B Accelerator extension
This adds support for the v8.7-A LD64B/ST64B Accelerator extension
through a subtarget feature called "ls64". It adds four 64-byte
load/store instructions with an operand in the new GPR64x8 register
class, and one system register that's part of the same extension.

Based on patches written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91775
2020-12-17 13:46:23 +00:00
Lucas Prates 97c006aabb [AArch64] Add a GPR64x8 register class
This adds a GPR64x8 register class that will be needed as the data
operand to the LD64B/ST64B family of instructions in the v8.7-A
Accelerator Extension, which load or store a contiguous range of eight
x-regs. It has to be its own register class so that register allocation
will have visibility of the full set of registers actually read/written
by the instructions, which will be needed when we add intrinsics and/or
inline asm access to this piece of architecture.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91774
2020-12-17 13:45:46 +00:00
Lucas Prates 42b92b31b8 [ARM][AArch64] Adding basic support for the v8.7-A architecture
This introduces support for the v8.7-A architecture through a new
subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT"
instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI
instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new
HCRX_EL2 system register.

Based on patches written by Simon Tatham and Victor Campos.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91772
2020-12-17 13:45:08 +00:00
Lucas Prates 83ea17fc5f [NFC][AArch64] Capturing multiple feature requirements in AsmParser messages
This enables the capturing of multiple required features in the AArch64
AsmParser's SysAlias error messages.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92388
2020-12-17 13:44:17 +00:00
Lucas Prates b5bbb4b2b7 [NFC][AArch64] Move AArch64 MSR/MRS into a new decoder namespace
This removes the general forms of the AArch64 MSR and MRS instructions
from the same decoding table that contains many more specific
instructions that supersede them. They're now in a separate decoding
table of their own, called "Fallback", which is only consulted in the
event of the main decoder table failing to produce an answer.

This should avoid decoding conflicts on future specialized instructions
in the MSR space.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91771
2020-12-17 13:40:10 +00:00
Kerry McLaughlin 6d2a78996b [SVE][CodeGen] Add bfloat16 support to scalable masked gather
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93307
2020-12-17 11:08:15 +00:00
QingShan Zhang ebdd20f430 Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
X86 and AArch64 expand these as libcalls inside the target, and PowerPC also
wants to expand them as libcalls for P8. This patch implements the expansion
in the legalizer to share the logic, and removes the X86/AArch64 code to
avoid duplication.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00
Fangrui Song 1bd928e50b [AArch64InstPrinter] Use * 4096 instead of << 12
Left shifting a negative integer is undefined before C++20.
2020-12-16 14:02:25 -08:00
Fangrui Song 66bcbdbc9c [AArch64InstPrinter] Change printADRPLabel to print the target address in hexadecimal form
Similar to D77853. Change ADRP to print the target address in hex, instead of the raw immediate.
The behavior is similar to GNU objdump but we also include `0x`.

Note: GNU objdump is not consistent whether or not to emit `0x` for different architectures. We try emitting 0x consistently for all targets.

```
GNU objdump:       adrp x16, 10000000
Old llvm-objdump:  adrp x16, #0
New llvm-objdump:  adrp x16, 0x10000000
```

`adrp Xd, 0x...` assembles to a relocation referencing `*ABS*+0x10000` which is not intended. We need to use a linker or use yaml2obj.
The main test is `test/tools/llvm-objdump/ELF/AArch64/pcrel-address.yaml`

Differential Revision: https://reviews.llvm.org/D93241
2020-12-16 09:20:55 -08:00
Paul Walker 632f4d2747 [NFC] Fix a few SVEInstrInfo related stylistic issues. 2020-12-15 16:10:38 +00:00
Paul Walker b74c4dbb96 [SVE] Move INT_TO_FP i1 promotion into custom lowering.
AddPromotedToType is being used to legalise INT_TO_FP operations
when the source is a predicate. The point where this introduces
vector extends might cause problems in the future so this patch
falls back to manual promotion within custom lowering.

Differential Revision: https://reviews.llvm.org/D90093
2020-12-15 11:57:07 +00:00
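A hand-written example of the operation whose promotion is moved into custom lowering above: an SVE predicate converted to floating point (types are illustrative):

    define <vscale x 4 x float> @predicate_to_fp(<vscale x 4 x i1> %p) {
      %r = sitofp <vscale x 4 x i1> %p to <vscale x 4 x float>
      ret <vscale x 4 x float> %r
    }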
Kerry McLaughlin c5ced82c8e [SVE][CodeGen] Lower scalable floating-point vector reductions
Changes in this patch:
-  Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally
   to also work for scalable types
- Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32)
- Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D93050
2020-12-14 11:45:42 +00:00
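For reference, a hand-written example of the strictly-ordered (sequential) fadd reduction case mentioned above; with no fast-math flags the evaluation order must be preserved, which is what maps to VECREDUCE_SEQ_FADD:

    declare float @llvm.vector.reduce.fadd.nxv4f32(float, <vscale x 4 x float>)

    define float @seq_fadd(float %init, <vscale x 4 x float> %v) {
      %r = call float @llvm.vector.reduce.fadd.nxv4f32(float %init, <vscale x 4 x float> %v)
      ret float %r
    }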
Chen Zheng 4830d458dd [MachineCombiner][NFC] Add MustReduceRegisterPressure goal
add a new goal MustReduceRegisterPressure for machine combiner pass.

PowerPC will use this new goal to do some register pressure related optimization.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D92068
2020-12-14 00:02:42 -05:00
Florian Hahn 46bc40e502
Recommit "[AArch64] Lower calls with rv_marker attribute."
This recommits a87fccb3ff with a fix to mark the destination operand
of the marker instruction as def, to fix a machine verifier failure.

This reverts the revert commit c0f2cea7c0.
2020-12-13 16:20:39 +00:00
Florian Hahn c0f2cea7c0
Revert "[AArch64] Lower calls with rv_marker attribute ."
This reverts commit a87fccb3ff.

A test appears to fail with expensive checks. Reverting while I
investigate.
2020-12-11 20:12:59 +00:00
Florian Hahn a87fccb3ff
[AArch64] Lower calls with rv_marker attribute .
This patch adds support for lowering function calls with the
rv_marker attribute. The goal is to expand such calls to the
following sequence of instructions:

    BL @fn
    mov x29, x29

This sequence of instructions triggers Objective-C runtime optimizations,
hence we want to ensure no instructions get moved in between them.
This patch achieves that by adding a new CALL_RVMARKER ISD node,
which gets turned into the BLR_RVMARKER pseudo, which eventually gets
expanded into the sequence mentioned above. The sequence is then marked
as instruction bundle, to avoid anything being moved in between.

@ahatanak is working on using this attribute in the front- & middle-end.

Together with the front- & middle-end changes, this should address
PR31925 for AArch64.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D92569
2020-12-11 19:45:44 +00:00
Amara Emerson c29af37c6c [AArch64] Don't try to compress jump tables if there are any inline asm instructions.
Inline asm can contain constructs like .bytes which may have arbitrary size.
In some cases, this causes us to miscalculate the size of blocks and therefore
offsets, causing us to incorrectly compress a JT.

To be safe, just bail out of the whole thing if we find any inline asm.

Fixes PR48255

Differential Revision: https://reviews.llvm.org/D92865
2020-12-10 12:20:02 -08:00
Kerry McLaughlin abe7775f5a [SVE][CodeGen] Extend index of masked gathers
This patch changes performMSCATTERCombine to also promote the indices of
masked gathers where the element type is i8 or i16, and adds various tests
for gathers with illegal types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91433
2020-12-10 13:54:45 +00:00
Florian Hahn 77fd12a66e
[AArch64] Add aarch64_neon_vcmla{_rot{90,180,270}} intrinsics.
Add builtins required to implement vcmla and rotated variants from
the ACLE

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D92929
2020-12-09 19:46:49 +00:00
Kerry McLaughlin 05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
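A hand-written IR sketch of the kind of input where the combines above apply (intrinsic mangling and types are assumptions, not taken from the patch's tests):

    declare <vscale x 4 x i8> @llvm.masked.gather.nxv4i8.nxv4p0i8(<vscale x 4 x i8*>, i32, <vscale x 4 x i1>, <vscale x 4 x i8>)

    define <vscale x 4 x i32> @gather_zext(<vscale x 4 x i8*> %ptrs, <vscale x 4 x i1> %mask) {
      ; An i8 gather whose result is zero-extended can become a single
      ; zero-extending gather load.
      %vals = call <vscale x 4 x i8> @llvm.masked.gather.nxv4i8.nxv4p0i8(<vscale x 4 x i8*> %ptrs, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i8> undef)
      %ext = zext <vscale x 4 x i8> %vals to <vscale x 4 x i32>
      ret <vscale x 4 x i32> %ext
    }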
Tim Northover 45de42116e AArch64: use correct operand for ubsantrap immediate.
I accidentally pushed the wrong patch originally.
2020-12-09 10:17:16 +00:00
Jessica Paquette 40d1fb2229 [AArch64][GlobalISel] Swap select operands when inverting condition code
This was not obvious when reading the imported tablegen patterns in
AArch64GenDAGISel.

Update select-select.mir.
2020-12-08 14:17:26 -08:00
Jessica Paquette 21308c2b4c [AArch64][GlobalISel] Check if G_SELECT has been optimized when folding binops
`TryFoldBinOpIntoSelect` didn't have a check for `Optimized`, meaning you could
end up folding twice. (e.g. a select with a G_ADD on the true side, and a G_SUB
on the false side)

Add in the missing `if` and a test.
2020-12-08 13:47:08 -08:00
Florian Hahn 4c69b1b98a
[AArch64] Fix rottype use in complex instr defs.
It seems like the order here is wrong. Types like i32 do not take any
arguments.

Currently this is not a problem, because the patterns are not actually
used with any nodes, but will fail once it is used with real ISD nodes.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91345
2020-12-08 21:11:33 +00:00
Jessica Paquette 5b5d3fa9d9 [AArch64][GlobalISel] Fold G_SELECT cc, %t, (G_ADD %x, 1) -> CSINC %t, %x, cc
This implements

```
G_SELECT cc, %true, (G_ADD %x, 1) -> CSINC %true, %x, cc
G_SELECT cc, (G_ADD %x, 1), %false -> CSINC %x, %false, inv_cc
```

Godbolt example: https://godbolt.org/z/eoPqKq

Differential Revision: https://reviews.llvm.org/D92868
2020-12-08 10:53:37 -08:00
Jessica Paquette cd9a52b99e [AArch64][GlobalISel] Fold binops on the true side of G_SELECT
This implements the following folds:

```
G_SELECT cc, (G_SUB 0, %x), %false -> CSNEG %x, %false, inv_cc
G_SELECT cc, (G_XOR x, -1), %false -> CSINV %x, %false, inv_cc
```

This is similar to the folds introduced in
5bc0bd05e6.

In 5bc0bd05e6 I mentioned that we may prefer to do
this in AArch64PostLegalizerLowering.

I think that it's probably better to do this in the selector. The way we select
G_SELECT depends on what register banks end up being assigned to it. If we did
this in AArch64PostLegalizerLowering, then we'd end up checking *every* G_SELECT
to see if it's worth swapping operands. Doing it in the selector allows us to
restrict the optimization to only relevant G_SELECTs.

Also fix up some comments in `TryFoldBinOpIntoSelect` which are kind of
confusing IMO.

Example IR: https://godbolt.org/z/3qPGca

Differential Revision: https://reviews.llvm.org/D92860
2020-12-08 10:42:59 -08:00
Jessica Paquette ce199667f6 [AArch64][GlobalISel] Don't explicitly write to the zero register in emitCMN
This case was missed in 78ccb0359d.

Differential Revision: https://reviews.llvm.org/D92438
2020-12-08 10:42:05 -08:00
Huihui Zhang 8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
The LLVM intrinsics llvm.maxnum|minnum are overloaded and can be used on any
floating-point type or vector of floating-point type.
This patch extends the current infrastructure to support scalable vector types.

This patch also fixes a warning about incorrect use of EVT::getVectorNumElements()
for scalable types when the DAGCombiner tries to split a scalable vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
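A minimal hand-written use of the overloaded intrinsic on a scalable type, as now supported by the lowering above (the nxv4f32 type is illustrative):

    declare <vscale x 4 x float> @llvm.maxnum.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)

    define <vscale x 4 x float> @fmaxnum_sve(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
      %r = call <vscale x 4 x float> @llvm.maxnum.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
      ret <vscale x 4 x float> %r
    }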
Jessica Paquette b15491eb33 [AArch64][GlobalISel] Select G_SADDO and G_SSUBO
We didn't have selector support for these.

Selection code is similar to `getAArch64XALUOOp` in AArch64ISelLowering. Similar
to that code, this returns the AArch64CC and the instruction produced. In SDAG,
this is used to optimize select + overflow and condition branch + overflow
pairs. (See `AArch64TargetLowering::LowerBR_CC` and
`AArch64TargetLowering::LowerSelect`)

(G_USUBO should be easy to add here, but it isn't legalized right now.)

This also factors out the existing G_UADDO selection code, and removes an
unnecessary check for s32/s64. AFAIK, we shouldn't ever get anything other than
s32/s64. It makes more sense for this to be handled by the type assertion in
`emitAddSub`.

Differential Revision: https://reviews.llvm.org/D92610
2020-12-08 09:18:28 -08:00
David Sherwood 59f17b57d9 [SVE] Fix crashes with inline assembly
All the crashes found compiling inline assembly are fixed in this
patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint
to be more resilient to mismatched value and register types. For
example, it makes no sense to request a predicate register for
a nxv2i64 type and so on.

Tests have been added here:

  test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll

Differential Revision: https://reviews.llvm.org/D92554
2020-12-08 13:48:43 +00:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
2020-12-08 10:28:26 +00:00
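At the IR level the tagged trap is expressed with the ubsantrap intrinsic, whose immediate identifies the failed check; a minimal hand-written example (the value 5 is arbitrary):

    declare void @llvm.ubsantrap(i8 immarg)

    define void @trap_with_kind() {
      call void @llvm.ubsantrap(i8 5)
      unreachable
    }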
Jessica Paquette d49f6491b6 [AArch64][GlobalISel] Refactor G_BRCOND selection
`selectCompareBranch` was hard to understand.

Also, it was being needlessly pessimistic with the `ProduceNonFlagSettingCondBr`
case. It assumed that everything in `selectCompareBranch` would emit a TB(N)Z
or C(B)NZ. That's not true; the G_FCMP + G_BRCOND case would never emit those
instructions, and the G_ICMP + G_BRCOND case was capable of emitting an integer
compare + Bcc.

- Refactor `selectCompareBranch` into separate functions based off of what is
feeding the G_BRCOND's condition.

- Move G_BRCOND selection code from `select` to `selectCompareBranch`.

- Remove duplicated constraint code from the code originally in `select`;
  `emitTestBit` already handles that, so no need to constrain twice.

- Factor out the G_FCMP + G_BRCOND case into `selectCompareBranchFedByFCmp`.

- Split the G_ICMP + G_BRCOND case into an optimization function,
`tryOptCompareBranchFedByICmp` and a general selection function,
`selectCompareBranchFedByICmp`.

- Reduce the number of things passed to `tryOptAndIntoCompareBranch`.

- Improve documentation.

- Give some variables more descriptive names.

Other than improving the code generation for functions with
speculative_load_hardening by getting the logic correct, this is NFC.

Differential Revision: https://reviews.llvm.org/D92582
2020-12-07 17:24:23 -08:00
Jessica Paquette 195a7af0ab [AArch64][GlobalISel] Narrow 128-bit regs to 64-bit regs in emitTestBit
When we have a 128-bit register, emitTestBit would incorrectly narrow to 32
bits always. If the bit number was > 32, then we would need a TB(N)ZX. This
would cause a crash, as we'd have the wrong register class. (PR48379)

This generalizes `narrowExtReg` into `moveScalarRegClass`.

This also allows us to remove `widenGPRBankRegIfNeeded` entirely, since
`selectCopy` correctly handles SUBREG_TO_REG etc.

This does create some codegen changes (since `selectCopy` uses the `all`
regclass variants). However, I think that these will likely be optimized away,
and we can always improve the `selectCopy` code. It looks like we should
revisit `selectCopy` at this point, and possibly refactor it into at least one
`emit` function.

Differential Revision: https://reviews.llvm.org/D92707
2020-12-07 15:04:33 -08:00
Amara Emerson 2ac4d0f45a [AArch64] Fix some minor coding style issues in AArch64CompressJumpTables 2020-12-07 12:48:09 -08:00
Kerry McLaughlin 111f559bbd [SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER
The refineIndexType & refineUniformBase functions added by D90942 can also be used to
improve CodeGen of masked gathers.

These changes were split out from D91092

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92319
2020-12-07 13:20:19 +00:00
Kerry McLaughlin f6dd32fd35 [SVE][CodeGen] Lower scalable masked gathers
Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)

Changes in this patch:
- Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate
  gather load opcode to use.
- Improve codegen with refineIndexType/refineUniformBase, added in D90942
- Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91092
2020-12-07 12:20:41 +00:00
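A hand-written example of the 'scalar plus vector' addressing form handled by the lowering above: a scalar base pointer plus a vector of offsets (intrinsic mangling and types are assumptions):

    declare <vscale x 2 x i64> @llvm.masked.gather.nxv2i64.nxv2p0i64(<vscale x 2 x i64*>, i32, <vscale x 2 x i1>, <vscale x 2 x i64>)

    define <vscale x 2 x i64> @gather_scaled(i64* %base, <vscale x 2 x i64> %offsets, <vscale x 2 x i1> %mask) {
      ; Vector of pointers built from a scalar base and a vector of indices.
      %ptrs = getelementptr i64, i64* %base, <vscale x 2 x i64> %offsets
      %vals = call <vscale x 2 x i64> @llvm.masked.gather.nxv2i64.nxv2p0i64(<vscale x 2 x i64*> %ptrs, i32 8, <vscale x 2 x i1> %mask, <vscale x 2 x i64> undef)
      ret <vscale x 2 x i64> %vals
    }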
Craig Topper c55d9af8c0 [AArch64] Add custom lowering for ISD::ABS
Instead of trying to pattern match the code produced by ISD::ABS expansion, just custom legalize ISD::ABS to the desired sequence.

The one test change is because a DAG combine for (neg (abs)) is no longer firing because ISD::ABS is now Custom instead of Expand.

Differential Revision: https://reviews.llvm.org/D92154
2020-12-04 10:45:31 -08:00
Ahmed Bougacha f77c948d56 [Triple][MachO] Define "arm64e", an AArch64 subarch for Pointer Auth.
This also teaches MachO writers/readers about the MachO cpu subtype,
beyond the minimal subtype reader support present at the moment.

This also defines a preprocessor macro to allow users to distinguish
__arm64__ from __arm64e__.

arm64e defaults to an "apple-a12" CPU, which supports v8.3a, allowing
pointer-authentication codegen.
It also currently defaults to ios14 and macos11.

Differential Revision: https://reviews.llvm.org/D87095
2020-12-03 07:53:59 -08:00
Jessica Paquette c82f002cea [AArch64][GlobalISel] Don't write to WZR in non-flag-setting G_BRCOND case
We are avoiding writing to WZR just about everywhere else.

Also update the code to use MachineIRBuilder for the sake of consistency.

We also didn't have a GlobalISel testcase for this path, so add a simple one
now.

Differential Revision: https://reviews.llvm.org/D90626
2020-12-01 16:45:37 -08:00
Jessica Paquette 6c3fa97d8a [AArch64][GlobalISel] Select Bcc when it's better than TB(N)Z
Instead of falling back to selecting TB(N)Z when we fail to select an
optimized compare against 0, select Bcc instead.

Also simplify selectCompareBranch a little while we're here, because the logic
was kind of hard to follow.

At -O0, this is a 0.1% geomean code size improvement for CTMark.

A simple example of where this can kick in is here:
https://godbolt.org/z/4rra6P

In the example above, GlobalISel currently produces a subs, cset, and tbnz.
SelectionDAG, on the other hand, just emits a compare and b.le.

Differential Revision: https://reviews.llvm.org/D92358
2020-12-01 15:45:14 -08:00
Amara Emerson 87ff156414 [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask.
The lowering of vector selects needs to splat the scalar mask into a vector
first.

This was causing a crash when building oggenc in the test suite.

Differential Revision: https://reviews.llvm.org/D91655
2020-11-30 16:37:49 -08:00
Sjoerd Meijer 630d37dc1b [AArch64] Enable Cortex-A55 schedmodel
The model was committed in 4b8ade837e
but not yet enabled to allow for a few fix-ups. This adds a few
of these fixes, and also an LLVM MCA test to check most instructions.
While I do have plans to look into some more tuning, it's time to
enable this as it is better than using the A53 schedule.

Differential Revision: https://reviews.llvm.org/D88017
2020-11-30 19:28:34 +00:00
Sjoerd Meijer 5110ff0817 [AArch64][CostModel] Fix cost for mul <2 x i64>
This was modeled to have a cost of 1, but since we do not have a MUL.2d this is
scalarized into vector inserts/extracts and scalar muls.

Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll,
which we don't want to SLP vectorize.

Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
unfortunately needed changing, but the reason is documented in
LoopVectorize.cpp:6855:

  // The cost of executing VF copies of the scalar instruction. This opcode
  // is unknown. Assume that it is the same as 'mul'.

which I will address next as a follow up of this.

Differential Revision: https://reviews.llvm.org/D92208
2020-11-30 11:36:55 +00:00
Simon Pilgrim 969918e177 [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.

Differential Revision: https://reviews.llvm.org/D92183
2020-11-27 11:18:58 +00:00
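The identity behind the expansion above, as a hand-written IR equivalence: usubsat(x, y) is x - y clamped at zero, so x - usubsat(x, y) gives the unsigned minimum and x + usubsat(y, x) the unsigned maximum.

    declare <8 x i16> @llvm.usub.sat.v8i16(<8 x i16>, <8 x i16>)

    ; umin(x, y) expanded as sub(x, usubsat(x, y))
    define <8 x i16> @umin_expanded(<8 x i16> %x, <8 x i16> %y) {
      %d = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> %x, <8 x i16> %y)
      %r = sub <8 x i16> %x, %d
      ret <8 x i16> %r
    }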
Mark Murray 2b6691894a [ARM][AArch64] Adding Neoverse N2 CPU support
Add support for the Neoverse N2 CPU to the ARM and AArch64 backends.

Differential Revision: https://reviews.llvm.org/D91695
2020-11-25 11:42:54 +00:00
Kerry McLaughlin 603d40da9d [SVE][CodeGen] Add a DAG combine to extend mscatter indices
This patch adds a target-specific DAG combine for mscatter to promote indices
with element types i8 or i16 before legalisation, plus various tests with illegal types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90945
2020-11-25 11:18:22 +00:00
Amara Emerson ca7fdf7ce0 [AArch64][GlobalISel] Add pre-isel lowering to convert p0 G_DUPs to use s64.
This uses the same reasoning as other similar conversions just before selection,
without it we miss out on selection because the importer considers s64 and p0
distinct types.
2020-11-23 22:59:35 -08:00
Amara Emerson 0fb76b9035 [AArch64][GlobalISel] Make <2 x p0> of G_SHUFFLE_VECTOR legal. 2020-11-23 22:59:35 -08:00
Martin Storsjö 6f792041a5 Reapply "[CodeGen] [WinException] Only produce handler data at the end of the function if needed"
This reapplies 36c64af9d7 in updated
form.

Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)

The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.

Differential Revision: https://reviews.llvm.org/D87448
2020-11-23 23:17:03 +02:00
Craig Topper 4252f7773a [SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets.
X86 was already specially marking fma as commutable which allowed
tablegen to autogenerate commuted patterns. This moves it to the target
independent definition and fix up the targets to remove now
unneeded patterns.

Unfortunately, the tests change because the commuted versions of
the patterns generate operands in a different order than the
explicit patterns.

Differential Revision: https://reviews.llvm.org/D91842
2020-11-23 10:09:20 -08:00
Jay Foad 000400ca0a Fix speling in comments. NFC. 2020-11-23 14:43:24 +00:00
Ella Ma 1756d67934 [llvm][clang][mlir] Add checks for the return values from Target::createXXX to prevent potential null deref
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.

The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.

Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and 2) some of the pointers are dereferenced before they are checked, which will trigger a null pointer dereference if the return value is nullptr.

Reviewed By: tejohnson, MaskRay, jpienaar

Differential Revision: https://reviews.llvm.org/D91410
2020-11-21 21:04:12 -08:00
Amara Emerson c58df88886 [AArch64][GlobalISel] Make G_EXTRACT_VECTOR_ELT of <2 x p0> legal.
Also fix a selection issue for this which was using LLT::isScalar() when it
should have been using !isVector(), add test for that too.
2020-11-20 14:07:45 -08:00
Sjoerd Meijer 412237dcd0 [AArch64] Enable post RA scheduler for Cortex-R82
Just something I forgot when I added the R82. Need to have a look
at crypto and fusing, but will do that as a follow up.

Differential Revision: https://reviews.llvm.org/D91848
2020-11-20 14:04:26 +00:00
Pavel Iliin 4d7df43ffd [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements out-of-line atomics as an LSE deployment
mechanism. Details of how it works can be found in llvm/docs/Atomics.rst.
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to the clang driver. This is the clang and llvm part of the
out-of-line atomics interface; the library part is already supported by libgcc.
Compiler-rt support is provided in a separate patch.

Differential Revision: https://reviews.llvm.org/D91157
2020-11-20 13:30:12 +00:00
Adhemerval Zanella 807320119f [AArch64] Lower fptrunc/fpext from/to FP128 to/from FP16
The compiler-rt part which adds the emitted symbols is handled in
a subsequent patch.

Differential Revision: https://reviews.llvm.org/D91731
2020-11-19 15:14:50 -03:00
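The conversions in question as minimal hand-written IR; both now lower to library calls whose symbols are added by the follow-up compiler-rt patch:

    define half @trunc_f128_to_f16(fp128 %x) {
      %r = fptrunc fp128 %x to half
      ret half %r
    }

    define fp128 @ext_f16_to_f128(half %x) {
      %r = fpext half %x to fp128
      ret fp128 %r
    }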
Florian Hahn b2f4c5fddc
[AsmWriter] Factor out mnemonic generation to accessible getMnemonic.
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).

Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D90039
2020-11-17 09:47:38 +00:00
Jessica Paquette 5bc0bd05e6 [AArch64][GlobalISel] Fold G_XOR x, -1 into G_SELECT and select CSINV
When we see

```
xor = G_XOR xor_lhs, -1
select = G_SELECT cc, tval, xor
```

Fold this into

```
select = CSINV tval, xor_lhs, cc
```

Update select-select.mir to reflect the changes.

For now, only handle the case where the G_XOR is the false-value for the
G_SELECT. It may make more sense to handle the true-value case in post-legalizer
lowering.

Differential Revision: https://reviews.llvm.org/D90774
2020-11-16 14:14:14 -08:00
Amara Emerson 0b6090699a [AArch64][GlobalISel] Look through a G_ZEXT when trying to match shift-extended register offsets.
The G_ZEXT in these cases seems to actually come from a combine that we do but
SelectionDAG doesn't. Looking through it allows us to match "uxtw #2" addressing
modes.

Differential Revision: https://reviews.llvm.org/D91475
2020-11-16 10:50:46 -08:00
Caroline Concatto 6c4d8f4651 [AArch64] Add check for widening instruction for SVE.
This patch fixes the function isWideningInstruction for scalable vectors.
Now the cost model can check the widening pattern for SVE.

Differential Revision: https://reviews.llvm.org/D91260
2020-11-16 12:30:08 +00:00
Jessica Paquette 9a8bfe3835 [AArch64][GlobalISel] Select G_SELECT cc, t, (G_SUB 0, x) -> CSNEG t, x, cc
When we see

```
%sub = G_SUB 0, %x
%select = G_SELECT %cc, %t, %sub
```

Fold away the G_SUB by producing

```
%select = CSNEG %t, %x, cc
```

Simple IR example: https://godbolt.org/z/K8TEnh

This is valid on both sides of the select, but for now, just handle one side.
It may make more sense to handle swapping sides during post-legalizer lowering.

Differential Revision: https://reviews.llvm.org/D90723
2020-11-13 10:12:51 -08:00
Jessica Paquette 6c20c1da1e [AArch64][GlobalISel] NFC: Use CmpInst::isUnsigned instead of static helper
Reducing some code duplication.

We had a helper for checking if a predicate is unsigned. Remove that and use
the existing function in Instructions.cpp.

Differential Revision: https://reviews.llvm.org/D91288
2020-11-13 09:35:42 -08:00
Jessica Paquette b184a2eccf [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when matching a specific value.
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns true when a register is negated.

Also update a few places which use idioms related to the new matchers.

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Kerry McLaughlin 306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Jessica Paquette d0ba6c4002 [AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants
Select the following:

- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc 0, -1 -> CSINV zreg, zreg cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc

(IR example: https://godbolt.org/z/YfPna9)

These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.

Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.

```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```

So we have to manually select these for now.

This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.

Differential Revision: https://reviews.llvm.org/D90701
2020-11-12 14:44:01 -08:00
David Sherwood 3225fcf11e [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.
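
A hypothetical signature that hits this case (the argument counts assume the usual z0-z7 SVE argument registers; illustrative only, not from the patch):

```c
#include <arm_sve.h>

// Six single-vector arguments occupy z0-z5, leaving only z6/z7 for the
// four-element tuple, so the tuple cannot be allocated contiguously in
// registers and must instead be passed indirectly as a whole.
void callee(svfloat32_t a, svfloat32_t b, svfloat32_t c,
            svfloat32_t d, svfloat32_t e, svfloat32_t f,
            svfloat32x4_t tuple);
```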

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct numbers of legal loads and stores.

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Amara Emerson ad376657c1 [AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. 2020-11-11 22:46:53 -08:00
Jessica Paquette 7a70a2f04d [AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.

Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.

(All other 16-bit G_FCONSTANTS are not yet selected.)

Differential Revision: https://reviews.llvm.org/D89164
2020-11-11 13:25:11 -08:00
Jessica Paquette c42053f79b [AArch64][GlobalISel] Select arith extended add/sub in manual selection code
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).

As a result, we could never select things like

```
cmp x1, w0, uxtw #2
```

This is because we don't import any patterns for compares.
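
A hypothetical C-level source for that kind of compare (illustrative, not from the commit):

```c
// Comparing a 64-bit value against a zero-extended 32-bit index scaled
// by 4: the extend and shift can now fold into the compare itself as a
// uxtw #2 extended-register operand.
int cmp_extended(unsigned long a, unsigned int i) {
    return a == ((unsigned long)i << 2);
}
```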

This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91207
2020-11-11 09:26:03 -08:00
Jessica Paquette f0580c73bb [AArch64][GlobalISel] Select negative arithmetic immediates in manual selector
Previously, we only handled negative arithmetic immediates in the imported
selector code.

Since we don't import code for, say, compares, we were missing opportunities
for things like

```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```

Instead, we would have to materialize the constant and emit a SUBS.

This adds support for selection like above for SUB, SUBS, ADD, and ADDS.
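
A hypothetical source-level example of the kind of compare this helps (illustrative only):

```c
// Comparing against a negative immediate: instead of materialising -10
// into a register and using SUBS, this can now select an ADDS with the
// positive immediate (i.e. a cmn-style compare).
int cmp_neg_imm(long x) { return x == -10; }
```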

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91108
2020-11-11 09:20:05 -08:00
Caroline Concatto 37f4ccb275 [AArch64]Add memory op cost model for SVE
This patch adds/fixes memory op cost model for SVE with fixed-width
vector.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them.
2020-11-11 12:15:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)
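
For context, a hypothetical loop of the shape that can become such a masked scatter (scalar base plus a vector of indices); whether it actually vectorises depends on the vectoriser and target flags:

```c
// Predicated indexed stores: base is scalar, idx supplies the vector
// of offsets, pred masks out inactive lanes.
void scatter(float *base, const int *idx, const float *val,
             const char *pred, int n) {
    for (int i = 0; i < n; ++i)
        if (pred[i])
            base[idx[i]] = val[i];
}
```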

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Amara Emerson 2262393090 [AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG.
These do things like turn a multiply of a pow-2+1 into a shift and and add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.
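
A hypothetical instance of the pattern (illustrative; the exact output depends on target and flags):

```c
// x * 9 == x + (x << 3), so with the combine this can become a
// shift-and-add (e.g. "add w0, w0, w0, lsl #3") rather than
// materialising 9 and emitting a mul/madd.
int mul_by_nine(int x) { return x * 9; }
```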

I've added check lines to an existing codegen test since the code being ported
is almost identical; however, the mul by negative pow2 constant tests don't generate
the same code because we're still missing some generic G_MUL combines.

Differential Revision: https://reviews.llvm.org/D91125
2020-11-10 22:21:13 -08:00
Pablo Barrio 642b21beba [AArch64] Enable RAS 1.1 system registers in all AArch64
Some use cases (e.g. kernel devs) have strict requirements to only enable
features available with -march=armv8-a, e.g. no armv8.1-a. Enabling RAS 1.1 in
all AArch64 means they can consider supporting it.

Bear in mind that the first versions of the Armv8 architecture still do not
support RAS 1.1. This patch only lets devs write code with the user-friendly
register mnemonic instead of the ugly generic S<op0>_<op1>_<Cn>_<Cm>_<op2>.
They still need to place runtime checks to make sure that the CPU to run on
supports RAS 1.1.

Differential Revision: https://reviews.llvm.org/D90594
2020-11-10 12:13:33 +00:00
Mirko Brkusanin a75d6178b8 [GlobalISel] Add combine for (x | mask) -> x when (x | mask) == x
If we have a mask, and a value x, where (x | mask) == x, we can drop the OR
and just use x.
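
A hypothetical case where the OR is provably redundant (illustrative only):

```c
// After "x |= 1" the low bit of x is known set, so OR-ing in 1 again
// satisfies (x | mask) == x and the second OR can be dropped.
unsigned redundant_or(unsigned x) { x |= 1u; return x | 1u; }
```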

Differential Revision: https://reviews.llvm.org/D90952
2020-11-10 11:32:13 +01:00
Mirko Brkusanin fb36ab0a42 [GlobalISel] Expand combine for (x & mask) -> x when (x & mask) == x
We can use KnownBitsAnalysis to cover cases when mask is not trivial. It can
also help with cases when mask is not constant but can still be folded into
one. Since 'and' is commutative we should treat both operands as possible
replacements.
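
A hypothetical case that KnownBits can prove even though the value itself is not a constant (illustrative only):

```c
// A zero-extended byte has only its low 8 bits possibly set, so masking
// with 0xff satisfies (x & mask) == x and the AND can be dropped.
unsigned redundant_and(unsigned char x) { return (unsigned)x & 0xffu; }
```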

Differential Revision: https://reviews.llvm.org/D90674
2020-11-10 11:32:13 +01:00
Francesco Petrogalli 9f61931e07 [llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests.
For example, if the sign extension is only used for the TBZ, and the value is used elsewhere with a zero extension, the sign extension can be eliminated.
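
A hypothetical shape of code this targets (illustrative; actual codegen depends on the surrounding IR):

```c
// The "x < 0" test only needs the sign bit, which a tb(n)z can read from
// the unextended value, while the other use of x is already a zero
// extension, so the separate sign extension can be dropped.
unsigned sign_test(short x) {
    if (x < 0)
        return 0u;
    return (unsigned short)x;
}
```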

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D90606
2020-11-09 18:27:48 +00:00
Lucas Prates c2c2cc1360 [ARM][AArch64] Adding Neoverse V1 CPU support
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.

This is based on patches from Mark Murray and Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D90765
2020-11-09 13:15:40 +00:00
Elvina Yakubova 93b99728b1 [AArch64] Add pipeline model for HiSilicon's TSV110
This patch adds the scheduling and cost model for TSV110.

Reviewed by: SjoerdMeijer, bryanpkc

Differential Revision: https://reviews.llvm.org/D89972
2020-11-07 01:23:00 +03:00
Amara Emerson f347d78cca [AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates.
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).

The pattern matching code has been simply moved to postlegalize lowering.

Differential Revision: https://reviews.llvm.org/D90820
2020-11-05 11:18:11 -08:00
Sander de Smalen d57bba7cf8 [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both fixed-sized and scalable-sized offsets.

The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Cameron McInally c126eb7529 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Christopher Tetreault 900ec97bbe [UBSan] Cannot negate smallest negative signed integer
Silence Undefined Behavior Sanitizer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D90710
2020-11-04 10:07:52 -08:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Amara Emerson 393b55380a [AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD.
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.

Differential Revision: https://reviews.llvm.org/D90699
2020-11-03 17:25:14 -08:00
Nicholas Guy 54d8627852 [AArch64] Redundant masks in downcast long multiply
Adds patterns to catch masks preceding a long multiply,
generating a single umull/smull instruction instead.

Differential revision: https://reviews.llvm.org/D89956
2020-11-03 10:12:28 +00:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Caroline Concatto 71038788ce Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register."
This reverts commit 8b281bfaf3.
2020-11-02 08:15:50 +00:00
Caroline Concatto 8b281bfaf3 [AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register.
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is that it allows users
to write invalid asm that is not accepted by the GNU assembler.

Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?

Differential Revision: https://reviews.llvm.org/D90153
2020-11-02 07:57:05 +00:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain,
introduce an enum with the bit field positions for the access info.

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size    boot time
before    92824064      6.18s
after     38822400      6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Cameron McInally dda1e74b58 [Legalize] Add legalizations for VECREDUCE_SEQ_FADD
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.

Differential Revision: https://reviews.llvm.org/D90247
2020-10-30 16:02:55 -05:00
Peter Collingbourne c9b1a2b41d AArch64: Use SBFX instead of UBFX to extract address granule in outlined HWASan checks.
In a kernel (or in general in environments where bit 55 of the address
is set) the shadow base needs to point to the end of the shadow region,
not the beginning. Bit 55 needs to be sign extended into bits 52-63
of the shadow base offset, otherwise we end up loading from an invalid
address. We can do this by using SBFX instead of UBFX.

Using SBFX should have no effect in the userspace case where bit 55
of the address is clear so we do so unconditionally. I don't think
we need a ABI version bump for this (but one will come anyway when
we switch to x20 for the shadow base register).

Differential Revision: https://reviews.llvm.org/D90424
2020-10-30 12:53:15 -07:00
Peter Collingbourne 3859fc653f AArch64: Switch to x20 as the shadow base register for outlined HWASan checks.
From a code size perspective it turns out to be better to use a
callee-saved register to pass the shadow base. For non-leaf functions
it avoids the need to reload the shadow base into x9 after each
function call, at the cost of an additional stack slot to save the
caller's x20. But with x9 there is also a stack size cost, either
as a result of copying x9 to a callee-saved register across calls or
by spilling it to stack, so for the non-leaf functions the change to
stack usage is largely neutral.

It is also code size (and stack size) neutral for many leaf functions.
Although they now need to save/restore x20 this can typically be
combined via LDP/STP into the x30 save/restore. In the case where
the function needs callee-saved registers or stack spills we end up
needing, on average, 8 more bytes of stack and 1 more instruction
but given the improvements to other functions this seems like the
right tradeoff.

Unfortunately we cannot change the register for the v1 (non short
granules) check because the runtime assumes that the shadow base
register is stored in x9, so the v1 check still uses x9.

Aside from that there is no change to the ABI because the choice
of shadow base register is a contract between the caller and the
outlined check function, both of which are compiler generated. We do
need to rename the v2 check functions though because the functions
are deduplicated based on their names, not on their contents, and we
need to make sure that when object files from old and new compilers
are linked together we don't end up with a function that uses x9
calling an outlined check that uses x20 or vice versa.

With this change code size of /system/lib64/*.so in an Android build
with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3%
decrease.

Differential Revision: https://reviews.llvm.org/D90422
2020-10-30 12:51:30 -07:00
Simon Pilgrim 0ff1ab42f2 Use cast<> instead of dyn_cast<> as we dereference the pointer immediately. NFCI.
Fix clang static analyzer warning - we know that the arg should be ConstantInt and we're better off relying on cast<> asserting on failure rather than a null dereference crash.
2020-10-30 14:33:20 +00:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
David Sherwood cea69fa4dc [SVE] Add fatal error for unnamed SVE variadic arguments
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.

Differential Revision: https://reviews.llvm.org/D90230
2020-10-30 13:35:47 +00:00
Florian Hahn 772aaa6023 [AArch64] Improve lowering of insert_vector_elt with 0.0 consts.
When moving +0.0 into a float vector, we can use the vi*gpr variants of
INS.
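
A hypothetical intrinsic-level example of the case being improved (illustrative only):

```c
#include <arm_neon.h>

// Inserting +0.0 into one lane: with this change the zero can be
// inserted directly from a general-purpose zero source via INS rather
// than first being materialised in an FP register.
float32x4_t zero_lane0(float32x4_t v) {
    return vsetq_lane_f32(0.0f, v, 0);
}
```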

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D90176
2020-10-28 21:35:33 +00:00
Florian Hahn ba78cae20f [AArch64] Use DUP for BUILD_VECTOR with few different elements.
If most elements of BUILD_VECTOR are the same, with a few different
elements, it is better to use DUP for the common elements and
INSERT_VECTOR_ELT for the different elements.

Currently this transform is guarded quite restrictively to only trigger
in clearly beneficial cases.

With D90176, the lowering for patterns originating from code like
` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even
better (unnecessary fmov is removed).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D90233
2020-10-28 19:48:20 +00:00
David Green 066737fdbc [AArch64] Remove AArch64ISD::NOT, use vnot instead
vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT
node, but allow more folding thanks to all the target independent
optimizations. Specifically this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x" as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y"

Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.

Differential Revision: https://reviews.llvm.org/D90126
2020-10-28 08:15:37 +00:00
Fangrui Song d5adadb3a5 [AArch64][GlobalISel] Fix -Wunused-variable. NFC 2020-10-24 12:47:11 -07:00
Cameron McInally a1cc274cb3 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Amara Emerson 0f0fd383b4 [AArch64][GlobalISel] Introduce a new post-isel optimization pass.
There are two optimizations here:

1. Consider the following code:
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
 %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.

However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.

Our solution here is to try to convert flag setting operations between
an interval of identical FCMPs, so that CSE will be able to eliminate one.

2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However if those impdef operands are not marked as
dead, the peephole optimizations are not able to optimize them into non-flag
setting variants. The optimization here is to find these dead imp-defs and
mark them as such.

This pass is only enabled when optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D89415
2020-10-23 10:18:36 -07:00
Huihui Zhang 1e113c078a [AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
The immediate must be in the integer range [0,255] for the umin/umax instructions.
Extend the pattern matching helper SelectSVEArithImm() to take in the value type
bitwidth when checking whether the immediate value is in range.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89831
2020-10-23 09:42:56 -07:00
Florian Hahn 0fcc6f7a76 [AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics.
This patch adds a specialized implementation of getIntrinsicInstrCost
and add initial cost-modeling for min/max vector intrinsics.

AArch64 NEON support umin/smin/umax/smax for vectors
<8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>.
Notably, it does not support vectors with i64 elements.

This change by itself should have very little impact on codegen, but in
follow-up patches I plan to teach the vectorizers to consider using
those intrinsics on platforms where it is profitable, e.g. because there
is no general 'select'-like instruction.

The current cost returned should be better for throughput, latency and size.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D89953
2020-10-23 11:32:42 +01:00
Jessica Paquette 19dc9c9780 [AArch64][GlobalISel] Move imm adjustment for G_ICMP to post-legalizer lowering
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.

This

- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.

- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)

- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.

- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.

Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.

Differential Revision: https://reviews.llvm.org/D89823
2020-10-22 15:27:36 -07:00
Jessica Paquette 147b9497e7 [AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)

It still makes sense to select these instructions at -O0.

Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.

This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.

Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.

Also add an 'r' which somehow got dropped from a bunch of function names.

https://reviews.llvm.org/D89820
2020-10-22 14:43:25 -07:00
Evgenii Stepanov 188a7d6710 Add alloca size threshold for StackTagging initializer merging.
Summary:
Initializer merging generates pretty inefficient code for large allocas
that also happens to trigger an exponential algorithm somewhere in
Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867.

This change adds an upper limit for the alloca size. The default limit
is selected such that the worst-case size of memtag-generated code is
similar to non-memtag (but because of ISA quirks this case is realized
at a different alloca size, e.g. memset inlining triggers at sizes below
512, while stack tagging instructions are 2x shorter, so the limit is
approx. 256).

We could try harder to emit more compact code with initializer merging,
but that would only affect large, sparsely initialized allocas, and
those are doing fine already.

Reviewers: vitalybuka, pcc

Subscribers: llvm-commits
2020-10-19 13:44:07 -07:00
Paul C. Anagnostopoulos 2871c6c93f [Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans.
Differential Revision: https://reviews.llvm.org/D89551
2020-10-19 10:33:55 -04:00
David Sherwood d67d8f8790 [SVE][AArch64] Replace TypeSize comparisons with their integer equivalents
In many places in the AArch64 backend we are comparing TypeSize objects,
but in fact we are only ever expecting fixed width types. I've changed
all such comparisons to use their integer equivalents by replacing
calls to getSizeInBits() with getFixedSizeInBits(), etc.

Differential Revision: https://reviews.llvm.org/D89116
2020-10-19 07:41:33 +01:00
Amara Emerson 4ad459997e [AArch64][GlobalISel] Select csinc if a select has a 1 on RHS.
Differential Revision: https://reviews.llvm.org/D89513
2020-10-16 16:49:52 -07:00
Amara Emerson 39c05a1a71 [AArch64][GlobalISel] Add selection support for v2s32 and v2s64 reductions for FADD/ADD.
We'll need legalizer lower() support for the other types to work.

Differential Revision: https://reviews.llvm.org/D89159
2020-10-16 11:41:57 -07:00
Amara Emerson 32f77eea2d [AArch64][GlobalISel] Regbankselect reductions to use FPR bank for scalars.
Differential Revision: https://reviews.llvm.org/D89075
2020-10-16 10:42:15 -07:00
Amara Emerson 9190411fcf [AArch64][GlobalISel] Add basic legalizer rules for supported add/fadd reductions.
NEON is pretty limited in its reduction support. As a first step, add some
basic rules for the legal types we can select.

Differential Revision: https://reviews.llvm.org/D89070
2020-10-16 10:35:46 -07:00
Jessica Paquette 609d765cd3 [AArch64][GlobalISel] NFC: Refactor emitIntegerCompare
Simplify emitIntegerCompare and improve comments + asserts.

Mostly making the code a little easier to follow.

Also, this code is only used for G_ICMP. The legalizer ensures that the LHS/RHS
for every G_ICMP is either a s32 or s64. So, there's no need to handle anything
else. This lets us remove a bunch of checks for whether or not we successfully
emitted the compare.

Differential Revision: https://reviews.llvm.org/D89433
2020-10-15 16:04:08 -07:00
Evgenii Stepanov 2e794a46b5 [AArch64] Stack frame reordering.
Implement stack frame reordering in the AArch64 backend.

Unlike the X86 implementation, AArch64 does not seem to benefit from
"access density" based frame reordering, mainly because it has a much
smaller variety of addressing modes, and the fact that all instructions
are 4 bytes so each frame object is either in range of an instruction
(and then the access is "free") or not (and that has a code size cost
of 4 bytes).

This change improves Memory Tagging codegen by
* Placing an object that has been chosen as the base tagged pointer of
the function at SP + 0. This saves one instruction to setup the pointer
(IRG does not have an offset immediate), and more because that object
can now be referenced without materializing its tagged address in a
scratch register.
* Placing objects that go out of scope simultaneously together. This
exposes opportunities for instruction merging in tryMergeAdjacentSTG.

Differential Revision: https://reviews.llvm.org/D72366
2020-10-15 12:50:16 -07:00
Evgenii Stepanov 2f63e57fa5 [MTE] Pin the tagged base pointer to one of the stack slots.
Summary:
Pin the tagged base pointer to one of the stack slots, and (if
necessary) rewrite tag offsets so that an object that occupies that
slot has both address and tag offsets of 0. This allows ADDG
instructions for that object to be eliminated and their uses replaced
with the tagged base pointer itself.

This optimization must be done in machine instructions and not in the IR
instrumentation pass, because referring to a stack slot through an IRG
pointer would confuse the stack coloring pass.

The optimization makes a (pretty naive) attempt to find the slot that
would benefit the most by counting the uses of stack slots in the
function.

Reviewers: ostannard, pcc

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72365
2020-10-15 12:50:16 -07:00
Vinay Madhusudan 159b2d8e62 [AArch64] Combine UADDVs to generate vector add
ADD(UADDV a, UADDV b) --> UADDV(ADD a, b)
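
A hypothetical NEON-intrinsics form of the pattern (illustrative only; the wrap-around behaviour is identical on both sides):

```c
#include <arm_neon.h>

// Two horizontal adds followed by a scalar add can instead add the
// vectors once and do a single horizontal reduction.
uint32_t sum_of_sums(uint32x4_t a, uint32x4_t b) {
    return vaddvq_u32(a) + vaddvq_u32(b);
}
```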

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Differential Revision: https://reviews.llvm.org/D88731
2020-10-15 09:56:31 +05:30
Amara Emerson 78ccb0359d [AArch64][GlobalISel] Don't use explicit zero registers for compare results.
These cause problems for later optimizations. Just using an unused vreg, as
SelectionDAG does, generates better code in the end and obviates the need for
some GISel-specific flag optimizations.

Differential Revision: https://reviews.llvm.org/D89419
2020-10-14 16:49:33 -07:00
Cameron McInally 421f1b7294 [SVE] Lower fixed length VECREDUCE_FADD operation
Differential Revision: https://reviews.llvm.org/D89263
2020-10-14 09:41:11 -05:00
David Sherwood af57a0838e [SVE] Add fatal error when running out of registers for SVE tuple call arguments
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack. This is wrong and we'd like
to avoid any silent ABI compatibility issues in future. For now,
I've added a fatal error when this happens until we can get a
proper fix.

Differential Revision: https://reviews.llvm.org/D89326
2020-10-14 09:31:41 +01:00
Vinay Madhusudan 37dce7475b [AArch64] Identify SAD pattern
(ABS (SUB (EXTEND a), (EXTEND b))) to ZERO_EXTEND((UABD a, b))
(ABS (SUB (SIGN_EXTEND a), (SIGN_EXTEND b))) to ZERO_EXTEND((SABD a, b))
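
A hypothetical loop exhibiting the pattern (illustrative; recognising it still depends on the vectoriser producing the abs-of-difference form):

```c
#include <stdlib.h>

// Sum of absolute differences over byte inputs: the widening subtract
// plus abs per element is the shape that can map onto UABD/SABD-based
// sequences.
unsigned sad(const unsigned char *a, const unsigned char *b, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i)
        sum += (unsigned)abs(a[i] - b[i]);
    return sum;
}
```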

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Differential Revision: https://reviews.llvm.org/D88742
2020-10-13 15:50:54 +05:30
Cullen Rhodes c87bd2d8eb [AArch64] Implement .variant_pcs directive
A dynamic linker with lazy binding support may need to handle variant
PCS function symbols specially, so an ELF symbol table marking
STO_AARCH64_VARIANT_PCS [1] was added to address this.

Function symbols that follow the vector PCS are marked via the
.variant_pcs assembler directive, which takes a single parameter
specifying the symbol name and sets the STO_AARCH64_VARIANT_PCS st_other
flag in the object file.

[1] https://github.com/ARM-software/abi-aa/blob/master/aaelf64/aaelf64.rst#st-other-values

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89138
2020-10-13 10:06:27 +00:00
Paul Walker 981b31c282 [SVE] Add ISel patterns for "insert undef_nxv#f##, f##, 0"
Differential Revision: https://reviews.llvm.org/D89235
2020-10-13 10:49:18 +01:00
Cameron McInally 974ddb54c9 [SVE] Lower fixed length VECREDUCE_XOR operation
Differential Revision: https://reviews.llvm.org/D88974
2020-10-12 10:12:15 -05:00
Tim Northover 38348fa265 AArch64: treat MC expressions as 2s complement arithmetic.
We had a couple of over-zealous diagnostics that meant IR with a reasonable and
valid interpretation was rejected.
2020-10-08 11:54:51 +01:00
Anna Thomas 35cb45c533 [ImplicitNullChecks] Support complex addressing mode
The pass is updated to handle loads through complex addressing modes,
specifically when we have a scaled register and a scale.
It requires two API updates in TII which have been implemented for X86.

See added IR and MIR testcases.

Tests-Run: make check
Reviewed-By: reames, danstrushin
Differential Revision: https://reviews.llvm.org/D87148
2020-10-07 20:55:38 -04:00