Commit Graph

4518 Commits

Author SHA1 Message Date
Jon Roelofs f5e1ec8c58 [AArch64] fjcvtzs,rmif,cfinv,setf* all clobber nzcv
Differential Revision: https://reviews.llvm.org/D83818
2020-07-27 09:17:53 -06:00
Tim Northover 0f1494be43 AArch64: avoid UB shift of negative value
Left shifting a negative value is undefined behaviour, so this just moves the
negation afterwards to avoid it.
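
A minimal sketch of the hazard (names below are illustrative, not the actual
code): negating before the shift triggers UB whenever the shifted value is
negative, so the negation is applied after the shift instead.

```
#include <cstdint>

// Illustrative only: left-shifting a negative value is UB in C++.
int64_t encodeBefore(int64_t Imm, unsigned Shift) {
  return (-Imm) << Shift;   // UB whenever -Imm is negative
}

// Moving the negation after the shift avoids the UB (assuming Imm itself is
// non-negative and the shift does not overflow).
int64_t encodeAfter(int64_t Imm, unsigned Shift) {
  return -(Imm << Shift);
}
```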
2020-07-27 13:49:50 +01:00
Tim Northover 216b67e202 AArch64: diagnose out of range relocation addends on MachO.
MachO only has 24-bit addends for most relocations, small enough that it can
overflow in semi-reasonable functions and cause insidious bugs if compiled
without assertions enabled. Switch it to an actual error instead.

The condition isn't quite identical because ld64 treats the addend as a signed
number.
2020-07-27 13:01:22 +01:00
David Sherwood 14bc85e0eb [SVE] Don't use LocalStackAllocation for SVE objects
I have introduced a new TargetFrameLowering query function:

  isStackIdSafeForLocalArea

that queries whether or not it is safe for objects of a given stack
id to be bundled into the local area. The default behaviour is to
always bundle regardless of the stack id; however, for AArch64 this is
overridden so that it is only safe for fixed-size stack objects.
There is future work here to extend this algorithm for multiple local
areas so that SVE stack objects can be bundled together and accessed
from their own virtual base-pointer.
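
A hedged sketch of what the hook might look like (the class names, exact
signature, and the SVE stack-ID check are assumptions based on the description
above, not code copied from the patch):

```
// Hypothetical approximation of the isStackIdSafeForLocalArea query.
enum StackID { DefaultStack = 0, SVEVector = 2 };

struct FrameLoweringBase {
  virtual ~FrameLoweringBase() = default;
  // Default behaviour: any stack ID may be bundled into the local area.
  virtual bool isStackIdSafeForLocalArea(unsigned /*StackId*/) const {
    return true;
  }
};

struct AArch64LikeFrameLowering : FrameLoweringBase {
  // Override: only fixed-size objects (non-SVE stack IDs) are safe to bundle.
  bool isStackIdSafeForLocalArea(unsigned StackId) const override {
    return StackId != SVEVector;
  }
};
```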

Differential Revision: https://reviews.llvm.org/D83859
2020-07-27 08:22:01 +01:00
Amara Emerson 9b19400004 [AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal types for G_SHUFFLE_VECTOR and G_IMPLICIT_DEF.
Trivial change; we're still missing support for rev matching for these types
in the combiner.
2020-07-26 00:48:09 -07:00
Jessica Paquette 604e33e83a [AArch64][GlobalISel] Look through constants when selecting stores of 0
Very minor code size improvements (hits 8 times in Bullet at -O3), but still
something.

Also very minor NFC change to make sure we only search for a 0 constant when
selecting a store. Before, we'd do this for loads as well.

Differential Revision: https://reviews.llvm.org/D84573
2020-07-24 22:46:14 -07:00
Jessica Paquette fcc55c0952 [AArch64][GlobalISel] Use wzr/xzr for 16 and 32 bit stores of zero
We weren't performing this optimization on 16 and 32 bit stores. SDAG happily
does this though.

e.g. https://godbolt.org/z/cWocKr

This saves about 0.2% in code size on CTMark at -O3.
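
For illustration, this is the kind of source that now takes the zero-register
path (function names are made up; the assembly comments assume the usual
AArch64 lowering):

```
#include <cstdint>

// Storing a literal zero can use the zero register directly instead of
// first materialising 0 into a general-purpose register.
void clear16(uint16_t *p) { *p = 0; }  // strh wzr, [x0]
void clear32(uint32_t *p) { *p = 0; }  // str  wzr, [x0]
void clear64(uint64_t *p) { *p = 0; }  // str  xzr, [x0]  (already handled)
```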

Differential Revision: https://reviews.llvm.org/D84568
2020-07-24 17:15:20 -07:00
Amara Emerson f320f83f3a [AArch64][GlobalISel] Promote G_UITOFP vector operands to same elt size as result.
Fixes legalization failures.
2020-07-24 17:00:50 -07:00
Eli Friedman c02aa53ecb [AArch64][SVE] Add "fast" fcmp operations.
dacf8d3 added support for most fcmp operations, but there are some extra
variations I hadn't considered: SelectionDAG supports float comparisons
that are neither ordered nor unordered. Add support for the missing
operations.

Differential Revision: https://reviews.llvm.org/D84460
2020-07-24 13:22:41 -07:00
Francesco Petrogalli 809600d664 [llvm][sve] Reg + Imm addressing mode for ld1ro.
Reviewers: kmclaughlin, efriedma, sdesmalen

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83357
2020-07-24 17:48:47 +00:00
Eli Friedman 993c1a3219 [AArch64][SVE] Teach copyPhysReg to copy ZPR2/3/4.
It's sort of tricky to hit this in practice, but not impossible. I have
a synthetic C testcase if anyone is interested.

The implementation is identical to the equivalent NEON register copies.

Differential Revision: https://reviews.llvm.org/D84373
2020-07-23 16:41:37 -07:00
Amara Emerson 3b10e42ba1 [AArch64][GlobalISel] Add post-legalize combine for sext(trunc(sextload)) -> trunc/copy
On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this.

Differential Revision: https://reviews.llvm.org/D81993
2020-07-23 12:06:35 -07:00
Matt Arsenault d26526fd09 AArch64: Use Register 2020-07-22 14:14:44 -04:00
Matt Arsenault 0c92bfa4b8 GlobalISel: Don't use virtual for distinguishing arg handlers
There's no reason to involve the hassle of a virtual method targets
have to override for a simple boolean.

Not sure exactly what's going on with Mips, but it seems to define its
own totally separate handler classes.
2020-07-22 14:14:43 -04:00
Matt Arsenault b98f902f18 GlobalISel: Restructure argument lowering loop in handleAssignments
This was structured in a way that implied every split argument is either in
memory or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to the original
value needs to be handled later.

This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.

I'm also not sure what the correct behavior is for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
2020-07-22 13:31:11 -04:00
Sander de Smalen bef56f7fe2 [AArch64][SVE] Correctly allocate scavenging slot in presence of SVE.
This patch addresses two issues:

* Forces the availability of the base-pointer (x19) when the frame has
  both scalable vectors and variable-length arrays. Otherwise it will
  be expensive to access non-SVE locals.

* In the presence of SVE stack objects, it will allocate the emergency
  scavenging slot close to the SP, so that it can be accessed from
  the SP or BP if available. If accessed from the frame pointer, it would
  otherwise need an extra register to reach the scavenging slot because
  of the mixed scalable/non-scalable addressing modes.

Reviewers: efriedma, ostannard, cameron.mcinally, rengolin, david-arm

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70174
2020-07-22 10:50:36 +01:00
Amara Emerson 791544422a Revert "[AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy"
This reverts commit 64eb3a4915.

It caused miscompiles with optimizations enabled. Reverting while I investigate.
2020-07-21 16:01:18 -07:00
Amara Emerson f1ae96d9bf [AArch64][GlobalISel] Fix TLS accesses clobbering registers incorrectly.
This was happening because the BLR didn't have a use of the X0 arg register,
which would end up being re-used in high reg pressure situations.
The change also avoids hard coding the use of X0 for the sequence except to
copy the value for the call. ld64 should still be able to optimize it.

rdar://65438258
2020-07-21 16:01:17 -07:00
Sander de Smalen 9bacf15885 [AArch64][SVE] Fix PCS for functions taking/returning scalable types.
The default calling convention needs to save/restore the SVE callee
saves according to the SVE PCS when the function takes or returns
scalable types, even when the `aarch64_sve_vector_pcs` CC is not
specified for the function.

Reviewers: efriedma, paulwalker-arm, david-arm, rengolin

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84041
2020-07-21 15:55:39 +01:00
Petre-Ionut Tudor 1af9fc8213 [ARM] Generate [SU]HADD from ((a + b) >> 1)
Summary:
Teach LLVM to recognize the above pattern, where the operands are
either signed or unsigned types.
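
A sketch of the kind of C++ source that produces the pattern (names are
illustrative): widening before the add keeps the intermediate sum exact,
which is what makes the halving-add instructions applicable when the loop
is vectorised.

```
#include <cstddef>
#include <cstdint>

// (a + b) >> 1 computed in a wider type; each vector lane can lower to UHADD.
void uhadd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = (uint16_t(a[i]) + uint16_t(b[i])) >> 1;
}

// Signed variant, lowering to SHADD.
void shadd(int8_t *dst, const int8_t *a, const int8_t *b, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = (int16_t(a[i]) + int16_t(b[i])) >> 1;
}
```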

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83777
2020-07-21 13:22:07 +01:00
Eli Friedman b8f765a1e1 [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
Paul Walker 6384ec4099 [SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations.
Differential Revision: https://reviews.llvm.org/D84034
2020-07-20 11:57:34 +00:00
Tim Northover 88464a55b4 AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms
It's useful for a debugger to be able to distinguish an @llvm.debugtrap
from a (noreturn) @llvm.trap, so this extends the existing Windows
behaviour to other platforms.
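
A minimal illustration (assuming clang's __builtin_debugtrap, which normally
lowers to @llvm.debugtrap):

```
// With this change the debug trap is emitted as `brk #0xf000` on every
// platform, so a debugger can tell it apart from the `brk` used for
// the (noreturn) @llvm.trap.
void stop_for_debugger() {
  __builtin_debugtrap();   // AArch64: brk #0xf000
}
```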
2020-07-20 10:31:26 +01:00
Paul Walker 509351d768 [SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations.
Lower the operations to predicated variants.  This is prep work
required for fixed length code generation but also fixes a bug
whereby these operations fail selection when "unpacked" vector
types (e.g. MVT::nxv2f32) are used.

This patch also adds the missing "unpacked" patterns for FMA.

Differential Revision: https://reviews.llvm.org/D83765
2020-07-16 11:31:35 +00:00
Roman Lebedev fb432a51f4
Reland "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions"
This reverts commit 1067d3e176,
which reverted commit b2018198c3,
because it introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils.

So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding
the cycle.
2020-07-16 13:40:01 +03:00
Kerry McLaughlin 2762da0a16 [SVE][CodeGen] Legalisation of masked loads and stores
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.

Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma, david-arm

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83137
2020-07-16 10:55:45 +01:00
Adrian Kuegel 1067d3e176 Revert "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions"
This reverts commit b2018198c3.
This commit introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils,
so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still
depends on it, we have a cycle.
2020-07-16 10:54:10 +02:00
Roman Lebedev b2018198c3
[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions
Taking so many parameters is simply unmaintainable.

We don't want to include the entire llvm/Transforms/Utils/Local.h into
llvm/Transforms/Scalar.h, so I've split SimplifyCFGOptions into
its own header.
2020-07-16 01:27:54 +03:00
Krzysztof Pszeniczny c3e6555616 Call Frame Information (CFI) Handling for Basic Block Sections
This patch handles CFI with basic block sections, which unlike DebugInfo does
not support ranges. The DWARF standard explicitly requires emitting separate
CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus,
the CFI information for all callee-saved registers (possibly including the
frame pointer, if necessary) has to be emitted along with redefining the
Call Frame Address (CFA), viz. where the current frame starts.

CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc
specification. So, a single FDE must point to a contiguous code region unlike
debug info which has the support for ranges. This is what complicates CFI for
basic block sections.

Now, what happens when we start placing individual basic blocks in unique
sections:

* Basic block sections allow the linker to randomly reorder basic blocks in the
address space such that a given basic block can become non-contiguous with the
original function.
* The different basic block sections can no longer share the cfi_startproc and
cfi_endproc directives. So, each basic block section should emit this
independently.
* Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that
caters to that basic block section.
* Now, this basic block section needs to duplicate the information from the
entry block to compute the CFA as it is an independent entity. It cannot refer
to the FDE of the original function and hence must duplicate all the stuff that
is needed to compute the CFA on its own.
* We are working on a de-duplication patch that can share common information in
FDEs in a CIE (Common Information Entry) and we will present this as a follow up
patch. This can significantly reduce the duplication overhead and is
particularly useful when several basic block sections are created.
* The CFI directives are emitted similarly for registers that are pushed onto
the stack, like callee saved registers in the prologue. There are cfi
directives that emit how to retrieve the value of the register at that point
when the push happened. This has to be duplicated too in a basic block that is
floated as a separate section.

Differential Revision: https://reviews.llvm.org/D79978
2020-07-14 12:54:12 -07:00
Victor Campos dad1868772 [AArch64][AsmParser] Add rcpc support in .arch_extension
LLVM's AArch64 assembler does not support enabling rcpc via .arch_extension.
GCC, on the other hand, does.

This patch adds 'rcpc' as a valid value to .arch_extension handling.

Differential Revision: https://reviews.llvm.org/D83685
2020-07-14 10:57:11 +01:00
Sander de Smalen a8f4f85d84 [AArch64][SVE] Remove erroneous assert in resolveFrameOffsetReference
The code already supports addressing a fixed-size stack object from
the frame-pointer, by first subtracting sizeof(SVE area) from FP.

Reviewers: efriedma, cameron.mcinally, david-arm, rengolin

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D83125
2020-07-14 09:22:45 +01:00
Amara Emerson 64eb3a4915 [AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy
On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this.

Differential Revision: https://reviews.llvm.org/D81993
2020-07-13 20:27:45 -07:00
Paul Walker 319a97b5e2 [SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal.
Differential Revision: https://reviews.llvm.org/D83568
2020-07-13 11:16:30 +00:00
Matt Arsenault 9ff310d5bf AArch64: Fix unused variables 2020-07-10 15:12:25 -04:00
Sidharth Baveja e541e1b757 [NFC] Separate Peeling Properties into its own struct (re-land after minor fix)
Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-10 18:39:30 +00:00
Luke Geeson 954db63cd1 [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM
This patch upstreams support for the Arm-v8 Cortex-A78 and Cortex-X1
processors for AArch64 and ARM.

In detail:
- Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang
- Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm

details of the CPU can be found here:
https://www.arm.com/products/cortex-x

https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78

The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev

Reviewers: t.p.northover, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D83206
2020-07-10 18:24:11 +01:00
Paul Walker f78e6a3095 [SVE] Code generation for fixed length vector truncates.
Lower fixed length vector truncates to a sequence of SVE UZP1 instructions.

Differential Revision: https://reviews.llvm.org/D83395
2020-07-10 10:37:19 +00:00
Eli Friedman 56ae2cebcd [AArch64][SVE] Add lowering for llvm.fma.
This is currently bare-bones; we aren't taking advantage of any of the
FMA variant instructions.  But it's enough to at least generate
code.

Differential Revision: https://reviews.llvm.org/D83444
2020-07-09 16:12:41 -07:00
Kyungwoo Lee 7af27b65b3 [NFC][AArch64] Refactor getArgumentPopSize
Differential Revision: https://reviews.llvm.org/D83456
2020-07-09 11:58:15 -07:00
Paul Walker 6b403319f8 [SVE] Scalarize fixed length masked loads and stores.
When adding support for scalable vector masked loads and stores we
accidentally opened up likewise for fixed length vectors. This patch
restricts support to scalable vectors only, thus ensuring fixed
length vectors are treated the same regardless of SVE support.

Differential Revision: https://reviews.llvm.org/D83341
2020-07-09 10:47:04 +00:00
Paul Walker 614fb09645 [SVE] Disable some BUILD_VECTOR related code generator features.
Fixed length vector code generation for SVE does not yet custom
lower BUILD_VECTOR and instead relies on expansion.  At the same
time custom lowering for VECTOR_SHUFFLE is also not available so
this patch updates isShuffleMaskLegal to reject vector types that
require SVE.

Related to this, it also prevents the merging of stores after
legalisation because this only works when BUILD_VECTOR is either
legal or can be eliminated.  When this is not the case the code
generator enters an infinite legalisation loop.

Differential Revision: https://reviews.llvm.org/D83408
2020-07-09 10:47:04 +00:00
Nikita Popov 0b39d2d752 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit 0369dc98f9.

Many failing tests.
2020-07-08 21:43:32 +02:00
Sidharth Baveja 0369dc98f9 [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:59:59 +00:00
Anh Tuyen Tran 6965af43e6 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit fead250b43.
2020-07-08 18:58:05 +00:00
Anh Tuyen Tran fead250b43 [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:56:03 +00:00
Paul Walker fb75451775 [SVE] Custom ISel for fixed length extract/insert_subvector.
We use extract_subvector and insert_subvector to "cast" between
fixed length and scalable vectors.  This patch adds custom C++
based ISel for the following cases:

  fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0
  scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0

Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized
vectors or COPY_TO_REGCLASS otherwise.

Differential Revision: https://reviews.llvm.org/D82871
2020-07-08 09:49:28 +00:00
David Sherwood 79d34a5a1b [SVE][CodeGen] Fix bug when falling back to DAG ISel
In an earlier commit 584d0d5c17 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.

I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.

Differential Revision: https://reviews.llvm.org/D82524
2020-07-07 09:23:04 +01:00
Nicolai Hähnle 76c5cb05a3 DomTree: Remove getChildren() accessor
Summary:
Avoid exposing details about how children are stored. This will enable
subsequent type-erasure changes.

New methods are introduced to cover common access patterns.

Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f

Reviewers: arsenm, RKSimon, mehdi_amini, courbet

Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83083
2020-07-06 21:58:11 +02:00
Paul Walker 7356b4243a [SVE] Fix invalid assert in expand_DestructiveOp.
AArch64ExpandPseudo::expand_DestructiveOp contains an assert to
ensure the destructive operand's register is unique.  However,
this is only required when pseudo expansion emits a movprfx.

A simple example when a movprfx is not required is
  Z0 = FADD_ZPZZ_UNDEF_S P0, Z0, Z0
which expands to an unprefixed FADD_ZPmZ_S instruction.

This patch moves the assert to the places where a movprfx is emitted.

Differential Revision: https://reviews.llvm.org/D83029
2020-07-04 09:21:40 +00:00
Petre-Ionut Tudor af80a4353e [ARM] Generate [SU]RHADD from (b - (~a)) >> 1
Summary:
Teach LLVM to recognize the above pattern, which is usually a
transformation of (a + b + 1) >> 1, where the operands are either
signed or unsigned types.
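
A hedged sketch of the source-level pattern (names are illustrative): in a
wider type, b - (~a) equals a + b + 1, so the rounding halving add usually
reaches the backend in the subtracted form after generic simplification.

```
#include <cstddef>
#include <cstdint>

// Rounding halving add: (a + b + 1) >> 1, which generic folds tend to turn
// into (b - (~a)) >> 1; when vectorised, each lane can lower to URHADD.
void urhadd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = (uint16_t(a[i]) + uint16_t(b[i]) + 1) >> 1;
}
```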

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82669
2020-07-03 16:00:06 +01:00
Luke Geeson 8bf99f1e6f [ARM] Add Cortex-A77 Support for Clang and LLVM
This patch upstreams support for the Arm-v8 Cortex-A77
processor for AArch64 and ARM.

In detail:
- Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang
- Cortex-A77 CPU name and ProcessorModel in llvm

details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77

and a similar submission to GCC can be found here:
e0664b7a63

The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev

Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer

Reviewed By: dmgreen

Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82887
2020-07-03 13:00:54 +01:00
Guillaume Chatelet 87e2751cf0 [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D83082
2020-07-03 08:06:43 +00:00
Sander de Smalen 8b7b0ad24c [AArch64][SVE] NFC: Rename isOrig -> isReverseInstr
This is a non-functional change to clarify some of the terminology in the
AArch64SVEInstrInfo/SVEInstrFormats.td files around the tables
for mapping an instruction to its reverse instruction counterpart,
and vice versa, e.g. DIV -> DIVR and DIVR -> DIV.

Reviewers: paulwalker-arm, cameron.mcinally, rengolin, efriedma

Reviewed By: paulwalker-arm, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82979
2020-07-02 17:01:15 +01:00
Sander de Smalen 075c440f7b [AArch64][SVE] Put zeroing pseudos and patterns under flag.
This patch puts the _ZERO pseudos and corresponding patterns
under the predicate 'UseExperimentalZeroingPseudos', so that they
can be enabled/disabled through compile flags.

This is done because the zeroing pseudos use MOVPRFX to do merging of
the inactive lanes, but it depends on the uarch whether this operation
is actually merged with the destructive operation. If not, it may be
more profitable to use a SELECT and to give the compiler the freedom to
schedule these instructions as normal, rather than keeping them bundled
together. Additionally, this feature is not yet fully implemented and
there are still known bugs (see D80410) that need to be resolved before
the 'experimental' can be dropped from the name.

Reviewers: paulwalker-arm, cameron.mcinally, efriedma

Reviewed By: paulwalker-arm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82780
2020-07-02 14:24:33 +01:00
Kerry McLaughlin fd6193d5ea [AArch64][SVE] Add reg+imm addressing mode for unpredicated stores
Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82985
2020-07-02 12:00:01 +01:00
Sander de Smalen 143e324e75 [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).

When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:

      extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

This patch fixes both issues.

Reviewers: david-arm, efriedma, spatel

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82910
2020-07-02 10:16:43 +01:00
Sander de Smalen 07bda98b6a [AArch64][SVE] Add unpred load/store patterns for bf16 types
Reviewers: kmclaughlin, c-rhodes, efriedma

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82909
2020-07-02 10:01:24 +01:00
Florian Hahn c30da98d47 [AArch64] Remove unnecessary CostKindCheck (NFC).
Simplification suggested post-commit.
2020-07-01 18:20:41 +01:00
Yuanfang Chen 78c69a00a4 [NFC] Clean up uses of MachineModuleInfoWrapperPass 2020-07-01 09:45:05 -07:00
Guillaume Chatelet ef36f5143d [Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment
As per the documentation of `hasPairedLoad`:
"`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load."
In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned.
There is only one implementor of this interface.

Differential Revision: https://reviews.llvm.org/D82958
2020-07-01 14:32:30 +00:00
Kerry McLaughlin 4c6683eafc [AArch64][SVE] Add reg+imm addressing mode for unpredicated loads
Reviewers: efriedma, sdesmalen, david-arm

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82893
2020-07-01 10:33:56 +01:00
Paul Walker a1aed80a35 [SVE] Relax merge requirement for IR based divides.
We currently lower SDIV to SDIV_MERGE_OP1. This forces the value
for inactive lanes in a way that can hamper register allocation;
however, the lowering has no requirement for inactive lanes.

Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus
freeing the register allocator. Once done the only user of
SDIV_MERGE_OP1 is intrinsic lowering so I've removed the node
and perform ISel on the intrinsic directly. This also allows
us to implement MOVPRFX based zeroing in the same manner as SUB.

This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for
the same reason but in the ADD cases the ISel code is already
as required.

Differential Revision: https://reviews.llvm.org/D82783
2020-07-01 08:18:42 +00:00
Guillaume Chatelet 28de229bc6 [Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82894
2020-07-01 07:28:11 +00:00
Florian Hahn 1ccc49924a [AArch64] Add getCFInstrCost, treat branches as free for throughput.
D79164/2596da31740f changed getCFInstrCost to return 1 by default.
AArch64 did not have its own implementation, hence the throughput cost
of CFI instructions was overestimated. On most cores, most branches should
be predicted and essentially free throughput-wise.

This restores a 9% performance regression on a SPEC2006 benchmark on
AArch64 with -O3 LTO & PGO.

This patch effectively restores pre 2596da3174 behavior for AArch64
and undoes the AArch64 test changes of the patch.

Reviewers: samparker, dmgreen, anemet

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D82755
2020-06-30 20:34:04 +01:00
Christopher Tetreault ab35ba5742 [SVE] Remove calls to VectorType::getNumElements from AArch64
Reviewers: efriedma, paquette, david-arm, kmclaughlin

Reviewed By: david-arm

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82214
2020-06-30 11:17:50 -07:00
Guillaume Chatelet 6a6af30d43 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemset to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82851
2020-06-30 12:46:26 +00:00
Guillaume Chatelet 4f5133a4dc [Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82749
2020-06-30 07:56:17 +00:00
Francesco Petrogalli 67e4330fac [sve][acle] Implement some of the C intrinsics for brain float.
Summary:
The following intrinsics have been extended to support brain float types:

svbfloat16_t svclasta[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclasta[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlasta[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svclastb[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclastb[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlastb[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svdup[_n]_bf16(bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_m(svbfloat16_t inactive, svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_x(svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_z(svbool_t pg, bfloat16_t op)

svbfloat16_t svdupq[_n]_bf16(bfloat16_t x0, bfloat16_t x1, bfloat16_t x2, bfloat16_t x3, bfloat16_t x4, bfloat16_t x5, bfloat16_t x6, bfloat16_t x7)
svbfloat16_t svdupq_lane[_bf16](svbfloat16_t data, uint64_t index)

svbfloat16_t svinsr[_n_bf16](svbfloat16_t op1, bfloat16_t op2)

Reviewers: sdesmalen, kmclaughlin, c-rhodes, ctetreau, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82345
2020-06-29 16:09:08 +00:00
Sander de Smalen 39f6a36a24 [AArch64][SVE] NFCI: Choose consistent naming for predicated SDAG nodes
This patch proposes a naming convention for operations that take
a general predicate (and are thus predicated) that specifies
what happens to the false lanes.

Currently the _PRED suffix is used, which doesn't really say much other
than that it takes a predicate. In some instances this means it has
merging predication and in other cases it means zeroing-predication.

This patch also changes the order of operands to
AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first
operand, which is in line with all other predicates nodes. It takes the
passthru value as an explicit passthru value, which is always passed as
the last operand.

Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma

Reviewed By: paulwalker-arm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81850
2020-06-29 13:37:30 +01:00
Cullen Rhodes d5fc592b7c [AArch64][SVE] Add bfloat16 support to svext intrinsic
Reviewers: sdesmalen, kmclaughlin, efriedma, david-arm, fpetrogalli

Reviewed By: sdesmalen, fpetrogalli

Differential Revision: https://reviews.llvm.org/D82391
2020-06-29 11:08:38 +00:00
Kerry McLaughlin bb6603f013 [AArch64][SVE] Bail out of performPostLD1Combine for scalable types
Summary:
performPostLD1Combine will introduce either a LD1LANEpost
or LD1DUPpost node, which will cause selection failure if the
return type is a scalable vector.

Reviewers: sdesmalen, c-rhodes, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82670
2020-06-29 11:59:53 +01:00
Simon Pilgrim 973685fc78 [TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
2020-06-29 11:46:58 +01:00
Francesco Petrogalli ddbdff3acc [sve][acle] Recommit https://reviews.llvm.org/D82501
The original patch was reverted in
ff5ccf258e
as it was missing the C tests that were accidentally left out.

This patch is an NFC recommit of https://reviews.llvm.org/D82501, together with
the SVE ACLE tests for the C intrinsics of svreinterpret for brain
float types.
2020-06-26 20:45:29 +00:00
Francesco Petrogalli ff5ccf258e Revert "[sve][acle] Add reinterpret intrinsics for brain float."
This reverts commit a15722c5ce.

The commit has to be reverted because I accidentally submitted
https://reviews.llvm.org/D82501 without the C tests that were added in
an earlier version of the patch.
2020-06-26 20:19:49 +00:00
Paul Walker 3a98d5d7e7 [SVE] Code generation for fixed length vector adds.
Summary:
Teach LowerToPredicatedOp to lower fixed length vector operations.

Add AArch64ISD nodes and isel patterns for predicated integer
and floating point adds.

Together this enables SVE code generation for fixed length vector adds.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82483
2020-06-26 19:54:41 +00:00
Francesco Petrogalli a15722c5ce [sve][acle] Add reinterpret intrinsics for brain float.
Reviewers: kmclaughlin, efriedma, ctetreau, sdesmalen, david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82501
2020-06-26 15:20:58 +00:00
Kerry McLaughlin 6b313f198c [AArch64][SVE] Remove asserts from AArch64ISelLowering for bfloat16 types
Remove the asserts in performLDNT1Combine & performST[NT]1Combine
to ensure we get a failure where the type is a bfloat16 and
hasBF16() is false, regardless of whether asserts are enabled.
2020-06-26 14:51:27 +01:00
Cullen Rhodes 4319c48fc7 [AArch64][SVE] Only support sizeless bfloat types if supported by subtarget
Reviewers: sdesmalen, efriedma, kmclaughlin, fpetrogalli

Reviewed By: sdesmalen, fpetrogalli

Differential Revision: https://reviews.llvm.org/D82494
2020-06-26 12:37:47 +00:00
Guillaume Chatelet fdc7c7fb87 [Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82573
2020-06-26 11:00:53 +00:00
Kerry McLaughlin edcfef8fee [AArch64][SVE] Add bfloat16 support to store intrinsics
Summary:
Bfloat16 support added for the following intrinsics:
 - ST1
 - STNT1

Reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, david-arm

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82448
2020-06-26 11:05:56 +01:00
Kerry McLaughlin 0ccfe1b267 [AArch64][SVE] Predicate bfloat16 load patterns with HasBF16
Reviewers: sdesmalen, c-rhodes, efriedma, fpetrogalli

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82464
2020-06-26 10:38:24 +01:00
Cullen Rhodes c65d4eb5d3 [AArch64][SVE] Guard perm and select bfloat16 intrinsic patterns
Summary:
Permutation and selection bfloat16 intrinsic patterns should be guarded
on the feature flag `+bf16`. Missed in D82182 and D80850.

Reviewers: sdesmalen, fpetrogalli, kmclaughlin, efriedma

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82492
2020-06-26 09:35:36 +00:00
Amara Emerson 97a34b5f8d [AArch64][GlobalISel] Fix extended shift addressing mode selection not handling sxth.
The complex pattern for extended shift offsets only allows sxtw as the extend,
not sxth. Our equivalent function to do this was not rejecting SXTH, so we
were miscompiling. This was exposed by D81992.
2020-06-25 17:24:32 -07:00
Jessica Paquette 7fb84dff69 [AArch64][GlobalISel] Port buildvector -> dup pattern from AArch64ISelLowering
Given this:

```
%x:_(<n x sK>) = G_BUILD_VECTOR %lane, ...
...
%y:_(<n x sK>) = G_SHUFFLE_VECTOR %x(<n x sK>), %foo, shufflemask(0, 0, ...)
```

We can produce:

```
%y:_(<n x sK) = G_DUP %lane(sK)
```

Doesn't seem to be too common, but AArch64ISelLowering attempts to do this
before trying to produce a DUPLANE. Might as well port it.

Also make it so that when the splat has an undef mask, we try setting it to
0. SDAG does this, and it makes sure that when we get the build vector operand,
we actually get a source operand.

Differential Revision: https://reviews.llvm.org/D81979
2020-06-25 14:19:06 -07:00
Francesco Petrogalli 7200fa38a9 [sve][acle] Add some C intrinsics for brain float types.
Summary:
The following intrinsics have been added:

svuint16_t svcnt[_bf16]_m(svuint16_t inactive, svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_x(svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_z(svbool_t pg, svbfloat16_t op)

svbfloat16_t svtbl[_bf16](svbfloat16_t data, svuint16_t indices)

svbfloat16_t svtbl2[_bf16](svbfloat16x2_t data, svuint16_t indices)

svbfloat16_t svtbx[_bf16](svbfloat16_t fallback, svbfloat16_t data, svuint16_t indices)

Reviewers: c-rhodes, kmclaughlin, efriedma, sdesmalen, ctetreau

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82429
2020-06-25 16:31:01 +00:00
Victor Campos da852b03b0 [AArch64] Emit warning when disassembling unpredictable LDRAA and LDRAB
Summary:
LDRAA and LDRAB in their writeback variant should softfail when the same
register is used as result and base.

This patch adds a custom decoder that catches such case and emits a
warning when it occurs.

Differential Revision: https://reviews.llvm.org/D82541
2020-06-25 15:56:36 +01:00
Guillaume Chatelet 324cda2073 [Alignment][NFC] Conform X86, ARM and AArch64 TargetTransformInfo backends to the public API
The main interface has already been migrated to Align, but a few backends were broadening the type from Align to MaybeAlign.
This patch makes sure all implementations conform to the public API.

Differential Revision: https://reviews.llvm.org/D82465
2020-06-25 13:23:13 +00:00
Guillaume Chatelet 2e7bba693e [Alignment][NFC] Use Align for TargetCallingConv::OrigAlign
This patch replaces D69249.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82307
2020-06-25 13:21:22 +00:00
Simon Pilgrim a53dddb3e9 Local.h - reduce includes to forward declarations. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-24 19:27:37 +01:00
Cullen Rhodes 26502ad609 [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
Summary:
Added for following intrinsics:

  * zip1, zip2, zip1q, zip2q
  * trn1, trn2, trn1q, trn2q
  * uzp1, uzp2, uzp1q, uzp2q
  * splice
  * rev
  * sel

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D82182
2020-06-24 10:04:51 +00:00
Kerry McLaughlin 3d6cab271c [AArch64][SVE] Add bfloat16 support to load intrinsics
Summary:
Bfloat16 support added for the following intrinsics:
 - LD1
 - LD1RQ
 - LDNT1
 - LDNF1
 - LDFF1

Reviewers: sdesmalen, c-rhodes, efriedma, stuij, fpetrogalli, david-arm

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82298
2020-06-24 10:32:19 +01:00
Amara Emerson fceadbcb33 [AArch64][GlobalISel] Improve codegen for some constant vectors by using constant pool loads.
There's more smarts in AArch64ISelLowering that we don't have yet, but this
change incrementally improves some of the more common patterns. I think future
iterations will want to use some combination of PostLegalizerCombiner and the
selector to catch the other cases.

Differential Revision: https://reviews.llvm.org/D82340
2020-06-23 19:23:47 -07:00
Eli Friedman a2caa3b614 Remove GlobalValue::getAlignment().
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value.  If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.

This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.

Differential Revision: https://reviews.llvm.org/D80368
2020-06-23 19:13:42 -07:00
Eli Friedman e9d4e34ab8 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.
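
A scalar, per-lane sketch of the expansion described above (names are
illustrative):

```
#include <cstdint>

// srem expressed via sdiv: rem = a - (a / b) * b.
// The trailing mul + sub is the part that could later be fused into an MLS.
int32_t srem_via_sdiv(int32_t a, int32_t b) {
  int32_t q = a / b;
  return a - q * b;
}
```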

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
Mikhail Maltsev 3f353a2e5a [BFloat] Add convert/copy instrinsic support
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

Specifically it adds intrinsic support in clang and llvm for Arm and AArch64.

The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:
  - Alexandros Lamprineas
  - Luke Cheeseman
  - Mikhail Maltsev
  - Momchil Velikov
  - Luke Geeson

Differential Revision: https://reviews.llvm.org/D80928
2020-06-23 14:27:05 +00:00
Sander de Smalen 121e585ec8 [AArch64][SVE] ACLE: Add bfloat16 to struct load/stores.
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]

Reviewers: stuij, efriedma, c-rhodes, fpetrogalli

Reviewed By: fpetrogalli

Tags: #clang, #lldb, #llvm

Differential Revision: https://reviews.llvm.org/D82187
2020-06-23 12:12:35 +01:00
Paul Walker 499c63288f [SVE] Code generation for fixed length vector loads & stores.
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.

Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:

  V = load(Addr) =>
    V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))

  store(V, (Addr)) =>
    masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr))

Reviewers: rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80385
2020-06-23 09:39:03 +00:00
Francesco Petrogalli ef597eda8e [sve][acle] Add SVE BFloat16 extensions.
Summary:
List of intrinsics:

svfloat32_t svbfdot[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfdot[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfdot_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmmla[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)

svfloat32_t svbfmlalb[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalb[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalb_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmlalt[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalt[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalt_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svbfloat16_t svcvt_bf16[_f32]_m(svbfloat16_t inactive, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_x(svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_z(svbool_t pg, svfloat32_t op)

svbfloat16_t svcvtnt_bf16[_f32]_m(svbfloat16_t even, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvtnt_bf16[_f32]_x(svbfloat16_t even, svbool_t pg, svfloat32_t op)

For reference, see section 7.2 of "Arm C Language Extensions for SVE - Version 00bet4"

Reviewers: sdesmalen, ctetreau, efriedma, david-arm, rengolin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82141
2020-06-22 16:53:02 +00:00
stozer 539381da26 [DebugInfo] Update MachineInstr to help support variadic DBG_VALUE instructions
Following on from this RFC[0] from a while back, this is the first patch towards
implementing variadic debug values.

This patch specifically adds a set of functions to MachineInstr for performing
operations specific to debug values, and replacing uses of the more general
functions where appropriate. The most prevalent of these is replacing
getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the
operands corresponding to values will no longer be at index 0, but index 2 and
upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been
added for the other operands, along with some helper functions to replace
oft-repeated code and operate on a variable number of value operands.
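
A hedged sketch of the index mapping described above, written as a free
function for illustration rather than the actual MachineInstr member:

```
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// For debug values, the operands describing the variable's value(s) start at
// machine-operand index 2, so getDebugOperand(x) corresponds to getOperand(x + 2).
static const MachineOperand &debugValueOperand(const MachineInstr &MI,
                                               unsigned Index) {
  return MI.getOperand(Index + 2);
}
```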

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html

Differential Revision: https://reviews.llvm.org/D81852
2020-06-22 16:01:12 +01:00
Eric Christopher cf23852587 [Target] As part of using inclusive language within the llvm project,
migrate away from the use of blacklist and whitelist.

This change affects an internal llvm command line option.
2020-06-20 00:06:39 -07:00
Amara Emerson 1feeecf224 [AArch64][GlobalISel] Make G_SEXT_INREG legal and add selection support.
We were defaulting to the lower action for this, resulting in SHL+ASHR
sequences. On AArch64 we can do this in one instruction for an arbitrary
extension using SBFM as we do for G_SEXT.
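
For illustration (assuming the usual AArch64 lowering), a sign-extend-in-register
that previously needed a shift pair can now be a single SBFM-class instruction:

```
#include <cstdint>

// Sign-extend the low 8 bits of a 32-bit value.
// Lower action:        lsl w0, w0, #24 ; asr w0, w0, #24
// With G_SEXT_INREG:   sxtb w0, w0   (an SBFM alias)
int32_t sext_low8(int32_t x) {
  return static_cast<int8_t>(x);
}
```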

Differential Revision: https://reviews.llvm.org/D81992
2020-06-19 13:20:41 -07:00
David Sherwood 584d0d5c17 [SVE] Fall back on DAG ISel at -O0 when encountering scalable types
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:

1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
   vectors.

For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.

I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to

  CodeGen/AArch64/GlobalISel/arm64-fallback.ll

that demonstrate we correctly fallback to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.

Differential Revision: https://reviews.llvm.org/D81557
2020-06-19 10:57:00 +01:00
David Sherwood 0dc28af219 [CodeGen,AArch64] Fix up warnings in performExtendCombine
Try to avoid calling getVectorNumElements() or relying upon the
TypeSize conversion to uint64_t.

Differential Revision: https://reviews.llvm.org/D81573
2020-06-19 10:34:51 +01:00
Kristof Beyls d938ec4509 [AArch64] Avoid incompatibility between SLSBLR mitigation and BTI codegen.
A "BTI c" instruction only allows jumping/calling to using a BLR* instruction.
However, the SLSBLR mitigation changes a BLR to a BR to implement the
function call. Therefore, a "BTI c" check that passed before could
trigger after the BLR->BL change done by the SLSBLR mitigation.
However, if the register used in BR is X16 or X17, this trigger will not
fire (see ArmARM for further details).

Therefore, this patch simply changes the function stubs for the SLSBLR
mitigation from
__llvm_slsblr_thunk_x<N>:
    br x<N>
    SpeculationBarrier
to
__llvm_slsblr_thunk_x<N>:
    mov x16, x<N>
    br  x16
    SpeculationBarrier

Differential Revision: https://reviews.llvm.org/D81405
2020-06-19 06:21:54 +01:00
Francesco Petrogalli d32c134648 [llvm][SVE] Reg + reg addressing mode for LD1RO.
Reviewers: efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80741
2020-06-19 03:56:10 +00:00
Matt Arsenault 7f8b2e1b91 GlobalISel: Pass LegalizerHelper to custom legalize callbacks
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.

This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add back the common
parameters back in addition to the helper.

I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.

This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not really capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
2020-06-18 17:17:38 -04:00
Paul Walker 4612f39120 [SVE] Add flag to specify SVE register size, using this to calculate legal vector types.
Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE
data registers (in bits) to be specified. This allows the code
generator to make assumptions it normally couldn't. As a starting
point this information is used to mark fixed length vector types
that can fit within the specified size as legal.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80384
2020-06-18 12:11:16 +00:00
Kristof Beyls 832cfc7672 [IndirectThunks] Make generated MF structure as expected by all instruction selectors.
This also enables running the AArch64 SLSHardening pass with GlobalISel,
so add a test for that.

Differential Revision: https://reviews.llvm.org/D81403
2020-06-18 06:44:53 +01:00
Kristof Beyls 3f0cc96a96 [AArch64] SLSHardening: compute correct thunk name for X29.
The enum values for AArch64 registers are not all consecutive.
Therefore, the computation
  "__llvm_slsblr_thunk_x" + utostr(Reg - AArch64::X0)
is not always correct. utostr(Reg - AArch64::X0) will not generate the
expected string for the registers that do not have consecutive values in
the enum.
This happened to work for most registers, but does not for AArch64::FP
(i.e. register X29).
This can get triggered when the X29 is not used as a frame pointer.

Differential Revision: https://reviews.llvm.org/D81997
2020-06-18 06:36:49 +01:00
Daniel Sanders e35ba09961 [gicombiner] Allow generated combiners to store additional members
Summary:
Adds the ability to add members to a generated combiner via
a State base class. In the current AArch64PreLegalizerCombiner
this is used to make Helper available without having to
provide it to every call.

As part of this, split the command line processing into a
separate object so that it still only runs once even though
the generated combiner is constructed more frequently.

Depends on D81862

Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81863
2020-06-16 14:47:04 -07:00
Christopher Tetreault 616d8d942b [SVE] Eliminate calls to default-false VectorType::get() from AArch64
Reviewers: efriedma, c-rhodes, david-arm, samparker, greened

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81518
2020-06-16 13:53:25 -07:00
Jessica Paquette 7caa9caa80 [AArch64][GlobalISel] Avoid creating redundant ubfx when selecting G_ZEXT
When selecting 32-bit -> 64-bit G_ZEXTs, we don't always have to emit the extend.

If the instruction feeding into the G_ZEXT implicitly zero extends the high
half of the register, we can just emit a SUBREG_TO_REG instead.
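
A small illustration of the case described above (relying on the AArch64 rule
that writing a W register zeroes the upper 32 bits of the X register):

```
#include <cstdint>

// The 32-bit add already zeroes bits [63:32] of the destination register, so
// the zero-extension needs no extra ubfx/uxtw, just a SUBREG_TO_REG in MIR.
uint64_t add_then_zext(uint32_t a, uint32_t b) {
  return static_cast<uint64_t>(a + b);
}
```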

Differential Revision: https://reviews.llvm.org/D81897
2020-06-16 09:50:47 -07:00
Luke Geeson 10b6567f49 [AArch64]: BFloat MatMul Intrinsics&CodeGen
This patch upstreams support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch64. This includes IR intrinsics. Unittests are
provided as needed. AArch32 Intrinsics + CodeGen will come after this
patch.

This patch is part of a series implementing the Bfloat16 extension of
the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm
Architecture Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:

 - Luke Geeson
 - Momchil Velikov
 - Mikhail Maltsev
 - Luke Cheeseman

Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki,
stuij

Reviewed By: miyuki, stuij

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki, chill, pbarrio, stuij

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80752

Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
2020-06-16 15:23:30 +01:00
Luke Geeson 508a4764c0 [AArch64]: BFloat Load/Store Intrinsics&CodeGen
This patch upstreams support for the ld/st variants of the BFloat
intrinsics for the __bf16 type on AArch64. This includes the IR
intrinsics. Unit tests are provided as needed.

This patch is part of a series implementing the Bfloat16 extension of
the Armv8.6-A architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm
Architecture Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:

 - Luke Geeson
 - Momchil Velikov
 - Luke Cheeseman

Reviewers: fpetrogalli, SjoerdMeijer, sdesmalen, t.p.northover, stuij

Reviewed By: stuij

Subscribers: arsenm, pratlucas, simon_tatham, labrinea, kristof.beyls,
hiraditya, danielkiss, cfe-commits, llvm-commits, pbarrio, stuij

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80716

Change-Id: I22e1dca2a8a9ec25d1e4f4b200cb50ea493d2575
2020-06-16 15:23:30 +01:00
Kristof Beyls 503a26d8e4 Silence GCC 7 warning
GCC 7 was reporting "enumeral and non-enumeral type in conditional expression"
as a warning.
The code casts an instruction opcode enum to unsigned implicitly, which
is the intended behaviour, so this commit silences the warning by making the
cast to unsigned explicit.
2020-06-16 11:42:52 +01:00
Fangrui Song a3b5f428c1 [AArch64] Print the immediate operand for SPACE pseudo instruction
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D81814
2020-06-15 20:55:53 -07:00
Amara Emerson 1035a416a6 [AArch64][GlobalISel] Emit constant pool loads for 64 bit fp immediates.
Note: we don't do this for 64-bit integer materialization, to match SDAG.

Differential Revision: https://reviews.llvm.org/D81893
2020-06-15 20:53:09 -07:00
Jessica Paquette 3495b884de [AArch64][GlobalISel] Add G_EXT and select ext using it
Add selection support for ext via a new opcode, G_EXT, and a post-legalizer
combine which matches it.

Add an `applyEXT` function, because the AArch64ext patterns require a register
for the immediate. So, we have to create a G_CONSTANT to get these without
writing new patterns or modifying the existing ones.

Tests are the same as arm64-ext.ll.

Also prevent ext from firing on the zip test. It has higher priority, so we
don't want it potentially getting in the way of mask tests.

Also fix up the shuffle-splat test, because ext is now selected there. The
test was incorrectly regbank selected before, which could cause a verifier
failure when you emit copies.

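For illustration, `applyEXT` can be sketched roughly as follows (the field
names on the MatchInfo struct are assumptions):

  // Build the lane offset as a G_CONSTANT (the imported AArch64 EXT patterns
  // expect the immediate in a register), then emit the G_EXT.
  static void applyEXT(MachineInstr &MI, ShuffleVectorPseudo &MatchInfo) {
    MachineIRBuilder MIB(MI);
    auto Imm = MIB.buildConstant(LLT::scalar(32),
                                 MatchInfo.SrcOps[2].getImm());
    MIB.buildInstr(MatchInfo.Opc, {MatchInfo.Dst},
                   {MatchInfo.SrcOps[0], MatchInfo.SrcOps[1], Imm});
    MI.eraseFromParent();
  }
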
Differential Revision: https://reviews.llvm.org/D81436
2020-06-15 12:20:59 -07:00
Francesco Petrogalli 28a00ac9ba [llvm][SVE] IR intrinsics for quadword permutation instructions.
Summary:
Adding intrinsics and codegen patterns for:

* trn1 <Zd>.q, <Zn>.q, <Zm>.q
* trn2 <Zd>.q, <Zn>.q, <Zm>.q
* zip1 <Zd>.q, <Zn>.q, <Zm>.q
* zip2 <Zd>.q, <Zn>.q, <Zm>.q
* uzp1 <Zd>.q, <Zn>.q, <Zm>.q
* uzp2 <Zd>.q, <Zn>.q, <Zm>.q

These instructions are defined in Armv8.6-A.

Reviewers: sdesmalen, efriedma, kmclaughlin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80850
2020-06-15 16:21:56 +00:00
Daniel Kiss b8ae3fdfa5 [AArch64] Fix BTI instruction emission.
Summary:
SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with BTYPE 11
(see [1]).
This bit will be set to zero, so PACI[AB]SP is only equivalent to a BTI C
instruction.

[1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1

Reviewers: chill, tamas.petz, pbarrio, ostannard

Reviewed By: tamas.petz, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81746
2020-06-15 15:04:36 +02:00
Nikita Popov 7cac7e0cfc [IR] Prefer hasFnAttribute() where possible (NFC)
When checking for an enum function attribute, use hasFnAttribute()
rather than hasAttribute() at FunctionIndex, because it is
significantly faster (and more concise to boot).
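
For example (illustrative, not taken from the patch), the change rewrites
checks of roughly this shape:

  // Before: generic attribute-list lookup at the function index.
  bool Slow = F.getAttributes().hasAttribute(AttributeList::FunctionIndex,
                                             Attribute::NoImplicitFloat);
  // After: the dedicated helper is faster and shorter.
  bool Fast = F.hasFnAttribute(Attribute::NoImplicitFloat);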
2020-06-15 09:30:35 +02:00
Amara Emerson 1cbebd95de [AArch64][GlobalISel] Legalize vector G_PTR_ADD and enable selection.
Differential Revision: https://reviews.llvm.org/D81419
2020-06-12 11:25:17 -07:00
Jessica Paquette d3a56f062b [AArch64][GlobalISel] Allow G_DUP for elements smaller than 32 bits.
We select all of these via patterns now, so there's no reason to disallow this.

Update select-dup.mir to show that we correctly select the smaller types.

Differential Revision: https://reviews.llvm.org/D81322
2020-06-12 09:40:34 -07:00
Jessica Paquette 305862a5a6 [AArch64][GlobalISel] Set hasSideEffects = 0 on custom shuffle opcodes
This was making it so that the instructions weren't eliminated in
select-rev.mir and select-trn.mir despite not being used.

Update the tests accordingly.

Differential Revision: https://reviews.llvm.org/D81492
2020-06-12 09:39:46 -07:00
Huihui Zhang bf7961fade [NFC] Silence compiler warning [-Wmissing-braces].
llvm/lib/Target/AArch64/AArch64SLSHardening.cpp:146:5: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    "__llvm_slsblr_thunk_x0",  "__llvm_slsblr_thunk_x1",
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    {
llvm/lib/Target/AArch64/AArch64SLSHardening.cpp:168:5: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    AArch64::X0,  AArch64::X1,  AArch64::X2,  AArch64::X3,  AArch64::X4,
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    {
2020-06-12 08:55:03 -07:00
Kristof Beyls c35ed40f4f [AArch64] Extend AArch64SLSHardeningPass to harden BLR instructions.
To make sure that no barrier gets placed on the architectural execution
path, each
  BLR x<N>
instruction gets transformed to a
  BL __llvm_slsblr_thunk_x<N>
instruction, with __llvm_slsblr_thunk_x<N> a thunk that contains
__llvm_slsblr_thunk_x<N>:
  BR x<N>
  <speculation barrier>

Therefore, the BLR instruction gets split into two: one BL and one BR.
This transformation results in not inserting a speculation barrier on
the architectural execution path.

The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.

As a linker is allowed to clobber X16 and X17 on function calls, the
above code transformation would not be correct in case a linker does so
when N=16 or N=17. Therefore, when the mitigation is enabled, generation
of BLR x16 or BLR x17 is avoided.

As BLRA* indirect calls are not produced by LLVM currently, this does
not aim to implement support for those.

Differential Revision:  https://reviews.llvm.org/D81402
2020-06-12 07:34:33 +01:00
Kristof Beyls 0ee176edc8 [AArch64] Introduce AArch64SLSHardeningPass, implementing hardening of RET and BR instructions.
Some processors may speculatively execute the instructions immediately
following RET (returns) and BR (indirect jumps), even though
control flow should change unconditionally at these instructions.
To avoid a potential mis-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every RET and BR instruction.

Since these barriers are never on the correct, architectural execution
path, performance overhead of this is expected to be low.

On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.

These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing them and potentially removing them.

Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
instructions, these are also mitigated by the pass and tested through a
MIR test.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

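A conceptual sketch of the insertion (treat this as an outline, not the exact
pass code; the pseudo-instruction opcode names are assumptions):

  // Insert the end-of-basic-block speculation barrier pseudo after every
  // return and indirect branch; it later expands to SB, or to DSB SYS + ISB.
  bool hardenSLSRetBr(MachineBasicBlock &MBB, const TargetInstrInfo &TII,
                      bool HasSB) {
    bool Modified = false;
    for (auto MBBI = MBB.begin(), E = MBB.end(); MBBI != E; ++MBBI) {
      MachineInstr &MI = *MBBI;
      if (!MI.isReturn() && !MI.isIndirectBranch())
        continue;
      unsigned Opc = HasSB ? AArch64::SpeculationBarrierSBEndBB
                           : AArch64::SpeculationBarrierISBDSBEndBB;
      BuildMI(MBB, std::next(MBBI), MI.getDebugLoc(), TII.get(Opc));
      Modified = true;
    }
    return Modified;
  }
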
Differential Revision:  https://reviews.llvm.org/D81400
2020-06-11 07:51:17 +01:00
Leonard Chan 6adc664b9d [llvm][ELF][AArch64] Handle R_AARCH64_PLT32 relocation
This patch allows for usage of the @PLT modifier in AArch64 assembly which
lowers to an R_AARCH64_PLT32 relocation. See D81184 for handling this
relocation in lld.

Differential Revision: https://reviews.llvm.org/D81446
2020-06-10 11:34:16 -07:00
Sam Parker fa8bff0cd1 [CostModel] Unify getArithmeticInstrCost
Add the remaining arithmetic opcodes into the generic implementation
of getUserCost and then call this from getInstructionThroughput. Most
of the backends have been modified to return the base implementation
for cost kinds other than RecipThroughput. The outlier here is AMDGPU,
which already uses getArithmeticInstrCost for all the cost kinds.
This change means that most of the opcodes can be removed from that
backend's implementation of getUserCost.

Differential Revision: https://reviews.llvm.org/D80992
2020-06-10 09:08:45 +01:00
Amara Emerson 075890ca55 [AArch64] Move RegisterBankInfo.cpp/h to GISel.
Missed this file in the recent reorg.
2020-06-09 23:26:25 -07:00
Shawn Landden 9ec57cce62 [AArch64] custom lowering for i128 popcount
This halves the number of CNT instructions generated.
2020-06-10 09:44:16 +04:00
Amara Emerson 938cc573ee [AArch64][GlobalISel] Select G_ADD_LOW into a MOVaddr pseudo.
This ensures that we match SelectionDAG behaviour by waiting until the expand
pseudos pass to generate ADRP + ADD pairs. Doing this at selection time for the
G_ADD_LOW is fine because by the time we get to selecting the G_ADD_LOW,
previous attempts to fold it into loads/stores must have failed.

Differential Revision: https://reviews.llvm.org/D81512
2020-06-09 16:47:58 -07:00
Matt Arsenault 32823091c3 GlobalISel: Set instr/debugloc before any legalizer action
It was annoying enough that every custom lowering needed to set the
insert point, but this was made worse since now these all needed to be
updated to setInstrAndDebugLoc. Consolidate these so every
legalization action has the right insert position by default.

This should fix dropping debug info in every custom AMDGPU
legalization.
2020-06-09 15:37:02 -04:00
Daniel Kiss 7a38618a20 [AArch64] Allow BTI mnemonics in the HINT space with BTI disabled
Summary:
It is important to emit HINT instructions instead of BTI ones when
BTI is disabled. This allows compatibility with other assemblers
(e.g. GAS).

Still, developers of assembly code will want to write code that is
compatible with both pre- and post-BTI CPUs. They could use HINT
mnemonics, but the new mnemonics are a lot more readable (e.g.
bti c instead of hint #34), and they will result in the same
encodings. So, while LLVM should not *emit* the new mnemonics when
BTI is disabled, this patch will at least make LLVM *accept*
assembly code that uses them.

Reviewers: pbarrio, tamas.petz, ostannard

Reviewed By: pbarrio, ostannard

Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81257
2020-06-09 19:57:02 +02:00
Jessica Paquette cb2d8b30ad [AArch64][GlobalISel] Select trn1 and trn2
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize
G_SHUFFLE_VECTORs that are trn1/trn2 instructions.

- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer

Add select-trn.mir to test selection.

Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the
existing arm64-trn test.

Note that both of these tests contain things we currently don't legalize.

I figured it would be easier to test these now rather than later, since once
we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update
the tests.

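The mask check being ported is essentially the following (paraphrased from the
SelectionDAG version; details simplified):

  // Even lanes must come from consecutive even elements of one source and
  // odd lanes from the matching elements of the other source.
  // WhichResult distinguishes TRN1 (0) from TRN2 (1).
  static bool isTRNMask(ArrayRef<int> M, unsigned NumElts,
                        unsigned &WhichResult) {
    if (NumElts % 2 != 0)
      return false;
    WhichResult = (M[0] == 0) ? 0 : 1;
    for (unsigned i = 0; i < NumElts; i += 2) {
      if ((M[i] >= 0 && (unsigned)M[i] != i + WhichResult) ||
          (M[i + 1] >= 0 && (unsigned)M[i + 1] != i + NumElts + WhichResult))
        return false;
    }
    return true;
  }
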
Differential Revision: https://reviews.llvm.org/D81182
2020-06-09 10:55:19 -07:00
Henry Kao 4dcc0d1958 [CodeGen][SVE] Avoid scalarizing zero splat stores on scalable vectors.
Summary: Implemented in replaceZeroVectorStore(). Fixes several warnings in AArch64 SVE unit tests.

Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80824
2020-06-09 12:52:39 -04:00
Simon Pilgrim a375463ad0 Fix Wdocumentation warning. NFC.
The raw unsigned Opc value has been replaced with the ShuffleVectorPseudo MatchInfo wrapper struct.
2020-06-09 13:53:39 +01:00
Guillaume Chatelet 3b6196c9b3 [Alignment][NFC] TargetLowering::allowsMisalignedMemoryAccesses
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81374
2020-06-09 10:17:42 +00:00
Cullen Rhodes 91855483f3 [AArch64][AsmParser] Fix debug output in a few instructions
Summary:
In the parsing of BTIHint, PSBHint and Prefetch, the identifier token
should be lexed after creating the operand; otherwise the StringRef is
moved before being copied and the debug output is incorrect.

Prefetch example:

    $ echo "prfm   pldl1keep, [x2]" | ./bin/llvm-mc \
        -triple aarch64-none-linux-gnu -show-encoding -debug

    Before:

      Matching formal operand class MCK_Prefetch against actual operand at
      index 1 (<prfop ,>): match success using generic matcher

    After:

      Matching formal operand class MCK_Prefetch against actual operand at
      index 1 (<prfop pldl1keep>): match success using generic matcher

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D80620
2020-06-09 09:02:59 +00:00
Cullen Rhodes b82be5db71 [AArch64][SVE] Implement structured load intrinsics
Summary:
This patch adds initial support for the following intrinsics:

    * llvm.aarch64.sve.ld2
    * llvm.aarch64.sve.ld3
    * llvm.aarch64.sve.ld4

For loading two, three and four vectors' worth of data. Basic codegen is
implemented, with reg+reg and reg+imm addressing modes being addressed
in a later patch.

The types returned by these intrinsics have a number of elements that is a
multiple of the elements in a 128-bit vector for a given type and N, where N is
the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements
the types are:

    LD2 : <vscale x 8 x i32>
    LD3 : <vscale x 12 x i32>
    LD4 : <vscale x 16 x i32>

This is implemented with target-specific intrinsics for each variant that take
the same operands as the IR intrinsic but return N values, where the type of
each value is a full vector, i.e. <vscale x 4 x i32> in the above example.
These values are then concatenated using the standard concat_vector intrinsic
to maintain type legality with the IR.

These intrinsics are intended for use in the Arm C Language
Extension (ACLE).

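For example, in ACLE terms (assuming the standard svld2 spelling from
arm_sve.h), a two-vector load looks like:

  #include <arm_sve.h>

  // svld2 loads two vectors' worth of int32 data; at the IR level this
  // corresponds to the <vscale x 8 x i32> llvm.aarch64.sve.ld2 case above.
  svint32x2_t load_two(svbool_t pg, const int32_t *base) {
    return svld2_s32(pg, base);
  }
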
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75751
2020-06-09 08:51:58 +00:00
Kristof Beyls 09d098506b [AArch64] Fix branch, terminator, etc properties for BRA* instructions.
Tests relying on some of these fixes will be added for this in follow-on
patches that introduce new features that require these properties to be
correct.

Differential Revision: https://reviews.llvm.org/D81399
2020-06-09 08:19:41 +01:00
Sam Parker 37289615c0 [NFCI][CostModel] Unify getCmpSelInstrCost
Add cases for icmp, fcmp and select into the switch statement of the
generic getUserCost implementation with getInstructionThroughput then
calling into it. The BasicTTI and backend implementations have been set
to return a default value (1) when a cost other than throughput is
being queried.

Differential Revision: https://reviews.llvm.org/D80550
2020-06-09 07:41:22 +01:00
David Sherwood 30dfbf03a2 [CodeGen,AArch64] Fix up warnings in splitStores
The code for trying to split up stores is designed for NEON vectors,
where we support arbitrary alignments. It's an optimisation designed
to improve performance by using smaller, aligned stores. However,
we currently only support 16-byte alignments for SVE vectors anyway,
so we may as well bail out early.

This change fixes up remaining warnings in a couple of tests:

  CodeGen/AArch64/sve-callbyref-notailcall.ll
  CodeGen/AArch64/sve-calling-convention-byref.ll

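The early bail-out amounts to roughly this (a sketch; the variable name is
assumed):

  // Store splitting only makes sense for fixed-width NEON vectors; SVE
  // stores are assumed to be 16-byte aligned, so give up early for
  // scalable vector types.
  if (StVal.getValueType().isScalableVector())
    return SDValue();
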
Differential Revision: https://reviews.llvm.org/D80720
2020-06-09 07:39:38 +01:00
Jian Cai 0e1accd0f7 [AArch64] Support expression results as immediate values in mov
Summary:
This patch adds support for using the result of an expression as an
immediate value. For example,

0:
.skip 4
1:
mov x0, 1b - 0b

is assembled to

mov x0, #4

Currently it does not support expressions requiring relocation unless
explicitly specified. This fixes PR#45781.

Reviewers: peter.smith, ostannard, efriedma

Reviewed By: efriedma

Subscribers: nickdesaulniers, llozano, manojgupta, efriedma, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80028
2020-06-08 17:57:20 -07:00
Florian Hahn 1975ff9a0a [AArch64] Fix ldst-opt of multiple disjunct subregs.
Currently aarch64-ldst-opt will incorrectly rename registers with
multiple disjunct subregisters (e.g. result of LD3). This patch updates
canRenameUpToDef to bail out if it encounters such a register class
that contains the register to rename.

Fixes PR46105.

Reviewers: efriedma, dmgreen, paquette, t.p.northover

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D81108
2020-06-08 20:18:24 +01:00
Sam Parker 5b5e78ad2b [CostModel] Follow-up to buildbot fix
Adding type checks into the other backends that call
getTypeLegalizationCost.

Differential Revision: https://reviews.llvm.org/D80984
2020-06-08 15:26:25 +01:00
David Sherwood 41fb119e8c [CodeGen] Fix nullptr crash in tryConvertSVEWideCompare
When the input to a wide compare instruction is a DUP or SPLAT_VECTOR
node, we should deal with cases where the DUP/SPLAT_VECTOR input
operand is not an immediate value. I've fixed the code to return
SDValue() in such cases and added a couple of tests - one each to
represent the signed and unsigned cases.

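The guard is essentially of this shape (illustrative; the surrounding variable
names are assumptions):

  // If the DUP/SPLAT_VECTOR operand is not a constant immediate, bail out
  // and let the generic lowering handle the comparison.
  auto *SplatC = dyn_cast<ConstantSDNode>(SplatVal);
  if (!SplatC)
    return SDValue();
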
Differential Revision: https://reviews.llvm.org/D81167
2020-06-08 15:20:18 +01:00
Cullen Rhodes 3ebbe35363 [AArch64][SVE] Implement vector tuple intrinsics
Summary:
This patch adds the following intrinsics for creating two-tuple,
three-tuple and four-tuple scalable vectors:

    * llvm.aarch64.sve.tuple.create2
    * llvm.aarch64.sve.tuple.create3
    * llvm.aarch64.sve.tuple.create4

As well as:

    * llvm.aarch64.sve.tuple.get
    * llvm.aarch64.sve.tuple.set

For extracting and inserting scalable vectors from vector tuples. These
intrinsics are intended to be used by the ACLE functions svcreate<n>,
svget and svset.

This patch also includes calling convention support for passing and
returning tuples of scalable vectors to/from functions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75674
2020-06-08 11:09:55 +00:00
Guillaume Chatelet be4f5061ea [Alignment][NFC] Migrate part of Arm/AArch64 backend
Summary: Follow up on D81196

Reviewers: courbet

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81274
2020-06-08 07:15:37 +00:00
Simon Pilgrim f6cb987d50 DomTreeUpdater.h - refine includes. NFC.
We don't need any of its defs or many of its includes inside PostDominators.h - so split it and reduce the frontend load.
2020-06-07 16:57:48 +01:00