Commit Graph

1445 Commits

Author SHA1 Message Date
Bradley Smith 90ffcb1245 [AArch64][SVE] Add unpredicated vector BIC ISD node
Addition of this node allows us to better utilize the different forms of
the SVE BIC instructions, including using the alias to an AND (immediate).
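
As a hedged illustration (not taken from the patch itself), a hypothetical
ACLE-level source shape that can benefit from the AND-immediate alias,
assuming an all-active predicate:

    #include <arm_sve.h>

    /* With an all-active predicate, this BIC-by-constant can be selected
       via the immediate alias of AND rather than a predicated BIC. */
    svuint32_t clear_low_byte(svuint32_t a) {
        return svbic_n_u32_x(svptrue_b32(), a, 0xffu);
    }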

Differential Revision: https://reviews.llvm.org/D101831
2021-05-14 16:12:13 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.
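
A minimal sketch (not from the patch) of how a stack walker might classify
a saved FP value using bits 63:60, following the table above:

    #include <stdint.h>

    enum frame_kind { FRAME_NORMAL, FRAME_EXTENDED, FRAME_KERNEL, FRAME_RESERVED };

    static enum frame_kind classify_fp(uint64_t fp) {
        switch (fp >> 60) {
        case 0x0: return FRAME_NORMAL;   /* [FP, LR] */
        case 0x1: return FRAME_EXTENDED; /* [X22, FP, LR] */
        case 0xf: return FRAME_KERNEL;   /* kernel space, non-extended */
        default:  return FRAME_RESERVED; /* currently reserved */
        }
    }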

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
Peter Waller 6e6f9a636b [AArch64][SVE] Improve sve.convert.to.svbool lowering
The sve.convert.to.svbool lowering has the effect of widening a logical
<M x i1> vector representing lanes into a physical <16 x i1> vector
representing bits in a predicate register.

In general, if converting to svbool, the contents of lanes in the
physical register might not be known. For sve.convert.to.svbool the new
lanes are specified to be zeroed, requiring 'and' instructions to mask
off the new lanes. When the source is a ptrue or a comparison,
however, the new lanes are already known to be zero.

CodeGen Before:
  ptrue p0.s, vl16
  ptrue p1.s
  ptrue p2.b
  and   p0.b, p2/z, p0.b, p1.b
  ret

After:
  ptrue p0.s, vl16
  ret

Differential Revision: https://reviews.llvm.org/D101544
2021-05-12 10:57:25 +01:00
Bradley Smith 635164b95a [AArch64][SVE] Improve SVE codegen for fixed length BITCAST
Expanding a fixed length operation involves wrapping the operation in an
insert/extract subvector pair; as such, when this is done to a bitcast we
end up with an extract_subvector of a bitcast. DAGCombine tries to
convert this into a bitcast of an extract_subvector which restores the
initial fixed length bitcast, causing an infinite loop of legalization.

As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.

Differential Revision: https://reviews.llvm.org/D101990
2021-05-10 14:43:53 +01:00
Bradley Smith 65c89cd1a6 [AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics
When using predicated intrinsics, if the predicate used is all lanes active,
use an unpredicated form of the instruction; additionally, this allows for
better use of immediate forms.

This only includes instructions where the unpredicated/predicated forms
matched in such a way that instruction selection would not introduce extra
ptrue instructions. This allows us to convert the intrinsics directly to
architecture independent ISD nodes.

Depends on D101062

Differential Revision: https://reviews.llvm.org/D101828
2021-05-10 13:06:02 +01:00
Bradley Smith f8f953c2a6 [AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics
When using predicated arithmetic intrinsics, if the predicate used is all
lanes active, use an unpredicated form of the instruction; additionally,
this allows for better use of immediate forms.

This also includes a new complex isel pattern which allows matching an
all active predicate when the types are different but the predicate is a
superset of the type being used. For example, to allow a b8 ptrue for a
b32 predicate operand.

This only includes instructions where the unpredicated/predicated forms
are mismatched between variants, meaning that the removal of the
predicate is done during instruction selection in order to prevent
spurious re-introductions of ptrue instructions.
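
A hedged ACLE-level sketch (not from the patch) of the kind of source this
helps; with an all-active predicate the immediate form can be used directly:

    #include <arm_sve.h>

    /* All-active predicate: isel can now pick the unpredicated/immediate
       form, e.g. add z0.s, z0.s, #1, without an extra ptrue. */
    svint32_t add_one(svint32_t a) {
        return svadd_n_s32_x(svptrue_b32(), a, 1);
    }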

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101062
2021-05-10 13:05:37 +01:00
Sander de Smalen 407a33889d [AArch64][SVE] Fix isel failure for FP-extending loads
DAGCombiner tries to combine a (fpext (load)) to (fround (extload))
but SVE has no FP-extending loads. By marking these as expand,
the combine no longer happens.

This also fixes a similar issue for fptrunc, where the source type
is not a legal type.
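
A plain-C sketch of the problematic shape (an illustration, not code from
the patch): each f32 load feeds an fpext, which DAGCombiner would
previously fold into an FP-extending load:

    void widen(double *dst, const float *src, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = (double)src[i];  /* fpext(load) per element */
    }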

Reviewed By: bsmith, kmclaughlin

Differential Revision: https://reviews.llvm.org/D102053
2021-05-10 11:27:38 +01:00
Jun Ma b3aeb13892 [AArch64][SVE] Remove index_vector node.
Since index_vector is lowered into step_vector in D100816, we can just remove
index_vector and use step_vector directly for codegen.

Differential Revision: https://reviews.llvm.org/D101593
2021-05-10 11:08:58 +08:00
Simon Pilgrim 280aa3415e [DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based on a discussion on D89281, where the AArch64 implementations were being changed to use funnel shifts.

Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.

I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AArch64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).

NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.
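
A scalar C sketch of the idea (illustrative only, assuming 0 < amt < 64):
the high word of a 128-bit left shift is exactly a funnel shift of the two
halves, which is why targets with efficient funnel shifts can share one
expansion:

    #include <stdint.h>

    /* hi:lo shifted left by amt; *hi becomes fshl(hi, lo, amt). */
    void shl128(uint64_t *hi, uint64_t *lo, unsigned amt) {
        *hi = (*hi << amt) | (*lo >> (64 - amt));
        *lo <<= amt;
    }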

Differential Revision: https://reviews.llvm.org/D101987
2021-05-07 13:12:30 +01:00
Bradley Smith 9f37980d45 [AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero lanes 1-N
Specifically, this allows us to rely on the lane-zeroing behaviour of
SVE reduce instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101369
2021-05-04 15:05:05 +01:00
David Green 966435daf9 [AArch64] Fold CSEL x, x, cc -> x
This can come up in rare situations, where a csel is created with
identical operands. These can be folded simply to the original value,
allowing the csel to be removed and further simplification to happen.

This patch also removes FCSEL as it is unused, not being produced
anywhere or lowered to anything.

Differential Revision: https://reviews.llvm.org/D101687
2021-05-03 17:34:05 +01:00
LemonBoy 4751cadcca [AArch64] Prevent spilling between ldxr/stxr pairs
Apply the same logic used to check if CMPXCHG nodes should be expanded
at -O0: the register allocator may end up spilling some register in
between the atomic load/store pairs, breaking the atomicity and possibly
stalling the execution.

Fixes PR48017

Reviewed By: efriedman

Differential Revision: https://reviews.llvm.org/D101163
2021-05-01 17:17:05 +02:00
Eli Friedman 6e6ae6c727 [AArch64] Fix lowering for fshl/fshr with SVE types.
These operations don't exist natively, so just let the
target-independent code expand to plain shifts.

The generated sequences could probably be optimized a bit more, but
they seem good enough for now.
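
For reference, a scalar C sketch of one common shift-only expansion of
fshl (an illustration, not necessarily the exact sequence the legalizer
emits):

    #include <stdint.h>

    /* fshl(x, y, s) on 32-bit values using only plain shifts; the
       (y >> 1) >> (~s & 31) term avoids an out-of-range shift at s == 0. */
    static uint32_t fshl32(uint32_t x, uint32_t y, uint32_t s) {
        return (x << (s & 31)) | ((y >> 1) >> (~s & 31));
    }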

Differential Revision: https://reviews.llvm.org/D101574
2021-04-30 10:51:25 -07:00
Jun Ma b310dd1501 [AArch64][SVE] Lower index_vector to step_vector
As discussed in D100107, this patch first converts index_vector to
step_vector, and converts step_vector back to index_vector after LegalizeDAG.

Differential Revision: https://reviews.llvm.org/D100816
2021-04-30 19:04:39 +08:00
Joe Ellis 1eb81f8309 [AArch64] Add missing UINT_TO_FP promotions for v16i8
Differential Revision: https://reviews.llvm.org/D101042
2021-04-28 08:49:15 +00:00
Sander de Smalen 43ace8b5ce [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Joe Ellis c19c0ad681 [AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides
The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE
previously assumed the operands were full vectors, but this is not
always true. This function would produce bogus output if the division
operands are not full vectors, resulting in miscompiles when dividing 8-bit or
16-bit vectors.

The fix is to perform an extend + div + truncate for non-full vectors,
instead of the usual unpacking and unzipping logic. This is an additive
change which reduces the non-full integer vector divisions to a pattern
recognised by the existing lowering logic.

For future reference, an example of code that would miscompile before
this patch is below:

    int8_t foo(unsigned N, int8_t *a, int8_t *b, int8_t *c) {
        int8_t result = 0;
        for (int i = 0; i < N; ++i) {
            result += (a[i] / b[i]) / c[i];
        }
        return result;
    }

Differential Revision: https://reviews.llvm.org/D100370
2021-04-23 14:55:10 +00:00
David Green c0bf5929ee [AArch64] Improve vector reverse lowering
This improves the lowering of v8i16 and v16i8 vector reverse shuffles.
Instead of going via a generic tbl it uses a rev64; ext pair, as already
happens for v4i32.
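
An intrinsics-level sketch of the new lowering for the v16i8 case
(illustrative; the patch works on ISD nodes, not on C source):

    #include <arm_neon.h>

    int8x16_t reverse_v16i8(int8x16_t v) {
        int8x16_t r = vrev64q_s8(v);  /* reverse bytes within each half */
        return vextq_s8(r, r, 8);     /* swap the two 64-bit halves */
    }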

Differential Revision: https://reviews.llvm.org/D100882
2021-04-22 21:01:25 +01:00
Joe Ellis 528ee161c9 [AArch64] Block tryCombineToBSL combines for vectors wider than NEON
There are no patterns for the AArch64ISD::BSP ISD node for anything
other than NEON vectors at the moment. As a result, if we hit these
combines for vectors wider than a NEON vector (such as what we might get
with fixed length SVE) we will fail to lower.

This patch simply prevents us from attempting the combines if the input
vector type is too wide.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D100961
2021-04-22 15:09:13 +00:00
Martin Storsjö 8000e1f578 [AArch64] Fix calling windows varargs with floats in fixed args from non-windows functions
When inspecting the calling convention, for calling windows functions
from a non-windows function, inspect the calling convention of
the called function, not the caller.

Also remove an unnecessary parameter to AArch64CallLowering
OutgoingArgHandler.

Differential Revision: https://reviews.llvm.org/D100890
2021-04-22 12:02:49 +03:00
Caroline Concatto ca9b7e2e2f [AArch64][SVE] Fix crash with icmp+select
This patch changes the lowering of SELECT_CC from Legal to Expand for scalable
vectors and adds support for scalable vectors in performSelectCombine.

When selecting the nodes to lower, visitSELECT checks whether a SETCC followed
by a SELECT can be replaced with SELECT_CC, which it does whenever SELECT_CC is
legal or custom for the type. SELECT_CC used to be marked Legal for scalable
vectors, so the nodes were combined into SELECT_CC, which then crashed the
compiler because SELECT_CC is not supported for scalable vectors. The compiler
now lowers to VSELECT instead.

Differential Revision: https://reviews.llvm.org/D100485
2021-04-21 14:16:27 +01:00
Bradley Smith b8b075d8d7 [AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions
Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types,
and lower them to the appropriate SVE instructions.

Additionally now that the MULH nodes are legal, integer divides can be
expanded into a more performant code sequence.
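
A scalar C sketch of what an MULHS node computes (for illustration only):

    #include <stdint.h>

    /* High half of a widening 32x32 multiply; vectorized for SVE this
       can now select smulh directly. */
    static int32_t mulhs32(int32_t a, int32_t b) {
        return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
    }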

Differential Revision: https://reviews.llvm.org/D100487
2021-04-20 15:18:06 +01:00
Bradley Smith 22c017f0f9 [AArch64][NEON] Match (or (and -a b) (and (a+1) b)) => bit select
With this patch, vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction.
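
Spelled out as a self-contained sketch (strict C needs a reinterpret for
the predicate operand; hypothetical wrapper, not from the patch):

    #include <arm_neon.h>

    float32x4_t select_by_sign(int32x4_t a, float32x4_t b, float32x4_t c) {
        /* For 0/1 lanes in a, -a is all-ones or zero, forming the mask. */
        return vbslq_f32(vreinterpretq_u32_s32(vnegq_s32(a)), b, c);
    }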

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100304
2021-04-15 13:52:47 +01:00
Martin Storsjö 5144f730a8 [AArch64] Fix windows vararg functions with floats in the fixed args
On Windows, float arguments are normally passed in float registers
in the calling convention for regular functions. For variable
argument functions, floats are passed in integer registers. This has
already been handled correctly for many years.

However, the surprising bit was that floats among the fixed arguments
also are supposed to be passed in integer registers, contrary to regular
functions. (This also seems to be the behaviour on ARM, both on
Windows and on e.g. hardfloat Linux.)

In the calling convention, don't promote shorter floats to f64, but
convert them to integers of the same length. (Floats passed as part of
the actual variable arguments are promoted to double already on the
C/Clang level; the LLVM vararg calling convention doesn't do any
extra promotion of f32 to f64 - this matches how it works on X86 too.)

Technically, this is an ABI break compared to older LLVM versions,
but it fixes compatibility with the official platform ABI. (In practice,
passing floats among the fixed arguments of variable argument functions
is a pretty rare construct.)

Differential Revision: https://reviews.llvm.org/D100365
2021-04-15 11:02:14 +03:00
David Sherwood 1206313f82 [CodeGen][AArch64] Fix isel crash for truncating FP stores
When attempting to truncate a FP vector and store the result out
to memory we crashed because we had no pattern for truncating FP
stores. In fact, we don't support these types of stores and the
correct fix is to stop marking these truncating stores as legal.
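
A plain-C sketch of the crashing shape (illustrative, not from the patch):
an fptrunc feeding a store, which forms a truncating FP store:

    void store_narrowed(float *dst, const double *src, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = (float)src[i];  /* fptrunc + store per element */
    }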

Tests have been added here:

  CodeGen/AArch64/sve-fptrunc-store.ll

Differential Revision: https://reviews.llvm.org/D100025
2021-04-08 13:21:29 +01:00
Jun Ma 274ac9d40e [AArch64][SVE] Lowering sve.dot to DOT node
Differential Revision: https://reviews.llvm.org/D99699
2021-04-02 20:05:17 +08:00
Bradley Smith 2f45e632c0 [AArch64][SVE] Improve codegen for select nodes with fixed types
Additionally, move the existing fixed vselect tests to *-vselect.ll.

Differential Revision: https://reviews.llvm.org/D99418
2021-04-01 15:54:37 +01:00
Bradley Smith 0934fa4f5d [AArch64][SVE] SVE functions should use the SVE calling convention for fast calls
When an SVE function calls another SVE function using the C calling
convention we use the more efficient SVE VectorCall PCS.  However,
for the Fast calling convention we're incorrectly falling back to
the generic AArch64 PCS.

This patch adds the same "can use SVE vector calling convention"
detection used by CallingConv::C to CallingConv::Fast.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D99657
2021-04-01 15:52:08 +01:00
Sander de Smalen 7108b2dec1 [SVE] Fix LoopVectorizer test scalalable-call.ll
This marks FSIN and other operations as Expand for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00
Florian Hahn 52e015081a
[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector.
Currently the code only checks for integer constants (ConstantSDNode)
and triggers an infinite cycle for single-element floating point
vector constants. We need to check for both FP and integer constants.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D99384
2021-03-31 10:17:36 +01:00
Joe Ellis a7dde4c5f7 [AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98496
2021-03-30 09:37:11 +00:00
Joe Ellis c4d39f64d0 [AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98625
2021-03-30 09:35:44 +00:00
Florian Hahn 482283042f
[AArch64] Remove custom zext/sext legalization code.
Currently performExtendCombine assumes that the src-element bitwidth * 2
is a valid MVT. But this is not the case for i1 and it causes a crash on
the v64i1 test cases added in this patch.

It turns out that this code appears to not be needed; the same patterns are
handled by other code and we end up with the same results, even without the
custom lowering. I also added additional test cases in a50037aaa6.

Let's just remove the unneeded code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99437
2021-03-29 22:22:05 +01:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Florian Hahn eb3d9f2eb6
[SelDag] Add isIntOrFPConstant helper function.
This patch adds a new isIntOrFPConstant helper function to check if an
SDValue is an integer or FP constant. This pattern is used in various
places.

There also are places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D99428
2021-03-28 12:48:58 +01:00
Nashe Mncube ac2a1e9596 [SVE] Suppress vselect warning from incorrect interface call
The VSelectCombine handler within AArch64ISelLowering
uses an interface call which only expects fixed vectors, generating
a warning when the call is made on a scalable vector. This change
suppresses the warning by using the ElementCount interface, which
supports both fixed and scalable vectors.
I have also added a regression test which recreates the warning.

Differential Revision: https://reviews.llvm.org/D98249
2021-03-24 14:34:34 +00:00
Benjamin Kramer 39e36fff3d [AArch64] Fix unused variable warning 2021-03-23 13:42:14 +01:00
David Sherwood 748ae5281d [IR][SVE] Add new llvm.experimental.stepvector intrinsic
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as

  mul step_vector(1), 2 -> step_vector(2)
  add step_vector(1), step_vector(1) -> step_vector(2)
  shl step_vector(1), 1 -> step_vector(2)
  etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

  llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
  llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

  llvm/test/CodeGen/AArch64/neon-stepvector.ll
  llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
2021-03-23 10:43:35 +00:00
Craig Topper 1066dcb550 [AArch64] Fix LowerMGATHER to return the chain result for floating point gathers.
Found by adding asserts to LegalizeDAG to make sure custom legalized
results had the right types.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D98968
2021-03-19 11:53:46 -07:00
Peter Waller 0d6482a76a [llvm][AArch64][SVE] Lower fixed length vector fabs
Seemingly straightforward.

Differential Revision: https://reviews.llvm.org/D98434
2021-03-18 17:20:08 +00:00
Sjoerd Meijer 90ecb862a0 [AArch64] Rewrite (add, csel) to cinc
Don't rewrite an add instruction with 2 SET_CC operands into a csel
instruction. The total instruction sequence uses an extra instruction and
register. Preventing this allows us to match a `(add, csel)` pattern and
rewrite this into a `cinc`.
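
A small C illustration (an assumption about the source shape, not from the
patch) of an `(add, csel)` pattern that can become a compare plus `cinc`:

    int add_flag(int x, int y, int z) {
        return x + (y < z);  /* cmp; cinc */
    }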

Differential Revision: https://reviews.llvm.org/D98704
2021-03-18 08:49:27 +00:00
Bradley Smith cf0da91ba5 [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.
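
For reference, the ACLE-level operation in question (a usage sketch):

    #include <arm_neon.h>

    /* Round to nearest with ties to even; previously selected via a
       target-specific frintn intrinsic, now via the FROUNDEVEN node. */
    float32x4_t round_ties_even(float32x4_t v) {
        return vrndnq_f32(v);
    }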

Differential Revision: https://reviews.llvm.org/D98487
2021-03-17 11:41:22 +00:00
Joe Ellis ff2dd8a212 [AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible
This commit folds sxtw'd or uxtw'd offsets into gather loads where
possible with a DAGCombine optimization.

As an example, the following code:

    #include <arm_sve.h>

    svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) {
      return svld1sw_gather_s64offset_u64(
        pred, base, svextw_s64_x(pred, offsets)
      );
    }

would previously lower to the following assembly:

    sxtw	z0.d, p0/m, z0.d
    ld1sw	{ z0.d }, p0/z, [x0, z0.d]
    ret

but now lowers to:

    ld1sw   { z0.d }, p0/z, [x0, z0.d, sxtw]
    ret

Differential Revision: https://reviews.llvm.org/D97858
2021-03-16 15:09:46 +00:00
Stelios Ioannou ab86edbc88 [AArch64] Implement __rndr, __rndrrs intrinsics
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.

These intrinsics store the random number in their pointer argument and return a status
code indicating whether the generation succeeded. The difference between __rndr and
__rndrrs is that the latter intrinsic reseeds the random number generator.

The instructions write the NZCV flags to indicate the success of the operation, which we
can then read with a CSET.
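
A usage sketch based on the ACLE description above (hedged; guard on
__ARM_FEATURE_RNG as noted):

    #include <arm_acle.h>
    #include <stdint.h>

    #ifdef __ARM_FEATURE_RNG
    int get_random(uint64_t *out) {
        /* Returns 0 on success, non-zero if generation failed. */
        return __rndr(out);
    }
    #endif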

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838

Differential Revision: https://reviews.llvm.org/D98264

Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
2021-03-15 17:51:48 +00:00
Bradley Smith 860ae9d50c [AArch64][SVE] Add fixed/scalable lowering of FMAXIMUM/FMINIMUM ISD nodes
Differential Revision: https://reviews.llvm.org/D98348
2021-03-11 13:37:47 +00:00
David Green 1a808286ef [AArch64] Extend vecreduce -> udot handling to mla reductions
We previously had lowering for:
  vecreduce.add(zext(X)) to vecreduce.add(UDOT(zero, X, one))
This extends that to also handle:
  vecreduce.add(mul(zext(X), zext(Y))) to vecreduce.add(UDOT(zero, X, Y))
It extends the existing code to optionally handle a mul with equal
extends.
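
The C-level shape this corresponds to (a sketch; the commit operates on
the vecreduce/mul/zext nodes, not on source):

    #include <stdint.h>

    uint32_t dot_u8(const uint8_t *a, const uint8_t *b, int n) {
        uint32_t sum = 0;
        for (int i = 0; i < n; ++i)
            sum += (uint32_t)a[i] * b[i];  /* mul(zext, zext) reduction */
        return sum;
    }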

Differential Revision: https://reviews.llvm.org/D97280
2021-03-10 22:25:12 +00:00
David Green a02f506876 [AArch64] Extend vecreduce -> udot handling to v8i8
https://reviews.llvm.org/D88577 added v16i8 vecreduce to udot/sdot
lowering. This extends that to v8i8 too, generalizing the pattern to
handle the extra types.

Differential Revision: https://reviews.llvm.org/D97279
2021-03-10 21:03:15 +00:00
Cullen Rhodes 2750f3ed31 [IR] Introduce llvm.experimental.vector.splice intrinsic
This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on an immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.

For example:

  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>  ; index
  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count

These intrinsics support both fixed and scalable vectors; the former are
lowered to a shufflevector to maintain existing behaviour, although while
the intrinsic is marked experimental the recommended way to express this
operation for fixed-width vectors remains shufflevector. For scalable
vectors, where it is not possible to express a shufflevector mask for
this operation, a new ISD node has been implemented.

This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker and Cullen Rhodes.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D94708
2021-03-09 10:44:22 +00:00
Akira Hatanaka dca5737945 Move ObjCARCUtil.h back to llvm/Analysis
Instead of adding the header to llvm/IR, just duplicate the marker
string in the auto upgrader.
2021-03-08 16:35:24 -08:00
LemonBoy 8725b24c6d [AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors
Expand the horizontal reduction during the instruction selection phase, but only if the target doesn't support the full fp16 instruction set.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49401

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D97840
2021-03-05 16:09:37 +01:00
David Blaikie a2a55def35 Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering.
This is included from IR files, and IR doesn't/can't depend on Analysis
(because Analysis depends on IR).

Also fix the implementation - don't use non-member static in headers, as
it leads to ODR violations, inaccurate "unused function" warnings, etc.
And fix the header protection macro name (we don't generally include
"LIB" in the names, so far as I can tell).
2021-03-04 16:14:53 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due to clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
David Green 7abf7dd5ef [AArch64] Add combine for add(udot(0, x, y), z) -> udot(z, x, y).
Given a zero input for a udot, an add can be folded in to take the place
of the input, using the addition that the instruction naturally
performs.

Differential Revision: https://reviews.llvm.org/D97188
2021-03-01 12:53:34 +00:00
Fraser Cormack 6718fda6ad [CodeGen] Fix issues with subvector intrinsic index types
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D97459
2021-03-01 10:28:21 +00:00
David Green f51b3de4e8 [AArch64] Introduce UDOT/SDOT DAG nodes
This is used to lower UDOT/SDOT instructions, as opposed to relying on
the intrinsic. Subsequent optimizations will be able to optimize them
more cleanly based on these nodes.
2021-02-23 20:31:01 +00:00
Serge Pavlov 2c4f60e45b [FPEnv][AArch64] Implement lowering of llvm.set.rounding
Differential Revision: https://reviews.llvm.org/D96836
2021-02-19 13:16:51 +07:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.
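
A plain-C sketch of the source shape (illustrative; assumes SVE
vectorization and that FP contraction is permitted, e.g. -ffp-contract=fast):

    /* The mul and add below can contract to a single fma, which this
       patch selects as FMLA or FMAD depending on register allocation. */
    void fmla(float *restrict acc, const float *a, const float *b, int n) {
        for (int i = 0; i < n; ++i)
            acc[i] += a[i] * b[i];
    }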

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
Bradley Smith 5b094bfeb3 [AArch64] Allow folding FMUL/FADD into FMA for FP16 types
isFMAFasterThanFMulAndFAdd should return true for FP16 types when
HasFullFP16 is present, since we have the instructions to handle it for
both SVE and NEON. (SVE patterns and tests will follow).

Differential Revision: https://reviews.llvm.org/D96599
2021-02-18 16:51:22 +00:00
Fraser Cormack 0176fecfbc [SVE][CodeGen] Expand SVE MULH[SU] and [SU]MUL_LOHI nodes
This patch fixes a codegen crash introduced in fde2466171, where the
DAGCombiner started generating optimized MULH[SU] or [SU]MUL_LOHI nodes
unless the target opted out. The AArch64 backend cannot currently select
any of these nodes, so ensure that they are not generated in the first
place.

This issue was raised by @huihuiz in D94501.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D96849
2021-02-18 10:06:24 +00:00
Florian Hahn 211147c5ba
[AArch64] Convert CMP/SELECT sign patterns to OR & ASR.
ICMP & SELECT patterns extracting the sign of a value can be simplified
to OR & ASR (see https://alive2.llvm.org/ce/z/Xx4iZ0).

This does not save any instructions in IR, but it is profitable on
AArch64, because we need at least 2 extra instructions to materialize 1
and -1 for the SELECT.

The improvements result in ~5% speedups on loops of the form

    static int sign_of(int x) {
      if (x < 0) return -1;
      return 1;
    }

    void foo(const int *x, int *res, int cnt) {
      for (int i=0;i<cnt;i++)
        res[i] = sign_of(x[i]);
    }
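
The OR & ASR form referenced above, written out as C for clarity (for a
32-bit int; relies on arithmetic right shift of negative values, which is
what the backend emits):

    static int sign_of_or_asr(int x) {
        return (x >> 31) | 1;  /* -1 if x < 0, else 1 */
    }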

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D96596
2021-02-16 17:17:34 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds a new intrinsic experimental.vector.reverse that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vector types.
The fixed-width variant relies on shufflevector to maintain existing behaviour,
while the scalable variant uses the new ISD node VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Kerry McLaughlin 9b4fcfaa9e [SVE][CodeGen] Remove performMaskedGatherScatterCombine
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.

This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D94525
2021-02-01 14:10:00 +00:00
Kazu Hirata 7925aa091d [llvm] Populate SmallVector at construction time (NFC) 2021-01-28 22:21:14 -08:00
Richard Smith 925ae8c790 Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV"
This reverts commit 53176c1680, which
introduceed a layering violation. LLVM's IR library can't include
headers from Analysis.
2021-01-25 13:53:38 -08:00
Akira Hatanaka 53176c1680 [ObjC][ARC] Annotate calls with attributes instead of emitting retainRV
or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end annotates calls with attribute "clang.arc.rv"="retain"
  or "clang.arc.rv"="claim", which indicates the call is implicitly
  followed by a marker instruction and a retainRV/claimRV call that
  consumes the call result. This is currently done only when the target
  is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the
  annotated calls in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the annotated
  calls. It doesn't remove the attribute on the call since the backend
  needs it to emit the marker instruction. The retainRV/claimRV calls
  are emitted late in the pipeline to prevent optimization passes from
  transforming the IR in a way that makes it harder for the ARC
  middle-end passes to figure out the def-use relationship between the
  call and the retainRV/claimRV calls (which is the cause of PR31925).

- The function inliner removes the autoreleaseRV call in the callee that
  returns the result if nothing in the callee prevents it from being
  paired up with the calls annotated with "clang.arc.rv"="retain/claim"
  in the caller. If the call is annotated with "claim", a release call
  is inserted since autoreleaseRV+claimRV is equivalent to a release. If
  it cannot find an autoreleaseRV call, it tries to transfer the
  attributes to a function call in the callee. This is important since
  ARC optimizer can remove the autoreleaseRV call returning the callee
  result, which makes it impossible to pair it up with the retainRV or
  claimRV call in the caller. If that fails, it simply emits a retain
  call in the IR if the call is annotated with "retain" and does nothing
  if it's annotated with "claim".

- This patch teaches dead argument elimination pass not to change the
  return type of a function if any of the calls to the function are
  annotated with attribute "clang.arc.rv". This is necessary since the
  pass can incorrectly determine nothing in the IR uses the function
  return, which can happen since the front-end no longer explicitly
  emits retainRV/claimRV calls in the IR, and change its return type to
  'void'.

Future work:

- Use the attribute on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the attributes.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-01-25 11:57:08 -08:00
QingShan Zhang ffc3e800c6 [NFC] [DAGCombine] Correct the result for sqrt even when the iteration is zero
For now, we only correct the result for sqrt if iteration > 0. This doesn't
make sense, as the two are not strictly related.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Kazu Hirata cfa241680f [llvm] Don't include StringSwitch.h where unnecessary (NFC) 2021-01-21 19:59:48 -08:00
Simon Pilgrim 69bc0990a9 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Add DemandedElts support inside the TRUNCATE analysis.

REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d

Differential Revision: https://reviews.llvm.org/D56387
2021-01-21 13:01:34 +00:00
Hans Wennborg a51226057f Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE"
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.

> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387

This reverts commit cad4275d69.
2021-01-20 20:06:55 +01:00
Simon Pilgrim cad4275d69 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Amanieu d'Antras 21bfd068b3 [AArch64] Add support for the GNU ILP32 ABI
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.

The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.

There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.

This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D94143
2021-01-20 13:34:47 +00:00
Nicholas Guy 16bf02c3a1 Reland "[AArch64] Attempt to sink mul operands""
This relands dda60035e9,
which was reverted by dbaa6a1858
2021-01-18 16:00:22 +00:00
Nicholas Guy f5fcbe4e3c [AArch64] Further restricts when a dup(*ext) can be rearranged
In most cases, the dup(*ext) pattern can be rearranged to perform
the extension on the vector side, allowing for further vector-specific
optimisations to be made. However, the initial checks for this conversion
were insufficient, allowing invalid encodings to be attempted (causing
compilation to fail).

Differential Revision: https://reviews.llvm.org/D94778
2021-01-18 16:00:21 +00:00
Stephan Herhut 061d152085 [SVE] Fix unused variable.
Introduced by [SVE] Restrict the usage of REINTERPRET_CAST.

Differential Revision: https://reviews.llvm.org/D94773
2021-01-15 15:10:33 +01:00
Paul Walker 2b8db40c92 [SVE] Restrict the usage of REINTERPRET_CAST.
In order to limit the number of combinations of REINTERPRET_CAST,
whilst at the same time prevent overlap with BITCAST, this patch
establishes the following rules:

1. The operand and result element types must be the same.
2. The operand and/or result type must be an unpacked type.

Differential Revision: https://reviews.llvm.org/D94593
2021-01-15 11:32:13 +00:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Martin Storsjö dbaa6a1858 Revert "[AArch64] Attempt to sink mul operands"
This reverts commit dda60035e9.

This commit caused failures to compile some sources, erroring out
with "error in backend: Cannot select: t85: v2i32 = AArch64ISD::DUP t15",
see https://reviews.llvm.org/D91271 for the full reproduction case.
2021-01-14 17:28:18 +02:00
Nicholas Guy dda60035e9 [AArch64] Attempt to sink mul operands
Following on from D91255, this patch is responsible for sinking relevant mul
operands to the same block so that umull/smull instructions can be correctly
generated by the mul combine implemented in the aforementioned patch.

Differential revision: https://reviews.llvm.org/D91271
2021-01-13 15:23:36 +00:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Kerry McLaughlin c37f68a888 [SVE][CodeGen] Fix legalisation of floating-point masked gathers
Changes in this patch:
- When lowering floating-point masked gathers, cast the result of the
  gather back to the original type with reinterpret_cast before returning.
- Added patterns for reinterpret_casts from integer to floating point, and
  concat_vector patterns for bfloat16.
- Tests for various legalisation scenarios with floating point types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D94171
2021-01-11 10:57:46 +00:00
Mark Murray af7cce2fa4 [AArch64] Add +pauth architecture option, allowing the v8.3a pointer authentication extension.
Differential Revision: https://reviews.llvm.org/D94083
2021-01-08 13:21:11 +00:00
Nicholas Guy ed23229a64 [AArch64] Fix crash caused by invalid vector element type
Fixes a crash caused by D91255, where LLVMTy is null when
calling changeExtendedVectorElementType.

Differential Revision: https://reviews.llvm.org/D94234
2021-01-08 12:02:54 +00:00
David Sherwood d1bf26fd94 [AArch64][SVE] Add lowering for llvm abs intrinsic
Add functionality to permit lowering of the abs and neg intrinsics
using the passthru variants.

Differential Revision: https://reviews.llvm.org/D94160
2021-01-08 08:55:25 +00:00
Kazu Hirata b934160aaa [Target] Use llvm::find_if (NFC) 2021-01-07 20:29:36 -08:00
Nicholas Guy 350247a93c [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))
Performing this rearrangement allows for existing patterns
to match cases where the vector may be built after an extend,
instead of before.
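
A sketch of the rearrangement (names illustrative): a splat built from an already-extended scalar is rewritten so the extend happens on the vector side, where the widening-multiply patterns can see it:

  define <4 x i32> @mul_dup_sext(i16 %s, <4 x i16> %v) {
    ; dup(sext %s) is rewritten as sext(dup %s), allowing smull to match.
    %e = sext i16 %s to i32
    %ins = insertelement <4 x i32> undef, i32 %e, i64 0
    %dup = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
    %ev = sext <4 x i16> %v to <4 x i32>
    %m = mul <4 x i32> %dup, %ev
    ret <4 x i32> %m
  }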

Differential Revision: https://reviews.llvm.org/D91255
2021-01-06 16:02:16 +00:00
David Green 78d8a821e2 [AArch64] Handle any extend whilst lowering mull
Demanded bits may turn a sext or zext into an anyext if the top bits are
not needed. This currently prevents the lowering to instructions like
mull, addl and addw. This patch fixes the mull generation by keeping it
simple and treating them like zextends.

Differential Revision: https://reviews.llvm.org/D93832
2021-01-06 10:08:43 +00:00
Sander de Smalen a9f5e4375b [AArch64] Use faddp to implement fadd reductions.
Custom-expand legal VECREDUCE_FADD SDNodes
to benefit from pair-wise faddp instructions.
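
A sketch of a benefiting input (assuming the llvm.vector.reduce.fadd naming):

  define float @fadd_reduce(<4 x float> %v) {
    ; Expanded pair-wise, roughly: faddp v0.4s, v0.4s, v0.4s; faddp s0, v0.2s
    %r = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %v)
    ret float %r
  }
  declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)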

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D59259
2021-01-06 09:36:51 +00:00
Paul Walker eba6deab22 [SVE] Lower vector CTLZ, CTPOP and CTTZ operations.
CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively.

CTTZ is not a native SVE operation but is instead lowered to:
  CTTZ(V) => CTLZ(BITREVERSE(V))

In the case of fixed-length support using SVE we also lower CTTZ
operating on NEON-sized vectors because of its reliance on
BITREVERSE, which is also lowered to SVE instructions at these lengths.
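
A minimal example of the CTTZ case (illustrative):

  define <vscale x 4 x i32> @cttz(<vscale x 4 x i32> %v) {
    ; Lowered as CTLZ(BITREVERSE(V)): a bit reverse followed by a
    ; count-leading-zeros on each lane.
    %r = call <vscale x 4 x i32> @llvm.cttz.nxv4i32(<vscale x 4 x i32> %v, i1 false)
    ret <vscale x 4 x i32> %r
  }
  declare <vscale x 4 x i32> @llvm.cttz.nxv4i32(<vscale x 4 x i32>, i1)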

Differential Revision: https://reviews.llvm.org/D93607
2021-01-05 10:42:35 +00:00
Nikita Popov fb77d95022 [AArch64] Fix legalization of i128 ctpop without neon
If neon is disabled, LowerCTPOP will return SDValue() to indicate
that normal legalization should be used. However, ReplaceNodeResults
does not check for this and pushes the empty SDValue() onto the
result vector, which will subsequently result in a crash.
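
A sketch of an input that previously crashed (attribute spelling shown for illustration):

  define i128 @popcount128(i128 %x) #0 {
    ; Without NEON this now falls back to the normal scalar expansion
    ; instead of pushing an empty SDValue and crashing.
    %r = call i128 @llvm.ctpop.i128(i128 %x)
    ret i128 %r
  }
  declare i128 @llvm.ctpop.i128(i128)
  attributes #0 = { "target-features"="-neon" }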

Differential Revision: https://reviews.llvm.org/D93825
2020-12-27 17:24:41 +01:00
Paul Walker 8eec7294fe [SVE] Lower vector BITREVERSE and BSWAP operations.
These operations are lowered to RBIT and REVB instructions
respectively.  In the case of fixed-length support using SVE we
also lower BITREVERSE operating on NEON-sized vectors as this
results in fewer instructions.

Differential Revision: https://reviews.llvm.org/D93606
2020-12-22 16:49:50 +00:00
Kazu Hirata 966f1431de [Target] Use llvm::erase_if (NFC) 2020-12-20 17:43:22 -08:00
Kerry McLaughlin 52e4084d9c [SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter
This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate
addressing modes for scalable masked gathers & scatters.

selectGatherScatterAddrMode checks if the base pointer is null, in which case
we can swap the base pointer and the index, e.g.
     getelementptr nullptr, <vscale x N x T> (splat(%offset) + %indices)
  -> getelementptr %offset, <vscale x N x T> %indices

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93132
2020-12-18 11:56:36 +00:00
Kerry McLaughlin 6d2a78996b [SVE][CodeGen] Add bfloat16 support to scalable masked gather
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93307
2020-12-17 11:08:15 +00:00
QingShan Zhang ebdd20f430 Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
X86 and AArch64 expand these as libcalls inside the target, and PowerPC
also wants to expand them as libcalls for P8. This patch implements the
expansion in the legalizer to share the logic, and removes the code from
X86/AArch64 to avoid duplication.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00
Paul Walker b74c4dbb96 [SVE] Move INT_TO_FP i1 promotion into custom lowering.
AddPromotedToType is being used to legalise INT_TO_FP operations
when the source is a predicate. The point where this introduces
vector extends might cause problems in the future so this patch
falls back to manual promotion within custom lowering.

Differential Revision: https://reviews.llvm.org/D90093
2020-12-15 11:57:07 +00:00
Kerry McLaughlin c5ced82c8e [SVE][CodeGen] Lower scalable floating-point vector reductions
Changes in this patch:
-  Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally
   to also work for scalable types
- Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32)
- Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D93050
2020-12-14 11:45:42 +00:00
Florian Hahn 46bc40e502
Recommit "[AArch64] Lower calls with rv_marker attribute."
This recommits a87fccb3ff with a fix to mark the destination operand
of the marker instruction as def, to fix a machine verifier failure.

This reverts the revert commit c0f2cea7c0.
2020-12-13 16:20:39 +00:00
Florian Hahn c0f2cea7c0
Revert "[AArch64] Lower calls with rv_marker attribute ."
This reverts commit a87fccb3ff.

A test appears to fail with expensive checks. Reverting while I
investigate.
2020-12-11 20:12:59 +00:00
Florian Hahn a87fccb3ff
[AArch64] Lower calls with rv_marker attribute.
This patch adds support for lowering function calls with the
rv_marker attribute. The goal is to expand such calls to the
following sequence of instructions:

    BL @fn
    mov x29, x29

This sequence of instructions triggers Objective-C runtime optimizations,
hence we want to ensure no instructions get moved in between them.
This patch achieves that by adding a new CALL_RVMARKER ISD node,
which gets turned into the BLR_RVMARKER pseudo, which eventually gets
expanded into the sequence mentioned above. The sequence is then marked
as instruction bundle, to avoid anything being moved in between.

@ahatanak is working on using this attribute in the front- & middle-end.

Together with the front- & middle-end changes, this should address
PR31925 for AArch64.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D92569
2020-12-11 19:45:44 +00:00
Kerry McLaughlin abe7775f5a [SVE][CodeGen] Extend index of masked gathers
This patch changes performMSCATTERCombine to also promote the indices of
masked gathers where the element type is i8 or i16, and adds various tests
for gathers with illegal types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91433
2020-12-10 13:54:45 +00:00
Kerry McLaughlin 05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.
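
An illustrative IR-level trigger (intrinsic mangling per the conventions of the time): the zero-extend of the gathered value can now be folded into a single extending gather load:

  define <vscale x 2 x i64> @gather_zext(<vscale x 2 x i32*> %ptrs, <vscale x 2 x i1> %mask) {
    %v = call <vscale x 2 x i32> @llvm.masked.gather.nxv2i32.nxv2p0i32(<vscale x 2 x i32*> %ptrs, i32 4, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
    ; Folded to a zext_masked_gather, i.e. one zero-extending ld1w.
    %e = zext <vscale x 2 x i32> %v to <vscale x 2 x i64>
    ret <vscale x 2 x i64> %e
  }
  declare <vscale x 2 x i32> @llvm.masked.gather.nxv2i32.nxv2p0i32(<vscale x 2 x i32*>, i32, <vscale x 2 x i1>, <vscale x 2 x i32>)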

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
Huihui Zhang 8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
The LLVM intrinsics llvm.maxnum|minnum are overloaded and can be used on any
floating-point type or vector of floating-point type.
This patch extends the current infrastructure to support scalable vector types.

This patch also fixes a warning about incorrect use of EVT::getVectorNumElements()
for scalable types, issued when the DAGCombiner tries to split a scalable vector.
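
For example (illustrative):

  define <vscale x 4 x float> @fmax(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
    ; Lowered to the SVE fmaxnm instruction.
    %r = call <vscale x 4 x float> @llvm.maxnum.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
    ret <vscale x 4 x float> %r
  }
  declare <vscale x 4 x float> @llvm.maxnum.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)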

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
David Sherwood 59f17b57d9 [SVE] Fix crashes with inline assembly
All the crashes found compiling inline assembly are fixed in this
patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint
to be more resilient to mismatched value and register types. For
example, it makes no sense to request a predicate register for
an nxv2i64 type, and so on.
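
A sketch of the kind of mismatch involved (constraint string assumed for illustration):

  define <vscale x 2 x i64> @mismatch() {
    ; "Upa" requests an SVE predicate register, but the value is an
    ; nxv2i64 data vector; this is now handled gracefully rather than
    ; crashing the compiler.
    %r = call <vscale x 2 x i64> asm "", "=Upa"()
    ret <vscale x 2 x i64> %r
  }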

Tests have been added here:

  test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll

Differential Revision: https://reviews.llvm.org/D92554
2020-12-08 13:48:43 +00:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
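
Illustratively, each check kind becomes a distinct immediate on the trap (exact encoding is target-dependent):

  define void @trap_kind() noreturn {
    ; On AArch64 this is emitted as a brk with the kind folded into the
    ; immediate (e.g. brk #0x5505 for kind 5), so crash logs identify it.
    call void @llvm.ubsantrap(i8 5)
    unreachable
  }
  declare void @llvm.ubsantrap(i8 immarg)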
2020-12-08 10:28:26 +00:00
Kerry McLaughlin 111f559bbd [SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER
The refineIndexType & refineUniformBase functions added by D90942 can also be used to
improve CodeGen of masked gathers.

These changes were split out from D91092

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92319
2020-12-07 13:20:19 +00:00
Kerry McLaughlin f6dd32fd35 [SVE][CodeGen] Lower scalable masked gathers
Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)

Changes in this patch:
- Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate
  gather load opcode to use.
- Improve codegen with refineIndexType/refineUniformBase, added in D90942
- Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.
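
A minimal scalar-plus-vector example (illustrative; intrinsic mangling per the conventions of the time):

  define <vscale x 2 x i64> @gather(<vscale x 2 x i64*> %ptrs, <vscale x 2 x i1> %mask) {
    ; Lowered to an SVE gather load, roughly: ld1d { z0.d }, p0/z, [z0.d]
    %v = call <vscale x 2 x i64> @llvm.masked.gather.nxv2i64.nxv2p0i64(<vscale x 2 x i64*> %ptrs, i32 8, <vscale x 2 x i1> %mask, <vscale x 2 x i64> undef)
    ret <vscale x 2 x i64> %v
  }
  declare <vscale x 2 x i64> @llvm.masked.gather.nxv2i64.nxv2p0i64(<vscale x 2 x i64*>, i32, <vscale x 2 x i1>, <vscale x 2 x i64>)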

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91092
2020-12-07 12:20:41 +00:00
Craig Topper c55d9af8c0 [AArch64] Add custom lowering for ISD::ABS
Instead of trying to pattern match the code produced by ISD::ABS expansion, just custom legalize ISD::ABS to the desired sequence.

The one test change is because a DAG combine for (neg (abs)) is no longer firing because ISD::ABS is now Custom instead of Expand.
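
For instance (sketch; the exact sequence may vary by type):

  define i32 @abs32(i32 %x) {
    ; Custom-legalized to a compare plus conditional negate,
    ; e.g. cmp w0, #0; cneg w0, w0, mi
    %r = call i32 @llvm.abs.i32(i32 %x, i1 false)
    ret i32 %r
  }
  declare i32 @llvm.abs.i32(i32, i1)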

Differential Revision: https://reviews.llvm.org/D92154
2020-12-04 10:45:31 -08:00
Kerry McLaughlin 603d40da9d [SVE][CodeGen] Add a DAG combine to extend mscatter indices
This patch adds a target-specific DAG combine for mscatter to promote indices
with element types i8 or i16 before legalisation, plus various tests with illegal types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90945
2020-11-25 11:18:22 +00:00
Pavel Iliin 4d7df43ffd [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements the out-of-line atomics mechanism for LSE
deployment. Details of how it works can be found in llvm/docs/Atomics.rst.
Options -moutline-atomics and -mno-outline-atomics were added to the clang
driver to enable and disable it. This is the clang and llvm part of the
out-of-line atomics interface; the library part is already supported by
libgcc. Compiler-rt support is provided in a separate patch.

Differential Revision: https://reviews.llvm.org/D91157
2020-11-20 13:30:12 +00:00
Adhemerval Zanella 807320119f [AArch64] Lower fptrunc/fpext from/to FP128 to/from FP16
The compiler-rt part which adds the emitted symbols is handled in
a subsequent patch.
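
A small example (libcall name per the usual compiler-rt convention):

  define half @trunc_f128(fp128 %x) {
    ; Lowered to a call to the __trunctfhf2 conversion libcall.
    %r = fptrunc fp128 %x to half
    ret half %r
  }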

Differential Revision: https://reviews.llvm.org/D91731
2020-11-19 15:14:50 -03:00
Kerry McLaughlin 306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
David Sherwood 3225fcf11e [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to an SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct number of legal loads and stores.

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Caroline Concatto 37f4ccb275 [AArch64] Add memory op cost model for SVE
This patch adds/fixes the memory op cost model for SVE with fixed-width
vectors.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them.
2020-11-11 12:15:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Francesco Petrogalli 9f61931e07 [llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests.
For example, if the sign extension is only used by TBZ, and the value is used elsewhere with a zero extension, this can eliminate the sign extension.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D90606
2020-11-09 18:27:48 +00:00
Christopher Tetreault 900ec97bbe [UBSan] Cannot negate smallest negative signed integer
Silence an Undefined Behavior Sanitizer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D90710
2020-11-04 10:07:52 -08:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.
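
For example, an OR reduction over a predicate type (illustrative):

  define i1 @any_lane(<vscale x 16 x i1> %p) {
    ; Lowered using ptest + cset rather than a vector reduction.
    %r = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> %p)
    ret i1 %r
  }
  declare i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1>)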

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
David Sherwood cea69fa4dc [SVE] Add fatal error for unnamed SVE variadic arguments
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.

Differential Revision: https://reviews.llvm.org/D90230
2020-10-30 13:35:47 +00:00
Florian Hahn ba78cae20f [AArch64] Use DUP for BUILD_VECTOR with few different elements.
If most elements of BUILD_VECTOR are the same, with a few different
elements, it is better to use DUP  for the common elements and
INSERT_VECTOR_ELT for the different elements.

Currently this transform is guarded quite restrictively to only trigger
in clearly beneficial cases.

With D90176, patterns originating from code like
`float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even
better (an unnecessary fmov is removed).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D90233
2020-10-28 19:48:20 +00:00
David Green 066737fdbc [AArch64] Remove AArch64ISD::NOT, use vnot instead
vnot (xor -1) should be equivalent to the AArch64-specific AArch64ISD::NOT
node, but allows more folding thanks to all the target-independent
optimizations. Specifically this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x" as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y"

Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.
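
An illustrative trigger for the improved case:

  define <4 x i32> @sel_ne(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
    ; Now selects to cmeq + bsl (with swapped operands), with no mvn.
    %c = icmp ne <4 x i32> %a, %b
    %r = select <4 x i1> %c, <4 x i32> %x, <4 x i32> %y
    ret <4 x i32> %r
  }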

Differential Revision: https://reviews.llvm.org/D90126
2020-10-28 08:15:37 +00:00
Cameron McInally a1cc274cb3 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
David Sherwood d67d8f8790 [SVE][AArch64] Replace TypeSize comparisons with their integer equivalents
In many places in the AArch64 backend we are comparing TypeSize objects,
but in fact we are only ever expecting fixed width types. I've changed
all such comparisons to use their integer equivalents by replacing
calls to getSizeInBits() with getFixedSizeInBits(), etc.

Differential Revision: https://reviews.llvm.org/D89116
2020-10-19 07:41:33 +01:00
Vinay Madhusudan 159b2d8e62 [AArch64] Combine UADDVs to generate vector add
ADD(UADDV a, UADDV b) --> UADDV(ADD a, b)
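
In IR terms (assuming current reduction intrinsic naming), this matches code such as:

  define i32 @sum_both(<4 x i32> %a, <4 x i32> %b) {
    ; Combined so the vectors are added first and reduced once,
    ; roughly: add v0.4s, v0.4s, v1.4s; addv s0, v0.4s
    %ra = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)
    %rb = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %b)
    %s = add i32 %ra, %rb
    ret i32 %s
  }
  declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)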

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Differential Revision: https://reviews.llvm.org/D88731
2020-10-15 09:56:31 +05:30
Cameron McInally 421f1b7294 [SVE] Lower fixed length VECREDUCE_FADD operation
Differential Revision: https://reviews.llvm.org/D89263
2020-10-14 09:41:11 -05:00
David Sherwood af57a0838e [SVE] Add fatal error when running out of registers for SVE tuple call arguments
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to an SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack. This is wrong and we'd like
to avoid any silent ABI compatibility issues in future. For now,
I've added a fatal error when this happens until we can get a
proper fix.

Differential Revision: https://reviews.llvm.org/D89326
2020-10-14 09:31:41 +01:00
Vinay Madhusudan 37dce7475b [AArch64] Identify SAD pattern
(ABS (SUB (EXTEND a), (EXTEND b))) to ZERO_EXTEND((UABD a, b))
(ABS (SUB (SIGN_EXTEND a), (SIGN_EXTEND b))) to ZERO_EXTEND((SABD a, b))
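
Illustratively, the unsigned case matches IR such as:

  define <8 x i16> @uabd_example(<8 x i8> %a, <8 x i8> %b) {
    %ea = zext <8 x i8> %a to <8 x i16>
    %eb = zext <8 x i8> %b to <8 x i16>
    %d = sub <8 x i16> %ea, %eb
    ; Recognized as ZERO_EXTEND(UABD a, b): uabd v0.8b then ushll.
    %r = call <8 x i16> @llvm.abs.v8i16(<8 x i16> %d, i1 false)
    ret <8 x i16> %r
  }
  declare <8 x i16> @llvm.abs.v8i16(<8 x i16>, i1)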

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Differential Revision: https://reviews.llvm.org/D88742
2020-10-13 15:50:54 +05:30
Cameron McInally 974ddb54c9 [SVE] Lower fixed length VECREDUCE_XOR operation
Differential Revision: https://reviews.llvm.org/D88974
2020-10-12 10:12:15 -05:00
Cameron McInally 333b2ab60b [SVE] Lower fixed length VECREDUCE_OR operation
Differential Revision: https://reviews.llvm.org/D88847
2020-10-07 09:56:25 -05:00
Paul Walker 27f3d51b4e [SVE] Lower fixed length vector fneg and fsqrt operations.
Also updates sve-fp.ll to use fneg directly.

Differential Revision: https://reviews.llvm.org/D88683
2020-10-06 10:48:16 +01:00
Paul Walker 8bb702a8ad [SVE] Lower fixed length vector floating point rounding operations.
Adds lowering for:
  llvm.ceil
  llvm.floor
  llvm.nearbyint
  llvm.rint
  llvm.round
  llvm.trunc

Differential Revision: https://reviews.llvm.org/D88671
2020-10-06 10:48:16 +01:00
Cameron McInally 9642ded8ba [SVE] Lower fixed length VECREDUCE_AND operation
Differential Revision: https://reviews.llvm.org/D88707
2020-10-05 11:28:38 -05:00
Vinay Madhusudan f192594956 [AArch64] Generate dot for v16i8 sum reduction to i32
Convert VECREDUCE_ADD( EXTEND(v16i8_type) ) to VECREDUCE_ADD( DOTv16i8(v16i8_type) ) whenever the result type is i32. This improves performance on one of the SPEC CPU 2017 benchmarks.
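
A sketch of the shape being matched (assuming current intrinsic naming):

  define i32 @sum_zext(<16 x i8> %v) {
    ; Rewritten to a udot against an all-ones vector plus addv,
    ; instead of widening the whole vector to i32 first.
    %e = zext <16 x i8> %v to <16 x i32>
    %r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %e)
    ret i32 %r
  }
  declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)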

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Differential Revision: https://reviews.llvm.org/D88577
2020-10-02 17:11:02 +01:00
Muhammad Asif Manzoor aab6f7db47 [AArch64][SVE] Add lowering for llvm fabs
Add the functionality to lower fabs for the passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88679
2020-10-01 19:41:25 -04:00
Kerry McLaughlin fcf70e1e3b [SVE][CodeGen] Lower scalable fp_extend & fp_round operations
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.

This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88321
2020-10-01 12:17:37 +01:00
Kerry McLaughlin 75db7cf78a [SVE][CodeGen] Legalisation of integer -> floating point conversions
Splitting the operand of a scalable [S|U]INT_TO_FP results in a
concat_vectors operation where the operands are unpacked FP
scalable vectors (e.g. nxv2f32).
This patch adds custom lowering of concat_vectors which
checks that the number of operands is 2, and isel patterns
to match concat_vectors of scalable FP types with uzp1.

Reviewed By: efriedma, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88033
2020-10-01 10:43:20 +01:00
Paul Walker 8931c3d682 [NFC] Iterate across an explicit list of scalable MVTs when driving setOperationAction.
Iterating across all of integer_scalable_vector_valuetypes seems
wasteful when there's only a handful we care about.

Also removes some rogue whitespace.

Differential Revision: https://reviews.llvm.org/D88552
2020-10-01 10:17:59 +01:00
Cameron McInally 80381c4dc9 [SVE] Lower fixed length VECREDUCE_[FMAX|FMIN] to Scalable
Differential Revision: https://reviews.llvm.org/D88444
2020-09-29 16:22:29 -05:00
Cameron McInally 9b0b09671c [SVE] Lower fixed length VECREDUCE_[UMAX|UMIN] to Scalable
Essentially the same as the signed variants from D88259. Also includes a clean up of the lowering function.

Differential Revision: https://reviews.llvm.org/D88317
2020-09-28 09:29:00 -05:00
David Sherwood bafdd11326 [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explicitly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700
2020-09-28 08:03:00 +01:00
Cameron McInally 9a4767411e [SVE] Revert accidental change from 405e22fbe8719cff6c40eec15c2044f42527f116
Accidentally committed two lines that were not intended. Remove those.
2020-09-25 10:11:10 -05:00
Cameron McInally e2ccf7f178 [SVE] Lower fixed length VECREDUCE_[SMAX|SMIN] to Scalable
This patch is pretty similar to the VECREDUCE_ADD patch, with some minor tweaks.

The AArch64ISD::[SMAX|SMIN]V_PRED nodes return element-sized results. This requires an ANY_EXTEND for results < 32 bits, since legalization promotes those results.

There is no NEON i64 vector support for SMAXV|SMINV, so use SVE for those.

Differential Revision: https://reviews.llvm.org/D88259
2020-09-25 09:58:17 -05:00
Daniel Kiss 2a96f47c5f [AArch64] __builtin_return_address for PAuth.
This change adds support for __builtin_return_address
for ARMv8.3A Pointer Authentication.
The location of the authentication code in the pointer depends on
the system configuration, so a dedicated instruction is used to
strip the authentication code without
authenticating the pointer.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D75044
2020-09-24 23:23:49 +02:00
Cameron McInally e8413ac97f [AArch64] Expand some vector of i64 reductions on NEON
With the exception of VECREDUCE_ADD, there are no NEON instructions to support reductions of i64 vectors. This patch removes the Custom lowerings for those and adds some test coverage to confirm.

Differential Revision: https://reviews.llvm.org/D88161
2020-09-23 16:01:24 -05:00
Muhammad Asif Manzoor 3a76de4275 [AArch64][SVE] Add lowering for llvm frecpx
Add the functionality to lower frecpx for the passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88032
2020-09-23 15:23:54 -04:00
Cameron McInally db40a74344 [SVE] Lower fixed length ISD::VECREDUCE_ADD to Scalable
Differential Revision: https://reviews.llvm.org/D87796
2020-09-23 09:08:07 -05:00
Kerry McLaughlin d0149ba9b4 [SVE][CodeGen] Lower legal integer -> floating point conversions
This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU &
UCVTZ_MERGE_PASSTHRU, which are used to lower both legal
scalable vector [S|U]INT_TO_FP operations and the following intrinsics:
 - llvm.aarch64.sve.scvtf
 - llvm.aarch64.sve.ucvtf

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D87913
2020-09-23 11:53:53 +01:00
David Sherwood e077367a28 [SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.

Differential revision: https://reviews.llvm.org/D87889
2020-09-23 09:20:08 +01:00
Paul Walker f3fa954b5b [SVE] Change definition of reduction ISD nodes to have an SVE vector result type.
The current nodes, AArch64::SMAXV_PRED for example, are defined to
return a NEON vector result.  This is incorrect because they modify
the complete SVE register, so they are changed to represent that.

This patch also adds nodes for UADDV_PRED and SADDV_PRED, which
unifies the handling of all SVE reductions.

NOTE: Floating-point reductions are already implemented correctly,
so this patch is essentially making everything consistent with those.

Differential Revision: https://reviews.llvm.org/D87843
2020-09-21 13:16:28 +01:00
Tim Northover 2afe4becec AArch64: make sure jump table entries can reach entire image
This turns all jump table entries into deltas within the target
function because in the small memory model all code & static data must
be in a 4GB block somewhere in memory.

When the entries were a delta between the table location and a basic
block, the 32-bit signed entries are not enough to guarantee
reachability.

https://reviews.llvm.org/D87286
2020-09-18 09:50:40 +01:00
Cameron McInally a35c7f3076 [SVE][WIP] Implement lowering for fixed length VSELECT to Scalable
Map fixed length VSELECT to its Scalable equivalent.

Differential Revision: https://reviews.llvm.org/D85364
2020-09-17 14:02:57 -05:00
Sanne Wouda d5fd3d9b90 [AArch64] Match pairwise add/fadd pattern
D75689 turns the faddp pattern into a shuffle with vector add.

Match this new pattern in target-specific DAG combine, rather than ISel,
because legalization (for v2f32) turns it into a bit of a mess.

- extended to cover f16, f32, f64 and i64
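
The shuffle-based shape being matched looks roughly like this (sketch):

  define double @faddp_pair(<2 x double> %v) {
    ; add(x, shuffle(x)) plus an extract matches a single faddp d0, v0.2d.
    %sh = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 1, i32 undef>
    %s = fadd <2 x double> %v, %sh
    %r = extractelement <2 x double> %s, i64 0
    ret double %r
  }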
2020-09-17 16:27:01 +01:00
Kerry McLaughlin f7185b271f [SVE][CodeGen] Lower floating point -> integer conversions
This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU &
FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector
FP_TO_SINT/FP_TO_UINT operations and the following intrinsics:
 - llvm.aarch64.sve.fcvtzu
 - llvm.aarch64.sve.fcvtzs

Reviewed By: efriedma, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D87232
2020-09-17 14:04:22 +01:00
Muhammad Asif Manzoor d417488ef5 [AArch64][SVE] Add lowering for llvm fsqrt
Add the functionality to lower fsqrt for the passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D87707
2020-09-15 15:26:17 -04:00
Philip Reames e6bc7037d3 [AArch64] Statepoint support for AArch64.
Differential Revision: https://reviews.llvm.org/D66012
Patch By: loicottet (with major rebase by me)
2020-09-14 16:43:08 -07:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Sanjay Patel 3a8ea8609b [Intrinsics] define semantics for experimental fmax/fmin vector reductions
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.

No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.

There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.

Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.

We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.

(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)

x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.

Differential Revision: https://reviews.llvm.org/D87391
2020-09-12 09:10:28 -04:00
Kerry McLaughlin cd89f5c91b [SVE][CodeGen] Legalisation of truncate for scalable vectors
Truncating from an illegal SVE type to a legal type, e.g.
`trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>`
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.

This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86548
2020-09-10 11:35:33 +01:00
Muhammad Asif Manzoor 1ffcbe35ae [AArch64][SVE] Add lowering for rounding operations
Add the functionality to lower SVE rounding operations for the passthru variant.
Created a new test case file for all rounding operations.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86793
2020-09-04 11:16:57 -04:00
Benjamin Kramer 8782c72765 Strength-reduce SmallVectors to arrays. NFCI. 2020-08-28 21:14:20 +02:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Ties Stuij d678e14c55 [AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported
Previously in addTypeForNeon, we would set the operations for bfloat vectors
like other generic types. But as bfloat is a storage-only type, a number of
operations shouldn't be set. This patch fixes that.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D85101
2020-08-28 11:44:37 +01:00
Mikhail Maltsev 23d5e93f34 [AArch64] Optimize instruction selection for certain vector shuffles
This patch adds code to recognize vector shuffles which can be
represented as VDUP (splat) of a vector lane of a different
(wider) type than the original vector lane type.

For example:
    shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
is essentially:
    shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0>

Such patterns are generated by the SelectionDAG machinery in some cases
(see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double
bitcasts from shuffles" part).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86225
2020-08-27 11:06:49 +01:00
Paul Walker 81337c915f [SVE] Fall back to default expansion when lowering SIGN_EXTEND_INREG from non-byte based source.
Differential Revision: https://reviews.llvm.org/D86394
2020-08-27 10:57:37 +01:00
Ahmed Bougacha 383f7c8858 [AArch64] Use CCAssignFnForReturn helper in more spots. NFC.
It was added for GISel, but SDAG could use it too!
2020-08-26 14:39:11 -07:00
Muhammad Asif Manzoor fd536eeed9 [AArch64][SVE] Add lowering for llvm fceil
Add the functionality to lower fceil for the passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84548
2020-08-26 15:59:44 -04:00
Paul Walker 73ac3c0ede [SVE] Lower scalable vector ISD::FNEG operations.
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.

Differential Revision: https://reviews.llvm.org/D86415
2020-08-25 11:22:28 +01:00
Cameron McInally 36dbb8fc97 [SVE] Lower fixed length UDIV to scalable
Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement.

Differential Revision: https://reviews.llvm.org/D86316
2020-08-21 09:01:25 -05:00
Cameron McInally ac63959460 [SVE] Lower fixed length vXi8/vXi16 SDIV to scalable
There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32.

Differential Revision: https://reviews.llvm.org/D86114
2020-08-20 13:47:01 -05:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the constructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken; these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the constructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the constructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Eli Friedman bb18532399 [AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers.
This isn't necessary for ACLE, but could be useful in other situations.
And the change is simple.

Differential Revision: https://reviews.llvm.org/D85251
2020-08-18 12:51:16 -07:00
Paul Walker cb5cc47a65 [SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations.
Also strengthens the CHECK lines for scalable vector splat tests.

Differential Revision: https://reviews.llvm.org/D86070
2020-08-18 11:19:43 +01:00
Cameron McInally 92593f9e77 [SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors.
Differential Revision: https://reviews.llvm.org/D85982
2020-08-14 18:47:22 -05:00
Cameron McInally 21810b0e14 [SVE] Lower fixed length vector integer UMIN/UMAX
Differential Revision: https://reviews.llvm.org/D85926
2020-08-13 14:48:36 -05:00
Cameron McInally e1a87f0a9b [SVE] Lower fixed length vector integer SMIN/SMAX
Differential Revision: https://reviews.llvm.org/D85855
2020-08-13 11:41:20 -05:00
Paul Walker e63cc8105a [SVE] Lower fixed length vector integer shifts.
Differential Revision: https://reviews.llvm.org/D85724
2020-08-13 12:35:47 +01:00
Paul Walker 130098228d [SVE] Lower fixed length vector integer ISD::SETCC operations.
Differential Revision: https://reviews.llvm.org/D85831
2020-08-13 12:01:56 +01:00
Paul Walker 9e04895258 [SVE] Lower fixed length integer extend operations.
Differential Revision: https://reviews.llvm.org/D85640
2020-08-13 11:54:53 +01:00
Francesco Petrogalli c561f4d2ec [SVE][VLS] Don't combine logical AND.
Testing is performed when targeting 128, 256 and 512-bit wide vectors.

For 128-bit vectors, the original behavior of using NEON instructions is
preserved.

Differential Revision: https://reviews.llvm.org/D85479
2020-08-12 20:00:07 +01:00
Cameron McInally ce2c991061 [SVE] Lower fixed length FP minnum/maxnum
Lower fixed length MINNUM/MAXNUM to scalable vectors. Cherry-picked from D71767 with added tests.

Differential Revision: https://reviews.llvm.org/D85744
2020-08-12 12:02:52 -05:00
David Sherwood 88bbd30736 [SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors
In this patch I have fixed two issues:

1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.

The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:

  test/CodeGen/AArch64/sve-extract-subvector.ll

Differential Revision: https://reviews.llvm.org/D85516
2020-08-12 08:35:46 +01:00
Paul Walker b6c7b7fa31 [SVE] Add ISD nodes for predicated integer extend inreg operations.
These are useful instructions when lowering fixed length vector
extends, so I've broken this patch out as a kind of NFC-like change.

Differential Revision: https://reviews.llvm.org/D85546
2020-08-11 11:39:26 +01:00
Paul Walker d542feb8e4 [SVE] Lower fixed length vector integer subtract operations.
Differential Revision: https://reviews.llvm.org/D85665
2020-08-11 11:32:12 +01:00
Paul Walker 0d33a8ef5b [SVE] Lower scalable vector mul operations.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.

Differential Revision: https://reviews.llvm.org/D85328
2020-08-06 11:15:35 +01:00
Paul Walker 3ed59b775d [SVE] Implement lowering for fixed length vector multiplication.
NOTE: Also uses SVE code generation for NEON-sized vectors, instead
of expanding i64-based vector multiplications.

Differential Revision: https://reviews.llvm.org/D85327
2020-08-06 11:01:39 +01:00
Paul Walker 927fc536ca [SVE] Add lowering for fixed length vector and, or & xor operations.
Since there are no ill effects when performing these operations
with undefined elements, they are lowered to the already supported
unpredicated scalable vector equivalents.

Differential Revision: https://reviews.llvm.org/D85117
2020-08-05 11:28:34 +01:00
Sander de Smalen f2916636f8 [AArch64][SVE] Disable tail calls if callee does not preserve SVE regs.
This fixes an issue triggered by the following code, where emitEpilogue
got confused when trying to restore the SVE registers after the call,
whereas the call to non_sve() is implemented as a TCReturn:

  int non_sve();
  int sve(svint32_t x) { return non_sve(); }

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84869
2020-08-05 09:38:54 +01:00
Eli Friedman 95efea4b93 [AArch64][SVE] Widen narrow sdiv/udiv operations.
The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit
integers.  If we see an 8-bit or 16-bit divide, widen the operands to 32
bits, and narrow the result.
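
For example (illustrative):

  define <vscale x 16 x i8> @sdiv8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
    ; The i8 elements are sign-extended to i32, divided with 32-bit
    ; sdiv instructions, and the results narrowed back with uzp1.
    %r = sdiv <vscale x 16 x i8> %a, %b
    ret <vscale x 16 x i8> %r
  }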

Differential Revision: https://reviews.llvm.org/D85170
2020-08-04 13:22:15 -07:00
Paul Walker 4be13b15d6 [SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants.
This is the final bit of work to relax the register allocation
requirements when code generating normal LLVM IR, which rarely
cares about the result of inactive lanes. By using _PRED nodes
we can make better use of SVE's reversed instructions.

Also removes a redundant parameter from the min/max tests.

Differential Revision: https://reviews.llvm.org/D85142
2020-08-04 11:19:17 +01:00
David Sherwood 23ad660b5d [SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types
When building code at -O0, we weren't falling back to DAG ISel correctly
when encountering alloca instructions with scalable vector types. This
is because the alloca has no operands that are scalable. I've fixed this by
adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca
instructions with scalable types.

Differential Revision: https://reviews.llvm.org/D84746
2020-07-30 08:40:53 +01:00
Eli Friedman c02aa53ecb [AArch64][SVE] Add "fast" fcmp operations.
dacf8d3 added support for most fcmp operations, but there are some extra
variations I hadn't considered: SelectionDAG supports float comparisons
that are neither ordered nor unordered. Add support for the missing
operations.

Differential Revision: https://reviews.llvm.org/D84460
2020-07-24 13:22:41 -07:00
Francesco Petrogalli 809600d664 [llvm][sve] Reg + Imm addressing mode for ld1ro.
Reviewers: kmclaughlin, efriedma, sdesmalen

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83357
2020-07-24 17:48:47 +00:00
Petre-Ionut Tudor 1af9fc8213 [ARM] Generate [SU]HADD from ((a + b) >> 1)
Summary:
Teach LLVM to recognize the above pattern, where the operands are
either signed or unsigned types.
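
Illustratively, the unsigned variant matches IR of this shape:

  define <8 x i8> @uhadd8(<8 x i8> %a, <8 x i8> %b) {
    %ea = zext <8 x i8> %a to <8 x i16>
    %eb = zext <8 x i8> %b to <8 x i16>
    %s = add <8 x i16> %ea, %eb
    %h = lshr <8 x i16> %s, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
    ; Recognized as uhadd v0.8b, v0.8b, v1.8b.
    %t = trunc <8 x i16> %h to <8 x i8>
    ret <8 x i8> %t
  }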

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83777
2020-07-21 13:22:07 +01:00
Eli Friedman b8f765a1e1 [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().
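
For example (sketch):

  define <vscale x 4 x i1> @trunc_to_pred(<vscale x 4 x i32> %v) {
    ; Converted to mask+compare, roughly: and z0.s with #1, then
    ; cmpne to produce the predicate.
    %r = trunc <vscale x 4 x i32> %v to <vscale x 4 x i1>
    ret <vscale x 4 x i1> %r
  }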

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
Paul Walker 6384ec4099 [SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations.
Differential Revision: https://reviews.llvm.org/D84034
2020-07-20 11:57:34 +00:00
Tim Northover 88464a55b4 AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms
It's useful for a debugger to be able to distinguish an @llvm.debugtrap
from a (noreturn) @llvm.trap, so this extends the existing Windows
behaviour to other platforms.
2020-07-20 10:31:26 +01:00
Paul Walker 509351d768 [SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations.
Lower the operations to predicated variants.  This is prep work
required for fixed length code generation but also fixes a bug
whereby these operations fail selection when "unpacked" vector
types (e.g. MVT::nxv2f32) are used.

This patch also adds the missing "unpacked" patterns for FMA.

Differential Revision: https://reviews.llvm.org/D83765
2020-07-16 11:31:35 +00:00
Paul Walker 319a97b5e2 [SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal.
Differential Revision: https://reviews.llvm.org/D83568
2020-07-13 11:16:30 +00:00
Paul Walker f78e6a3095 [SVE] Code generation for fixed length vector truncates.
Lower fixed length vector truncates to a sequence of SVE UZP1 instructions.

Differential Revision: https://reviews.llvm.org/D83395
2020-07-10 10:37:19 +00:00
Eli Friedman 56ae2cebcd [AArch64][SVE] Add lowering for llvm.fma.
This is currently bare-bones; we aren't taking advantage of any of the
FMA variant instructions.  But it's enough to at least generate
code.

Differential Revision: https://reviews.llvm.org/D83444
2020-07-09 16:12:41 -07:00
Paul Walker 614fb09645 [SVE] Disable some BUILD_VECTOR related code generator features.
Fixed length vector code generation for SVE does not yet custom
lower BUILD_VECTOR and instead relies on expansion.  At the same
time custom lowering for VECTOR_SHUFFLE is also not available so
this patch updates isShuffleMaskLegal to reject vector types that
require SVE.

Related to this it also prevents the merging of stores after
legalisation because this only works when BUILD_VECTOR is either
legal or can be eliminated.  When this is not the case the code
generator enters an infinite legalisation loop.

Differential Revision: https://reviews.llvm.org/D83408
2020-07-09 10:47:04 +00:00
Paul Walker fb75451775 [SVE] Custom ISel for fixed length extract/insert_subvector.
We use extract_subvector and insert_subvector to "cast" between
fixed length and scalable vectors.  This patch adds custom C++
based ISel for the following cases:

  fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0
  scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0

Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized
vectors or COPY_TO_REGCLASS otherwise.

Differential Revision: https://reviews.llvm.org/D82871
2020-07-08 09:49:28 +00:00
Petre-Ionut Tudor af80a4353e [ARM] Generate [SU]RHADD from (b - (~a)) >> 1
Summary:
Teach LLVM to recognize the above pattern, which is usually a
transformation of (a + b + 1) >> 1, where the operands are either
signed or unsigned types.

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82669
2020-07-03 16:00:06 +01:00
Guillaume Chatelet 87e2751cf0 [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D83082
2020-07-03 08:06:43 +00:00
Sander de Smalen 143e324e75 [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).

When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:

      extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

This patch fixes both issues.

Reviewers: david-arm, efriedma, spatel

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82910
2020-07-02 10:16:43 +01:00
Guillaume Chatelet ef36f5143d [Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment
As per the documentation of `hasPairedLoad`:
"`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load."
In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned.
There is only one implementor of this interface.

Differential Revision: https://reviews.llvm.org/D82958
2020-07-01 14:32:30 +00:00
Paul Walker a1aed80a35 [SVE] Relax merge requirement for IR based divides.
We currently lower SDIV to SDIV_MERGE_OP1. This forces the value
for inactive lanes in a way that can hamper register allocation,
however, the lowering has no requirement for inactive lanes.

Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus
freeing the register allocator. Once done the only user of
SDIV_MERGE_OP1 is intrinsic lowering so I've removed the node
and perform ISel on the intrinsic directly. This also allows
us to implement MOVPRFX based zeroing in the same manner as SUB.

This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for
the same reason but in the ADD cases the ISel code is already
as required.

Differential Revision: https://reviews.llvm.org/D82783
2020-07-01 08:18:42 +00:00
Guillaume Chatelet 28de229bc6 [Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82894
2020-07-01 07:28:11 +00:00
Christopher Tetreault ab35ba5742 [SVE] Remove calls to VectorType::getNumElements from AArch64
Reviewers: efriedma, paquette, david-arm, kmclaughlin

Reviewed By: david-arm

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82214
2020-06-30 11:17:50 -07:00
Guillaume Chatelet 4f5133a4dc [Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82749
2020-06-30 07:56:17 +00:00
Sander de Smalen 39f6a36a24 [AArch64][SVE] NFCI: Choose consistent naming for predicated SDAG nodes
This patch proposes a naming convention for operations that take
a general predicate (and are thus predicated) that specifies
what happens to the false lanes.

Currently the _PRED suffix is used, which doesn't really say much other
than that it takes a predicate. In some instances this means it has
merging predication and in other cases it means zeroing-predication.

This patch also changes the order of operands to
AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first
operand, which is in line with all other predicates nodes. It takes the
passthru value as an explicit passthru value, which is always passed as
the last operand.

Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma

Reviewed By: paulwalker-arm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81850
2020-06-29 13:37:30 +01:00
Kerry McLaughlin bb6603f013 [AArch64][SVE] Bail out of performPostLD1Combine for scalable types
Summary:
performPostLD1Combine will introduce either a LD1LANEpost
or LD1DUPpost node, which will cause selection failure if the
return type is a scalable vector.

Reviewers: sdesmalen, c-rhodes, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82670
2020-06-29 11:59:53 +01:00
Simon Pilgrim 973685fc78 [TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
2020-06-29 11:46:58 +01:00
Paul Walker 3a98d5d7e7 [SVE] Code generation for fixed length vector adds.
Summary:
Teach LowerToPredicatedOp to lower fixed length vector operations.

Add AArch64ISD nodes and isel patterns for predicated integer
and floating point adds.

Together this enables SVE code generation for fixed length vector adds.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82483
2020-06-26 19:54:41 +00:00
Kerry McLaughlin 6b313f198c [AArch64][SVE] Remove asserts from AArch64ISelLowering for bfloat16 types
Remove the asserts in performLDNT1Combine & performST[NT]1Combine
to ensure we get a failure where the type is a bfloat16 and
hasBF16() is false, regardless of whether asserts are enabled.
2020-06-26 14:51:27 +01:00
Cullen Rhodes 4319c48fc7 [AArch64][SVE] Only support sizeless bfloat types if supported by subtarget
Reviewers: sdesmalen, efriedma, kmclaughlin, fpetrogalli

Reviewed By: sdesmalen, fpetrogalli

Differential Revision: https://reviews.llvm.org/D82494
2020-06-26 12:37:47 +00:00
Kerry McLaughlin edcfef8fee [AArch64][SVE] Add bfloat16 support to store intrinsics
Summary:
Bfloat16 support added for the following intrinsics:
 - ST1
 - STNT1

Reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, david-arm

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82448
2020-06-26 11:05:56 +01:00
Kerry McLaughlin 0ccfe1b267 [AArch64][SVE] Predicate bfloat16 load patterns with HasBF16
Reviewers: sdesmalen, c-rhodes, efriedma, fpetrogalli

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82464
2020-06-26 10:38:24 +01:00
Eli Friedman e9d4e34ab8 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
Paul Walker 499c63288f [SVE] Code generation for fixed length vector loads & stores.
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.

Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:

  V = load(Addr) =>
    V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))

  store(V, Addr) =>
    masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)

Reviewers: rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80385
2020-06-23 09:39:03 +00:00
David Sherwood 584d0d5c17 [SVE] Fall back on DAG ISel at -O0 when encountering scalable types
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:

1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
   vectors.

For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.

I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to

  CodeGen/AArch64/GlobalISel/arm64-fallback.ll

that demonstrate we correctly fallback to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.

Differential Revision: https://reviews.llvm.org/D81557
2020-06-19 10:57:00 +01:00
David Sherwood 0dc28af219 [CodeGen,AArch64] Fix up warnings in performExtendCombine
Try to avoid calling getVectorNumElements() or relying upon the
TypeSize conversion to uint64_t.

Differential Revision: https://reviews.llvm.org/D81573
2020-06-19 10:34:51 +01:00
Paul Walker 4612f39120 [SVE] Add flag to specify SVE register size, using this to calculate legal vector types.
Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE
data registers (in bits) to be specified. This allows the code
generator to make assumptions it normally couldn't. As a starting
point this information is used to mark fixed length vector types
that can fit within the specified size as legal.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80384
2020-06-18 12:11:16 +00:00
Nikita Popov 7cac7e0cfc [IR] Prefer hasFnAttribute() where possible (NFC)
When checking for an enum function attribute, use hasFnAttribute()
rather than hasAttribute() at FunctionIndex, because it is
significantly faster (and more concise to boot).
2020-06-15 09:30:35 +02:00
Shawn Landden 9ec57cce62 [AArch64] custom lowering for i128 popcount
Halves the number of CNT instructions generated.
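
A sketch of the IR this affects (illustrative, not from the patch):

```
declare i128 @llvm.ctpop.i128(i128)

define i128 @popcount128(i128 %x) {
  ; With the custom lowering, this no longer expands to two separate
  ; i64 popcounts, halving the number of CNT instructions used.
  %c = call i128 @llvm.ctpop.i128(i128 %x)
  ret i128 %c
}
```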
2020-06-10 09:44:16 +04:00
Henry Kao 4dcc0d1958 [CodeGen][SVE] Avoid scalarizing zero splat stores on scalable vectors.
Summary: Implemented in replaceZeroVectorStore(). Fixes several warnings in AArch64 SVE unit tests.

Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80824
2020-06-09 12:52:39 -04:00
Guillaume Chatelet 3b6196c9b3 [Alignment][NFC] TargetLowering::allowsMisalignedMemoryAccesses
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81374
2020-06-09 10:17:42 +00:00
Cullen Rhodes b82be5db71 [AArch64][SVE] Implement structured load intrinsics
Summary:
This patch adds initial support for the following instrinsics:

    * llvm.aarch64.sve.ld2
    * llvm.aarch64.sve.ld3
    * llvm.aarch64.sve.ld4

For loading two, three and four vectors worth of data. Basic codegen is
implemented with reg+reg and reg+imm addressing modes being addressed
in a later patch.

The types returned by these intrinsics have a number of elements that is a
multiple of the elements in a 128-bit vector for a given type and N, where N is
the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements
the types are:

    LD2 : <vscale x 8 x i32>
    LD3 : <vscale x 12 x i32>
    LD4 : <vscale x 16 x i32>

This is implemented with target-specific intrinsics for each variant that take
the same operands as the IR intrinsic but return N values, where the type of
each value is a full vector, i.e. <vscale x 4 x i32> in the above example.
These values are then concatenated using the standard concat_vector intrinsic
to maintain type legality with the IR.

These intrinsics are intended for use in the Arm C Language
Extension (ACLE).
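
A usage sketch for the two-vector case (illustrative; the mangled name is assumed to follow the convention described above):

```
; Loads two vectors' worth of i32 data as one wide <vscale x 8 x i32> value.
declare <vscale x 8 x i32> @llvm.aarch64.sve.ld2.nxv8i32(<vscale x 4 x i1>, i32*)

define <vscale x 8 x i32> @ld2_i32(<vscale x 4 x i1> %pg, i32* %addr) {
  %res = call <vscale x 8 x i32> @llvm.aarch64.sve.ld2.nxv8i32(<vscale x 4 x i1> %pg, i32* %addr)
  ret <vscale x 8 x i32> %res
}
```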

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75751
2020-06-09 08:51:58 +00:00
David Sherwood 30dfbf03a2 [CodeGen,AArch64] Fix up warnings in splitStores
The code for trying to split up stores is designed for NEON vectors,
where we support arbitrary alignments. It's an optimisation designed
to improve performance by using smaller, aligned stores. However,
we currently only support 16 byte alignments for SVE vectors anyway
so we may as well bail out early.

This change fixes up remaining warnings in a couple of tests:

  CodeGen/AArch64/sve-callbyref-notailcall.ll
  CodeGen/AArch64/sve-calling-convention-byref.ll

Differential Revision: https://reviews.llvm.org/D80720
2020-06-09 07:39:38 +01:00
David Sherwood 41fb119e8c [CodeGen] Fix nullptr crash in tryConvertSVEWideCompare
When the input to a wide compare instruction is a DUP or SPLAT_VECTOR
node we should deal with cases where the DUP/SPLAT_VECTOR input
operand is not an immediate value. I've fixed the code to return
SDValue() in such cases and added a couple of tests - one each to
represent the signed and unsigned cases.

Differential Revision: https://reviews.llvm.org/D81167
2020-06-08 15:20:18 +01:00
Cullen Rhodes 3ebbe35363 [AArch64][SVE] Implement vector tuple intrinsics
Summary:
This patch adds the following intrinsics for creating two-tuple,
three-tuple and four-tuple scalable vectors:

    * llvm.aarch64.sve.tuple.create2
    * llvm.aarch64.sve.tuple.create3
    * llvm.aarch64.sve.tuple.create4

As well as:

    * llvm.aarch64.sve.tuple.get
    * llvm.aarch64.sve.tuple.set

For extracting and inserting scalable vectors from vector tuples. These
intrinsics are intended to be used by the ACLE functions svcreate<n>,
svget and svset.

This patch also includes calling convention support for passing and
returning tuples of scalable vectors to/from functions.
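
An illustrative sketch (not from the patch; the exact type-mangling of the intrinsic names is an assumption):

```
; Build a two-tuple from two vectors, then extract element 1 again.
define <vscale x 4 x i32> @get_second(<vscale x 4 x i32> %z0, <vscale x 4 x i32> %z1) {
  %tuple = call <vscale x 8 x i32> @llvm.aarch64.sve.tuple.create2.nxv8i32.nxv4i32(<vscale x 4 x i32> %z0, <vscale x 4 x i32> %z1)
  %elt = call <vscale x 4 x i32> @llvm.aarch64.sve.tuple.get.nxv4i32.nxv8i32(<vscale x 8 x i32> %tuple, i32 1)
  ret <vscale x 4 x i32> %elt
}

declare <vscale x 8 x i32> @llvm.aarch64.sve.tuple.create2.nxv8i32.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
declare <vscale x 4 x i32> @llvm.aarch64.sve.tuple.get.nxv4i32.nxv8i32(<vscale x 8 x i32>, i32)
```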

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75674
2020-06-08 11:09:55 +00:00
Kadir Cetinkaya 6bad8b07e6 [llvm][AArch64] Fix unused variable 2020-06-05 15:56:19 +02:00
Kerry McLaughlin 89fc0166f5 [CodeGen][SVE] Legalisation of extends with scalable types
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.

EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.

For example:

```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:

```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79587
2020-06-05 12:08:42 +01:00
Eric Christopher 8c9badf61d Replace integer usage with enumeration. 2020-06-03 20:00:28 -07:00
Francesco Petrogalli febeaf94a8 [llvm][SVE] IR intrinsic for LD1RO.
Reviewers: sdesmalen, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80738
2020-06-03 13:57:16 +00:00
Amara Emerson f573d489b6 [AArch64][GlobalISel] Split G_GLOBAL_VALUE into ADRP + G_ADD_LOW and optimize.
The concept of G_GLOBAL_VALUE is nice and simple, but always using it as the
representation for global var addressing until selection time creates some
problems in optimizing accesses in certain code/relocation models.

The problem comes from trying to optimize adrp -> add -> load/store sequences
in the most common "small" code model. These accesses can be optimized into an
adrp -> load with the add offset being folded into the load's immediate field.
If we try to keep all global var references as a single generic instruction
then by the time we get to the complex operand trying to match these, we end up
generating an adrp at the point of use. The real issue here is that we don't
have any form of CSE during selection, so the code size will bloat from many
redundant adrp's.

This patch custom legalizes small code mode non-GOT G_GLOBALs into target ADRP
and a new "target specific generic opcode" G_ADD_LOW. We also teach the
localizer to localize these instructions via the custom hook that was added
recently. Finally, the complex pattern for indexed loads/stores is extended to
try to fold these G_ADD_LOW instructions into the load immediate.

On -O0 CTMark, we see a 0.8% geomean code size improvement. We should also see
some minor performance improvements too.

Differential Revision: https://reviews.llvm.org/D78465
2020-06-01 16:00:56 -07:00
Martin Storsjö cf97e0ec42 [AArch64] Treat x18 as callee-saved in functions with windows calling convention on non-windows OSes
Treat it as callee-saved, and always back it up. When windows code calls
entry points in unix code, marked with the windows calling convention,
that unix code can call other functions that isn't compiled with
-ffixed-x18 which may clobber x18 freely. By backing it up and restoring
it on return, we preserve the register across the function call,
fulfilling this part of the windows calling convention on another OS.

This isn't enough for making sure that x18 is preserved when non-windows
code does a callback to windows code, but is a clear improvement over
the current status quo. Additionally, wine is nowadays building many
modules as PE DLLs, which avoids the callback issue altogether for those
DLLs.

Differential Revision: https://reviews.llvm.org/D61892
2020-05-30 09:22:09 +03:00
Christopher Tetreault 56eb7556e7 [SVE] Eliminate calls to default-false VectorType::get() from AArch64
Reviewers: efriedma, c-rhodes, david-arm, mcrosier, t.p.northover

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80327
2020-05-29 15:39:30 -07:00
Zequan Wu 80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognize the nomerge attribute and disable branch folding when the attribute is given.

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
David Sherwood 205085d4cc [CodeGen] Fix warnings in LowerToPredicatedOp
When creating a new vector type based on another vector type we
should pass in the element count instead of the number of elements
and scalable flag separately.

I encountered this warning whilst compiling this test:

  CodeGen/AArch64/sve-intrinsics-int-compares.ll

Differential Revision: https://reviews.llvm.org/D80621
2020-05-29 15:19:03 +01:00
Ties Stuij 78bd0c0e5e [AArch64][BFloat] add BFloat instruction support for AArch64
Summary:
Add support for lowering various BFloat related SelDAG nodes:
- load/store (ldrh/strh)
- concat
- dup/duplane
- bitconvert/bitcast
- insert_subvector/insert_subreg

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Reviewers: ab, t.p.northover, john.brawn, fpetrogalli, sdesmalen, LukeGeeson

Reviewed By: fpetrogalli

Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79712
2020-05-27 15:36:54 +01:00
Ties Stuij 42eba9b40b [AArch64][BFloat] basic AArch64 bfloat support
Summary:
This patch adds the bfloat type to the AArch64 backend:
- adds it as part of the FPR16 register class
- adds bfloat calling conventions
- as f16 is no longer the only FPR16 type, we need to constrain a number
  of instruction patterns using FPR16Op to help out the TableGen type inferrer

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Reviewers: t.p.northover, c-rhodes, fpetrogalli, sdesmalen, ostannard, LukeGeeson, ab

Reviewed By: fpetrogalli

Subscribers: pbarrio, LukeGeeson, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79709
2020-05-27 15:26:40 +01:00
Craig Topper 80cc43b420 [AArch64] Set i32 ISD::MULHU/S to Expand instead of Legal.
Looks like there are no isel patterns for these. A DAG combine
turns them into an i64 multiply and a shift, which hides this.

Extracted from D80485
2020-05-26 00:41:09 -07:00
Sanjay Patel 7eed772a27 [PatternMatch] abbreviate vector inst matchers; NFC
Readability is not reduced with these opcodes/match lines,
so reduce odds of awkward wrapping from 80-col limit.
2020-05-24 09:19:47 -04:00
Alexey Lapshin bf242c067e [AARCH64][NEON] Allow to sink operands of aarch64_neon_pmull64.
Summary:
This patch fixes a problem when pmull2 instruction is not
generated for vmull_high_p64 intrinsic.

ISel has a pattern for the int_aarch64_neon_pmull64 intrinsic to generate
the PMULL2 instruction. That pattern assumes that the extraction
operations are located in the same basic block, so we need to sink them
if they are not. This is done by handling the operands of
int_aarch64_neon_pmull64 in AArch64TargetLowering::shouldSinkOperands.
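
An illustrative IR sketch of the situation (hypothetical function and value names; the intrinsic is the existing NEON pmull64 intrinsic):

```
define <16 x i8> @kernel(<2 x i64> %a, <2 x i64> %b, i1 %c) {
entry:
  ; The extracts of the high halves live in the entry block.
  %hi.a = extractelement <2 x i64> %a, i64 1
  %hi.b = extractelement <2 x i64> %b, i64 1
  br i1 %c, label %use, label %exit

use:
  ; The intrinsic call is in another block, so the extracts must be sunk
  ; here for ISel to select a single pmull2.
  %prod = call <16 x i8> @llvm.aarch64.neon.pmull64(i64 %hi.a, i64 %hi.b)
  ret <16 x i8> %prod

exit:
  ret <16 x i8> zeroinitializer
}

declare <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
```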

Reviewed by: efriedma

Differential Revision: https://reviews.llvm.org/D80320
2020-05-22 01:35:24 +03:00
Eli Friedman a1ce88b4e3 [AArch64][SVE] Implement AArch64ISD::SETCC_PRED
This unifies SETCC operations along the lines of other operations.

Differential Revision: https://reviews.llvm.org/D79975
2020-05-15 11:53:21 -07:00
Benjamin Kramer f242950fdf Fold single-use variables into assert
This avoids unused variable warnings in Release builds.
2020-05-12 15:26:59 +02:00
Sander de Smalen 077d2d6802 [CodeGen][SVE] Add patterns for whole vector predicate select
Added patterns to implement `select i1 %p, <vty> %a, <vty> %b`
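
For example (sketch, not from the patch):

```
define <vscale x 4 x i32> @sel(i1 %p, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
  %res = select i1 %p, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b
  ret <vscale x 4 x i32> %res
}
```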

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79356
2020-05-12 11:47:39 +01:00
Petre-Ionut Tudor 9682d0d5dc [ARM] Refactor lower to S[LR]I optimization
Summary:
The optimization has been refactored to fix certain bugs and
limitations. The condition for lowering to S[LR]I has been changed
to reflect the manual pseudocode description of the SLI and SRI
operations. The optimization can now handle more cases of operand
type and order.

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79233
2020-05-12 11:00:13 +01:00
Craig Topper d1119980e5 [SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode.
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.

Removing getAlignment() will be done as a follow up.

Differential Revision: https://reviews.llvm.org/D79436
2020-05-08 16:04:11 -07:00
Kerry McLaughlin 3bcd3dd473 [CodeGen][SVE] Lowering of shift operations with scalable types
Summary:
Adds AArch64ISD nodes for:
 - SHL_PRED (logical shift left)
 - SHR_PRED (logical shift right)
 - SRA_PRED (arithmetic shift right)

Existing patterns for unpredicated left shift by immediate
have also been moved into the appropriate multiclasses
in SVEInstrFormats.td.
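
A minimal sketch of the IR now supported (illustrative):

```
define <vscale x 4 x i32> @shl_nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
  ; Lowered via the new SHL_PRED node to a predicated LSL.
  %shift = shl <vscale x 4 x i32> %a, %b
  ret <vscale x 4 x i32> %shift
}
```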

Reviewers: sdesmalen, efriedma, ctetreau, huihuiz, rengolin

Reviewed By: efriedma

Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79478
2020-05-07 11:43:49 +01:00
Eli Friedman 2c8546107a [AArch64][SVE] Implement lowering for SIGN_EXTEND etc. of SVE predicates.
Now using patterns, since there's a single-instruction lowering. (We
could convert to VSELECT and pattern-match that, but there doesn't seem
to be much point.)

I think this might be the first instruction to use nested multiclasses
this way? It seems like a good way to reduce duplication between
different integer widths. Let me know if it seems like an improvement.

Also, while I'm here, fix the return type of SETCC so we don't try to
merge a sign-extend with a SETCC.
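
A sketch of the lowering now handled (illustrative):

```
define <vscale x 4 x i32> @sext_pred(<vscale x 4 x i1> %p) {
  ; Selects to a single predicated move, e.g. mov z0.s, p0/z, #-1.
  %ext = sext <vscale x 4 x i1> %p to <vscale x 4 x i32>
  ret <vscale x 4 x i32> %ext
}
```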

Differential Revision: https://reviews.llvm.org/D79193
2020-05-06 17:56:32 -07:00
Kerry McLaughlin 19f5da9c1d [SVE][Codegen] Lower legal min & max operations
Summary:
This patch adds AArch64ISD nodes for [S|U]MIN_PRED
and [S|U]MAX_PRED, and lowers both SVE intrinsics and
IR operations for min and max to these nodes.

There are two forms of these instructions for SVE: a predicated
form and an immediate (unpredicated) form. The patterns
which existed for the latter have been updated to match a
predicated node with an immediate and map this
to the immediate instruction.
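
A usage sketch of the predicated intrinsic form (illustrative):

```
declare <vscale x 4 x i32> @llvm.aarch64.sve.smax.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)

define <vscale x 4 x i32> @smax_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
  %res = call <vscale x 4 x i32> @llvm.aarch64.sve.smax.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
  ret <vscale x 4 x i32> %res
}
```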

Reviewers: sdesmalen, efriedma, dancgr, rengolin

Reviewed By: efriedma

Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79087
2020-05-04 11:19:19 +01:00
Cullen Rhodes 672b62ea21 [AArch64][SVE] Custom lowering of floating-point reductions
Summary:
This patch implements custom floating-point reduction ISD nodes that
have vector results, which are used to lower the following intrinsics:

    * llvm.aarch64.sve.fadda
    * llvm.aarch64.sve.faddv
    * llvm.aarch64.sve.fmaxv
    * llvm.aarch64.sve.fmaxnmv
    * llvm.aarch64.sve.fminv
    * llvm.aarch64.sve.fminnmv

SVE reduction instructions keep their result within a vector register,
with all other bits set to zero.
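
A usage sketch for one of the reductions (illustrative):

```
declare float @llvm.aarch64.sve.faddv.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>)

define float @faddv_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a) {
  %sum = call float @llvm.aarch64.sve.faddv.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a)
  ret float %sum
}
```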

Changes in this patch were implemented by Paul Walker and Sander de
Smalen.

Reviewers: sdesmalen, efriedma, rengolin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78723
2020-04-30 10:18:40 +00:00
Kerry McLaughlin 53dd72a87a [SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics
Summary:
This patch maps IR operations for sdiv & udiv to the
@llvm.aarch64.sve.[s|u]div intrinsics.

A ptrue must be created during lowering as the div instructions
have only a predicated form.
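
A minimal sketch (illustrative):

```
define <vscale x 4 x i32> @sdiv_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
  ; Lowered to a ptrue plus: sdiv z0.s, p0/m, z0.s, z1.s
  %div = sdiv <vscale x 4 x i32> %a, %b
  ret <vscale x 4 x i32> %div
}
```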

Patch contains changes by Andrzej Warzynski.

Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78569
2020-04-24 11:38:20 +01:00
Christopher Tetreault 84584b0d29 [SVE] Remove calls to isScalable from AARCH64
Reviewers: efriedma, sdesmalen, t.p.northover, mcrosier

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77758
2020-04-23 13:09:17 -07:00
Kerry McLaughlin 17f6e18acf [AArch64][SVE] Add SVE intrinsic for LD1RQ
Summary:
Adds the following intrinsic for contiguous load & replicate:
  - @llvm.aarch64.sve.ld1rq

The LD1RQ intrinsic only needs the SImmS16XForm added by this
patch. The others (SImmS2XForm, SImmS3XForm & SImmS4XForm)
were added for consistency.
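
A usage sketch (illustrative; the mangled name is assumed):

```
declare <vscale x 4 x i32> @llvm.aarch64.sve.ld1rq.nxv4i32(<vscale x 4 x i1>, i32*)

define <vscale x 4 x i32> @ld1rq_i32(<vscale x 4 x i1> %pg, i32* %addr) {
  ; Loads a 128-bit quadword and replicates it across the whole vector.
  %rep = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1rq.nxv4i32(<vscale x 4 x i1> %pg, i32* %addr)
  ret <vscale x 4 x i32> %rep
}
```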

Reviewers: andwar, sdesmalen, efriedma, cameron.mcinally, dancgr, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76929
2020-04-22 11:29:27 +01:00
Petre-Ionut Tudor 52474992b1 Revert "[ARM] Fix conditions for lowering to S[LR]I"
This reverts commit cabfcf840a.
The patch introduced another bug in the optimization.
2020-04-20 16:11:04 +01:00
Kerry McLaughlin 33ffce5414 [AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store
Summary:
The SVE masked load and store intrinsics introduced in D76688 rely on
common llvm.masked.load/store nodes. This patch creates new ISD nodes
for LD1(S) & ST1 to remove this dependency.

Additionally, this adds support for sign & zero extending
loads and truncating stores.

Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78204
2020-04-20 11:08:11 +01:00
Francesco Petrogalli 897fdec586 [llvm][CodeGen] Addressing modes for SVE stN.
This reverts commit 17b1869b72.

It is an attempt to fix the failure reported in the buildbot link below.

The patch differs from the original one reviewed at
https://reviews.llvm.org/D77435 only in the use of std::make_tuple
when building the return value of `findAddrModeSVELoadStore`:

   -  return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
   +  return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase,
                             NewOffset);

The original patch submitted at
fc4e954ed5
was failing the following build:

http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/

with error:

/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10:
error: chosen constructor is explicit in copy-initialization
  return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
note: explicit constructor declared here
  constexpr tuple(_UElements&&... __elements)
            ^
1 error generated.
2020-04-17 20:35:35 +01:00
Francesco Petrogalli 17b1869b72 Revert "[llvm][CodeGen] Addressing modes for SVE stN."
This reverts commit fc4e954ed5.

The commit reported the following failure:

http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420

FAILED: lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o
/usr/bin/c++   -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64 -I/usr/include/libxml2 -Iinclude -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/include -mthumb -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3  -fvisibility=hidden    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -c /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization
  return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
note: explicit constructor declared here
  constexpr tuple(_UElements&&... __elements)
2020-04-17 20:03:11 +01:00
Francesco Petrogalli fc4e954ed5 [llvm][CodeGen] Addressing modes for SVE stN.
Reviewers: efriedma, sdesmalen, c-rhodes, ctetreau

Reviewed By: c-rhodes

Subscribers: tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77435
2020-04-17 19:31:44 +01:00
Francesco Petrogalli 48879c02bf [llvm][CodeGen] Fix issue for SVE gather prefetch.
Summary:
This change fixes an issue where the DAG combine incorrectly used an addressing mode with scaled offsets (indices) instead of unscaled offsets.
Those addressing modes do not exist for `prfh`, `prfw` and `prfd`, hence we can reuse `prfb`, which has unscaled offsets; moreover, the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction.

FWIW, GCC also emits a `prfb` for these cases.

Reviewers: sdesmalen, andwar, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78069
2020-04-17 19:23:28 +01:00
Benjamin Kramer d1ef44982f [AArch64] Fold one-use variables into assert
Avoids unused variable warnings in Release builds.
2020-04-17 19:43:06 +02:00
Petre-Ionut Tudor cabfcf840a [ARM] Fix conditions for lowering to S[LR]I
Summary:
Fixed wrong conditions for generating (S[LR]I X, Y, C2) from
(or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The
optimisation is also enabled by default now.

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77387
2020-04-17 17:19:24 +01:00
Benjamin Kramer 166467e822 [VectorUtils] Create shufflevector masks as int vectors instead of Constants
No functionality change intended.
2020-04-17 15:28:00 +02:00
Francesco Petrogalli 89680f25e8 [llvm][CodeGen] Rename SVE gather prefetch intrinsics. [NFC]
Summary:
The renaming is necessary to make the naming scheme uniform with the
other gather/scatter load/store SVE intrinsics.

The naming of variables and functions has been adapted to make it
explicit whether we are dealing with a scalar offset (which is
unscaled) or an index (which is scaled according to the data type of
the lanes of the vector).

Reviewers: andwar, sdesmalen, rengolin

Reviewed By: andwar

Subscribers: tschuett, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77839
2020-04-15 21:49:16 +01:00
Christopher Tetreault 05a079895c [SVE] Remove calls to getBitWidth from AArch64
Reviewers: efriedma

Reviewed By: efriedma

Subscribers: danielkiss, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77905
2020-04-14 10:26:37 -07:00
Craig Topper 113f37a1f9 [CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase
Differential Revision: https://reviews.llvm.org/D77995
2020-04-13 13:50:15 -07:00
Christopher Tetreault 9f87d951fc Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: mcrosier, efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77269
2020-04-09 16:43:29 -07:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Eli Friedman dacf8d3562 [AArch64][SVE] Add support for fcmp.
This also requires support for boolean "not", so I added boolean logic
while I was there.

Differential Revision: https://reviews.llvm.org/D76901
2020-03-31 12:04:39 -07:00
Kerry McLaughlin 05606329e2 [AArch64][SVE] Add SVE intrinsics for masked loads & stores
Summary:
Implements the following intrinsics for contiguous loads & stores:
  - @llvm.aarch64.sve.ld1
  - @llvm.aarch64.sve.st1
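
A usage sketch (illustrative; the mangled names are assumed):

```
declare <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1>, i32*)
declare void @llvm.aarch64.sve.st1.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, i32*)

define void @copy(<vscale x 4 x i1> %pg, i32* %src, i32* %dst) {
  %v = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1> %pg, i32* %src)
  call void @llvm.aarch64.sve.st1.nxv4i32(<vscale x 4 x i32> %v, <vscale x 4 x i1> %pg, i32* %dst)
  ret void
}
```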

Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin

Reviewed By: cameron.mcinally

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76688
2020-03-25 11:48:40 +00:00
Amara Emerson 472d282046 [AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin.
On Darwin these need to be selected into a function call for the TLS
address lookup. As a result, they can't be moved below a physreg write,
which happens in call sequences. In the long term, we should have some
mechanism in the localizer to prevent localizing into target-specific
atomic instruction sequences.

rdar://60056248

Differential Revision: https://reviews.llvm.org/D76652
2020-03-24 13:35:50 -07:00
Andrzej Warzynski 0ea4fb5bb7 [AArch64][SVE] Rename intrinsics for gather prefetch [NFC]
Summary:
In order to keep the names consistent with other SVE gather loads, the
intrinsics for gather prefetch are renamed as follows:
  * @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather

Reviewed by: fpetrogalli

Differential Revision: https://reviews.llvm.org/D76421
2020-03-19 12:53:36 +00:00
Sander de Smalen 4788ca450f [AArch64][SVE] Change pointer type of nontemporal load/store intrinsics
Summary:
This fixes a discrepancy between the non-temporal loads/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate these intrinsics.

Reviewers: andwar, kmclaughlin, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76237
2020-03-18 12:44:51 +00:00
Vitaly Buka f20dcc31e3 Fix unused function warning 2020-03-16 19:45:36 -07:00
Benjamin Kramer 05ff3323e0 [AArch64] Remove unused variable 2020-03-16 21:59:55 +01:00
Francesco Petrogalli 0f2b68d9c7 Implement IR intrinsics for gather prefetch.
Summary:
Intrinsics and relative codegen has been implemented for the following
SVE instructions:

1. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>] -> 32-bit          scaled offset
2. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset
3. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]        -> 64-bit          scaled offset
4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}]       -> 32-bit element
5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}]       -> 64-bit element

The instructions are associated the following intrinsics, respectively:

1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32(
          i8* %base,
          <vscale x 4 x i32> %offset,
          <vscale x 4 x i1> %Pg,
          i32 %prfop)

2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32(
          i8* %base,
          <vscale x 2 x i32> %offset,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64(
          i8* %base,
          <vscale x 2 x i64> %offset,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32(
          <vscale x 4 x i32> %bases,
          i64 %imm,
          <vscale x 4 x i1> %Pg,
          i32 %prfop)

5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64(
          <vscale x 2 x i64> %bases,
          i64 %imm,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

The intrinsics are the IR counterpart of the following SVE ACLE functions:

* void svprf<T>(svbool_t pg, const void *base, svprfop op)
* void svprf<T>_vnum(svbool_t pg, const void *base, int64_t vnum, svprfop op)
* void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op)
* void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op)
* void svprf<T>_gather_[s32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[u32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[s64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather_[u64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op)
* void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases,int64_t offset, svprfop op)

Reviewers: andwar, sdesmalen, efriedma, rengolin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75580
2020-03-16 18:52:35 +00:00
Andrzej Warzynski a0c15ed460 [AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
  * DUP <Zd>.<T>, #<imm>
  * DUP <Zd>.<T>, <R><n|SP>
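
A usage sketch (illustrative):

```
declare <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32)

define <vscale x 4 x i32> @splat(i32 %s) {
  %v = call <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32 %s)
  ret <vscale x 4 x i32> %v
}
```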

Reviewed by: sdesmalen

Differential Revision: https://reviews.llvm.org/D75900
2020-03-13 12:40:22 +00:00
Andrzej Warzynski 46b9f14d71 [AArch64][SVE] Add intrinsics for non-temporal scatters/gathers
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
  * aarch64_sve_ldnt1_gather_index
  * aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.

As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.

The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
  * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
  * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1

The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75601
2020-03-12 13:55:56 +00:00
Andrzej Warzynski a9f1583228 [AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic
Reviewers: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75928
2020-03-11 17:05:21 +00:00
Djordje Todorovic c15c68abdc [CallSiteInfo] Enable the call site info only for -g + optimizations
Emit call site info only in the case of '-g' + 'O>0' level.

Differential Revision: https://reviews.llvm.org/D75175
2020-03-09 12:12:44 +01:00
Andrzej Warzynski 9249f60602 [AArch64][SVE] Add intrinsics for non-temporal gather-loads/scatter-stores
Summary:
This patch adds the following LLVM IR intrinsics for SVE:
1. non-temporal gather loads
  * @llvm.aarch64.sve.ldnt1.gather
  * @llvm.aarch64.sve.ldnt1.gather.uxtw
  * @llvm.aarch64.sve.ldnt1.gather.scalar.offset
2. non-temporal scatter stores
  * @llvm.aarch64.sve.stnt1.scatter
  * @llvm.aarch64.sve.stnt1.scatter.uxtw
  * @llvm.aarch64.sve.stnt1.scatter.scalar.offset
These intrinsic are mapped to the corresponding SVE instructions
(example for half-words, zero-extending):
  * ldnt1h { z0.s }, p0/z, [z0.s, x0]
  * stnt1h { z0.s }, p0, [z0.s, x0]

Note that for non-temporal gathers/scatters, the SVE spec defines only
one instruction type: "vector + scalar". For this reason, we swap the
arguments when processing intrinsics that implement the "scalar +
vector" addressing mode:
  * @llvm.aarch64.sve.ldnt1.gather
  * @llvm.aarch64.sve.ldnt1.gather.uxtw
  * @llvm.aarch64.sve.stnt1.scatter
  * @llvm.aarch64.sve.stnt1.scatter.uxtw
In other words, all intrinsics for gather-loads and scatter-stores
implemented in this patch are mapped to the same load and store
instruction, respectively.

The sve2_mem_gldnt_vs multiclass (and its counterpart for scatter
stores) from SVEInstrFormats.td was split into:
  * sve2_mem_gldnt_vec_vs_32_ptrs (32bit wide base addresses)
  * sve2_mem_gldnt_vec_vs_64_ptrs (64bit wide base addresses)
This is consistent with what we did for
@llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in
the spec and the implementation.

Reviewed by: sdesmalen

Differential Revision: https://reviews.llvm.org/D74858
2020-03-02 10:38:28 +00:00
Andrzej Warzynski fa9439fac8 [AArch64][SVE] Add intrinsics for first-faulting gather loads
Summary:
The following intrinsics are added:
  * @llvm.aarch64.sve.ldff1.gather
  * @llvm.aarch64.sve.ldff1.gather.index
  * @llvm.aarch64.sve.ldff1.gather_sxtw
  * @llvm.aarch64.sve.ldff1.gather.uxtw
  * @llvm.aarch64.sve.ldff1.gather_sxtw.index
  * @llvm.aarch64.sve.ldff1.gather.uxtw.index
  * @llvm.aarch64.sve.ldff1.gather.scalar.offset

Although this patch is quite substantial, the vast majority of the
implementation is just a 'copy & paste' of the implementation of regular
gather loads, including tests. There's only a handful of new
definitions:
  * AArch64ISD nodes defined in AArch64ISelLowering.h (e.g. GLDFF1)
  * Seleciton DAG Types in AArch64SVEInstrInfo.td (e.g.
    AArch64ldff1_gather)
  * intrinsics in IntrinsicsAArch64.td (e.g. aarch64_sve_ldff1_gather)
  * Pseudo instructions in SVEInstrFormats.td to workaround the issue of
    use-before-def for the FFR register.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75128
2020-02-27 12:56:33 +00:00
Sjoerd Meijer 13db7490fa [AArch64] Peephole optimization: merge AND and TST instructions
In some cases Clang does not perform merging of instructions AND and TST (aka
ANDS xzr).

Example:

  tst x2, x1
  and x3, x2, x1

to:

  ands x3, x2, x1

This patch adds such merging during instruction selection: when the AND is
replaced with an ANDS instruction in LowerSELECT_CC, all users of the AND
should also be changed to use this ANDS instruction.

Short discussion on mailing list:
http://llvm.1065342.n5.nabble.com/llvm-dev-ARM-Peephole-optimization-instructions-tst-add-tp133109.html

Patch by Pavel Kosov.

Differential Revision: https://reviews.llvm.org/D71701
2020-02-27 09:23:47 +00:00
Craig Topper 735d27dc40 [SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output to ISD::FLT_ROUNDS_
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.

This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code.

Differential Revision: https://reviews.llvm.org/D75132
2020-02-25 16:58:23 -08:00
Andrzej Warzynski cff90c938b [AArch64][SVE] Update names and comments for gathers/scatters (NFC)
Summary:
This patch renames functions and TableGen classes for SVE gathers and
scatters. The original names implied that the corresponding
methods/classes are only suited for regular gathers/scatters (i.e. LD1
and ST1), which is not the case. Indeed, we will be re-using them for
non-temporal and first-faulting gathers/scatters in the forthcoming
patches. The new names also highlight the split into Vector-Scalar (VS)
and Scalar-Vector (SV) cases.

List of changes:
* `performLD1GatherCombine` and `performST1ScatterCombine` are renamed
  as `performGatherLoadCombine` and `performScatterStoreCombine`,
  respectively.
* Selection DAG types for scatters and gathers from
  AArch64SVEInstrInfo.td are renamed. For example, `SDT_AArch64_GLD1` is
  renamed as `SDT_AArch64_GATHER_SV`. SV stands for Scalar-Vector, as
  opposed to Vector-Scalar (VS).
* The intrinsic classes from IntrinsicsAArch64.td are renamed. For
  example, `AdvSIMD_GatherLoad_64bitOffset_Intrinsic` is renamed as
  `AdvSIMD_GatherLoad_SV_64b_Offsets_Intrinsic`.
* Updated comments in `performGatherLoadCombine` and
  `performScatterStoreCombine`.

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75035
2020-02-25 11:09:01 +00:00
Cullen Rhodes 72848f26b4 [AArch64][SVE] Add predicate reinterpret intrinsics
Summary:
Implements the following intrinsics:

    * llvm.aarch64.sve.convert.to.svbool
    * llvm.aarch64.sve.convert.from.svbool

For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the
other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>.
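
A usage sketch round-tripping a 4-lane predicate (illustrative):

```
declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)
declare <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1>)

define <vscale x 4 x i1> @roundtrip(<vscale x 4 x i1> %p) {
  %b = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %p)
  %r = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %b)
  ret <vscale x 4 x i1> %r
}
```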

Reviewers: sdesmalen, kmclaughlin, efriedma, dancgr, rengolin

Reviewed By: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74471
2020-02-25 10:24:06 +00:00
Kerry McLaughlin f87f23c81c [AArch64][SVE] Add the SVE dupq_lane intrinsic
Summary:
Implements the @llvm.aarch64.sve.dupq.lane intrinsic.

As specified in the ACLE, the behaviour of:
  svdupq_lane_u64(data, index)

...is identical to:
  svtbl(data, svadd_x(svptrue_b64(),
                      svand_x(svptrue_b64(), svindex_u64(0, 1), 1),
                      index * 2))

If the index is in the range [0,3], the operation is equivalent
to a single DUP (.q) instruction.
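
A usage sketch (illustrative; the index type is assumed to be i64):

```
declare <vscale x 2 x i64> @llvm.aarch64.sve.dupq.lane.nxv2i64(<vscale x 2 x i64>, i64)

define <vscale x 2 x i64> @dupq(<vscale x 2 x i64> %data) {
  ; Index 1 is in [0,3], so this can select to a single DUP (.q).
  %r = call <vscale x 2 x i64> @llvm.aarch64.sve.dupq.lane.nxv2i64(<vscale x 2 x i64> %data, i64 1)
  ret <vscale x 2 x i64> %r
}
```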

Reviewers: sdesmalen, c-rhodes, cameron.mcinally, efriedma, dancgr, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74734
2020-02-24 13:59:47 +00:00
Craig Topper 9bbf271fc9 [AArch64] Move isOverflowIntrOpRes help function to the ISD namespace in SelectionDAG.h. NFC
Enables sharing with an upcoming X86 change.
2020-02-20 08:50:17 -08:00
Djordje Todorovic 2f215cf36a Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure found on an ARM 2-stage buildbot.
The investigation is needed.
2020-02-20 14:41:39 +01:00
Cameron McInally 3931734990 [AArch64][SVE] Add initial backend support for FP splat_vector
Differential Revision: https://reviews.llvm.org/D74632
2020-02-19 10:19:11 -06:00
Djordje Todorovic faff707db8 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
Djordje Todorovic 2bf44d11cb Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic a82d3e8a6e Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
Sander de Smalen a7a96c726e [AArch64] Implement passing SVE vectors by ref for AAPCS.
Summary:
This patch implements the part of the calling convention
where SVE Vectors are passed by reference. This means the
caller must allocate stack space for these objects and
pass the address to the callee.

Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71216
2020-02-17 15:20:28 +00:00
Kerry McLaughlin 633db60f3e [AArch64][SVE] Add SVE index intrinsic
Summary:
Implements the @llvm.aarch64.sve.index intrinsic, which
takes a scalar base and step value.
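
A usage sketch (illustrative):

```
declare <vscale x 4 x i32> @llvm.aarch64.sve.index.nxv4i32(i32, i32)

define <vscale x 4 x i32> @index(i32 %base, i32 %step) {
  ; Produces { base, base+step, base+2*step, ... }.
  %v = call <vscale x 4 x i32> @llvm.aarch64.sve.index.nxv4i32(i32 %base, i32 %step)
  ret <vscale x 4 x i32> %v
}
```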

This patch also adds the printSImm function to AArch64InstPrinter
to ensure that immediates of type i8 & i16 are printed correctly.

Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin

Reviewed By: cameron.mcinally

Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74550
2020-02-17 10:30:11 +00:00
Pavel Iliin b6a9fe2099 [AArch64] Add BIT/BIF support.
This patch added generation of SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraints satisfaction
during register allocation the bitwise insert and select patterns are matched
by pseudo bitwise select BSP instruction with not tied def.
It is expanded later after register allocation with def tied
to BSL/BIT/BIF depending on operands registers.
This allows to get rid of redundant moves.

Reviewers: t.p.northover, samparker, dmgreen

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D74147
2020-02-14 14:19:39 +00:00
Kerry McLaughlin 671cbc1fbb [AArch64][SVE] Add mul/mla/mls lane & dup intrinsics
Summary:
Implements the following intrinsics:
 - @llvm.aarch64.sve.dup
 - @llvm.aarch64.sve.mul.lane
 - @llvm.aarch64.sve.mla.lane
 - @llvm.aarch64.sve.mls.lane
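
A usage sketch for the lane-indexed multiply (illustrative; the i32 immediate index operand is an assumption):

```
declare <vscale x 8 x i16> @llvm.aarch64.sve.mul.lane.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, i32)

define <vscale x 8 x i16> @mul_lane(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
  %r = call <vscale x 8 x i16> @llvm.aarch64.sve.mul.lane.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 1)
  ret <vscale x 8 x i16> %r
}
```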

Reviewers: c-rhodes, sdesmalen, dancgr, efriedma, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74222
2020-02-13 10:32:59 +00:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Guillaume Chatelet f85d3408e6 [NFC] Introduce an API for MemOp
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73964
2020-02-07 11:32:27 +01:00
Reid Kleckner 2d89e0a098 [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introducing additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
John Brawn 68cf574857 [FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC
These get lowered to function calls, like the non-strict versions.
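
A sketch of IR that exercises this path (illustrative):

```
declare i1 @llvm.experimental.constrained.fcmp.f128(fp128, fp128, metadata, metadata)

define i1 @cmp128(fp128 %a, fp128 %b) {
  ; Becomes a libcall (e.g. __eqtf2) rather than a native compare.
  %r = call i1 @llvm.experimental.constrained.fcmp.f128(fp128 %a, fp128 %b, metadata !"oeq", metadata !"fpexcept.strict")
  ret i1 %r
}
```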

Differential Revision: https://reviews.llvm.org/D73784
2020-02-03 14:39:16 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
John Brawn 0bb9a27c98 [FPEnv][AArch64] Add lowering and instruction selection for strict conversions
Strict fp-to-int and int-to-fp conversions can be handled in the same way that
the non-strict versions are (by using the appropriate instruction or converting
to a function call when we have no instruction).
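
For example (sketch):

```
declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)

define i32 @conv(double %x) {
  ; Selects to fcvtzs while respecting strict FP exception semantics.
  %r = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %x, metadata !"fpexcept.strict")
  ret i32 %r
}
```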

Differential Revision: https://reviews.llvm.org/D73625
2020-01-30 13:50:06 +00:00
John Brawn 258d8dd76a [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND
This gets selected to the appropriate fcvt instruction. Handling from there on
isn't fully correct yet, as we need to model fcvt reading and writing to fpsr
and fpcr.

Differential Revision: https://reviews.llvm.org/D73201
2020-01-30 12:51:25 +00:00
John Brawn 2224407ef5 Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the
corresponding FCMP and FCMPE instructions, though the handling from there on
isn't fully correct as we don't model reads and writes to FPCR and FPSR.

Differential Revision: https://reviews.llvm.org/D73368
2020-01-30 10:40:55 +00:00
Kerry McLaughlin aa0f37e14a [AArch64][SVE] Add first-faulting load intrinsic
Summary:
Implements the llvm.aarch64.sve.ldff1 intrinsic and DAG
combine rules for first-faulting loads with sign & zero extends
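
A usage sketch (illustrative; the mangled name is assumed):

```
declare <vscale x 4 x i32> @llvm.aarch64.sve.ldff1.nxv4i32(<vscale x 4 x i1>, i32*)

define <vscale x 4 x i32> @ldff1_i32(<vscale x 4 x i1> %pg, i32* %addr) {
  ; First-faulting load: only the first active element may fault.
  %v = call <vscale x 4 x i32> @llvm.aarch64.sve.ldff1.nxv4i32(<vscale x 4 x i1> %pg, i32* %addr)
  ret <vscale x 4 x i32> %v
}
```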

Reviewers: sdesmalen, efriedma, andwar, dancgr, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73025
2020-01-23 11:57:16 +00:00
Sander de Smalen 4cf16efe49 [AArch64][SVE] Add patterns for unpredicated load/store to frame-indices.
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).

Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71215
2020-01-22 14:32:27 +00:00
Kerry McLaughlin cdcc4f2a44 [AArch64][SVE] Add intrinsic for non-faulting loads
Summary:
This patch adds the llvm.aarch64.sve.ldnf1 intrinsic, plus
DAG combine rules for non-faulting loads and sign/zero extends

Reviewers: sdesmalen, efriedma, andwar, dancgr, mgudim, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71698
2020-01-22 11:15:20 +00:00
Sander de Smalen 67d4c9924c Add support for (expressing) vscale.
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:

  getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1

This can be used to propagate the value in CodeGenPrepare.

In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.

This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16 bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).
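
A sketch of the intrinsic form (illustrative):

```
declare i64 @llvm.vscale.i64()

define i64 @bytes_per_sve_vector() {
  ; vscale * 16 is the SVE data register size in bytes; maps to RDVL.
  %vs = call i64 @llvm.vscale.i64()
  %bytes = mul i64 %vs, 16
  ret i64 %bytes
}
```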

Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner

Reviewed by: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68203
2020-01-22 10:09:27 +00:00
Florian Hahn 535ed62c5f [AArch64] Add custom store lowering for 256 bit non-temporal stores.
Currently we fail to lower non-temporal stores for 256+ bit vectors
to STNPQ, because type legalization will split them up into 128 bit stores
and, because there is no single non-temporal store instruction, creating
STNPQ in the Load/Store optimizer would be quite tricky.

This patch adds custom lowering for 256 bit non-temporal vector stores
to improve the generated code.
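
A sketch of the kind of store affected (illustrative):

```
define void @stnp256(<16 x i16> %v, <16 x i16>* %p) {
  ; 256-bit non-temporal store; now lowered to a non-temporal pair store.
  store <16 x i16> %v, <16 x i16>* %p, align 16, !nontemporal !0
  ret void
}

!0 = !{i32 1}
```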

Reviewers: dmgreen, samparker, t.p.northover, ab

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D72919
2020-01-21 14:53:40 -08:00
Andrzej Warzynski 7e717b3990 [AArch64][SVE] Extend int_aarch64_sve_ld1_gather_imm
The ACLE distinguishes between the following addressing modes for gather
loads:
  * "scalar base, vector offset", and
  * "vector base, scalar offset".
For the "vector base, scalar offset" case, the
`int_aarch64_sve_ld1_gather_imm` intrinsic was added in 79f2422d.
Currently, that intrinsic assumes that the scalar offset is passed as an
immediate.  As a result, it does not cater for cases where scalar offset
is stored in a register.

In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all
cases are covered:
* `int_aarch64_sve_ld1_gather_imm` is renamed as
  `int_aarch64_sve_ld1_gather_scalar_offset`
* new DAG combine rules are added for GLD1_IMM for scenarios where the
  offset is a non-immediate scalar or an out-of-range immediate
* sve-intrinsics-gather-loads-vector-base.ll is renamed as
  sve-intrinsics-gather-loads-vector-base-imm-offset.ll
* sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test
  file for non-immediate offsets

Similar changes are made for scatter store intrinsics.

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D71773
2020-01-20 12:19:18 +00:00
Matt Arsenault 0d0fce42b0 GlobalISel: Preserve load/store metadata in IRTranslator
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.

Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferenceable, which should also be fixed.
2020-01-16 13:49:43 -05:00
Benjamin Kramer 06cfcdcca7 [AArch64][SVE] Fold variable into assert to silence unused variable warnings in Release builds 2020-01-15 12:50:27 +01:00
Cullen Rhodes 93a4dede3a [AArch64][SVE] Add ptest intrinsics
Summary:
Implements the following intrinsics:

    * @llvm.aarch64.sve.ptest.any
    * @llvm.aarch64.sve.ptest.first
    * @llvm.aarch64.sve.ptest.last
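
A hedged usage sketch (the i1 result and the type mangling are assumptions):

  declare i1 @llvm.aarch64.sve.ptest.any.nxv16i1(<vscale x 16 x i1>, <vscale x 16 x i1>)

  ; Returns true if any lane of %op is active under the governing
  ; predicate %pg.
  define i1 @any_active(<vscale x 16 x i1> %pg, <vscale x 16 x i1> %op) {
    %r = call i1 @llvm.aarch64.sve.ptest.any.nxv16i1(<vscale x 16 x i1> %pg, <vscale x 16 x i1> %op)
    ret i1 %r
  }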

Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72398
2020-01-15 11:15:01 +00:00
Danilo Carvalho Grael 2d7e757a83 [AArch64][SVE] Add patterns for some arith SVE instructions.
Summary: Add patterns for the following instructions:
- smax, smin, umax, umin

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: amehsan

Differential Revision: https://reviews.llvm.org/D71779
2020-01-13 11:39:42 -05:00
KAWASHIMA Takahiro 10c11e4e2d This option allows selecting the TLS size in the local exec TLS model,
which is the default TLS model for non-PIC objects. This allows large/
many thread-local variables or compact/fast code in an executable.

The specification is the same as that of GCC. For example, the code model
option precedes the TLS size option.

TLS access models other than local-exec are not changed. This means
support for the large code model exists only in the local-exec TLS model.

Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>)
Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard
Reviewed By: peter.smith
Committed by: peter.smith

Differential Revision: https://reviews.llvm.org/D71688
2020-01-13 10:16:53 +00:00
Jessica Paquette ceb801612a [AArch64] Don't generate libcalls for wide shifts on Darwin
Similar to cff90f07cb.

Darwin doesn't always use compiler-rt, and so we can't assume that these
functions are available (at least on arm64).
2020-01-10 15:58:51 -08:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and it only checks some simple cases and
doesn't distinguish between FP and integer registers. Just switch to using
LLT to simplify its use from GlobalISel.
2020-01-09 17:37:52 -05:00
QingShan Zhang 2133d3c558 [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal'
Previously, we didn't set the default operation action for SIGN_EXTEND_INREG
for vector types, so it defaulted to 0, which is Legal. However, most targets
don't have native instructions to support this opcode. It should be set to
Expand by default, as we did for ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000
2020-01-03 03:26:41 +00:00
Jay Foad 13a7a4ccbf Remove unneeded extra variable realArgIdx. NFC. 2020-01-02 14:27:32 +00:00
Andrzej Warzynski 404da13e1e [AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32
Summary:
Currently 32 bit unpacked offsets are passed as nxv2i64. However, as
pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead
would improve consistency with:
  * how other arguments are treated
  * how scatter stores are implemented
This patch makes sure that 32 bit unpacked offsets are passed as nxv2i32
instead of nxv2i64.

Reviewers: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71724
2020-01-02 13:01:28 +00:00
Craig Topper 4ae3120ed8 [LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to be promoted.
These operations are needed as building blocks for promoting so they
can't be promoted themselves.

This appeared to work because the fp_extend query type for operation
actions is the result type, not the input type so it never triggered
in the legalizer.

For fp_round, the vector op legalizer just ended up creating a
nop fp_extend that was elided by getNode, followed by a nop
fp_round that was also elided by getNode. This was followed by
a final fp_round from v4f32 back to v4f16 which was CSEd to the
original node. Then legalize vector ops just believed that node
legalized to itself. LegalizeDAG took another crack at promoting
it, but didn't have a handler so just skipped it with a debug
message saying it wasn't promoted.

This patch just removes the operation actions to avoid this
nonsense. Found while trying to refactor LegalizeVectorOps to
handle multiple result nodes better.
2019-12-31 15:04:12 -08:00
Sanjay Patel 8cefc37be5 [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.

Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D63815
2019-12-23 10:11:45 -05:00
Martin Storsjö 5a751e747d [AArch64] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.

Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on Windows,
to let SelectionDAG handle them.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71721
2019-12-23 12:13:49 +02:00
Sanjay Patel 0b38af89e2 [AArch64] match splat of bitcasted extract subvector to DUPLANE
This is another potential regression exposed by D63815.

Here we peek through a bitcast to find an extract subvector and
scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'

Differential Revision: https://reviews.llvm.org/D71672
2019-12-22 08:37:03 -05:00
Cullen Rhodes 974f00a436 [AArch64][SVE] Fold constant multiply of element count
Summary:
E.g.

  %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31)
  %mul = mul i64 %0, <const>

Should emit:

  cntw    x0, all, mul #<const>

For <const> in the range 1-16.

Patch by Kerry McLaughlin

Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71014
2019-12-20 11:58:00 +00:00
Cullen Rhodes 3f9005eb89 Recommit "[AArch64][SVE] Add permutation and selection intrinsics"
Recommit 23c28c4043 (reverted in
dcb48f50bd) with a fix for an assert
"Request for a fixed size on a scalable object" being triggered in
`LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the
TypeSize object.
2019-12-20 10:45:17 +00:00
Cullen Rhodes dcb48f50bd Revert "[AArch64][SVE] Add permutation and selection intrinsics"
This reverts commit 23c28c4043.

It caused build failures in the following expensive checks builders:

    http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295
    http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700

Reverting for now whilst I figure out what the issue is.
2019-12-19 14:26:14 +00:00
Cullen Rhodes 23c28c4043 [AArch64][SVE] Add permutation and selection intrinsics
Summary:
Adds the following intrinsics:

    * @llvm.aarch64.sve.clasta
    * @llvm.aarch64.sve.clasta_n
    * @llvm.aarch64.sve.clastb
    * @llvm.aarch64.sve.clastb_n
    * @llvm.aarch64.sve.compact
    * @llvm.aarch64.sve.ext
    * @llvm.aarch64.sve.lasta
    * @llvm.aarch64.sve.lastb
    * @llvm.aarch64.sve.rev
    * @llvm.aarch64.sve.splice
    * @llvm.aarch64.sve.tbl
    * @llvm.aarch64.sve.trn1
    * @llvm.aarch64.sve.trn2
    * @llvm.aarch64.sve.uzp1
    * @llvm.aarch64.sve.uzp2
    * @llvm.aarch64.sve.zip1
    * @llvm.aarch64.sve.zip2

Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin

Reviewed By: sdesmalen, efriedma

Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71401
2019-12-19 13:18:40 +00:00
Cullen Rhodes 49199465a3 [AArch64][SVE] Implement ptrue intrinsic
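
A hedged sketch of the intrinsic (the pattern operand encoding is an
assumption; 31 is taken to mean "all lanes"):

  declare <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32)

  ; Materialize an all-true predicate for 32-bit lanes,
  ; i.e. "ptrue p0.s".
  define <vscale x 4 x i1> @ptrue_all() {
    %p = call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
    ret <vscale x 4 x i1> %p
  }
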
Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally,
huntergr, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71457
2019-12-19 11:02:05 +00:00
Victor Campos 364b8f5fbe [AArch64] Improve codegen of volatile load/store of i128
Summary:
Instead of generating two i64 instructions for each load or store of a
volatile i128 value (two LDRs or STRs), now emit a single LDP or STP.
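
For example (a hedged sketch; the register allocation in the comment is
illustrative), a volatile i128 load:

  define i128 @load_volatile(i128* %p) {
    ; Expected to lower to a single "ldp x0, x1, [x0]" rather than
    ; two separate LDRs.
    %v = load volatile i128, i128* %p
    ret i128 %v
  }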

Reviewers: labrinea, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69559
2019-12-18 10:03:12 +00:00
Andrzej Warzynski 7e20c3a71d [Aarch64][SVE] Add intrinsics for scatter stores
Summary:
This patch adds the following SVE intrinsics for scatter stores:
* 64-bit offsets:
  * @llvm.aarch64.sve.st1.scatter (unscaled)
  * @llvm.aarch64.sve.st1.scatter.index (scaled)
* 32-bit unscaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended-offset)
* 32-bit scaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset)
* vector base + immediate:
  * @llvm.aarch64.sve.st1.scatter.imm
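
As a hedged sketch of the unscaled 64-bit-offset form (argument order and
mangling are assumptions):

  declare void @llvm.aarch64.sve.st1.scatter.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i1>, i64*, <vscale x 2 x i64>)

  ; Scatter the active lanes of %data to %base + %offsets (byte offsets).
  define void @scatter(<vscale x 2 x i64> %data, <vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %offsets) {
    call void @llvm.aarch64.sve.st1.scatter.nxv2i64(<vscale x 2 x i64> %data, <vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %offsets)
    ret void
  }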

Reviewers: rengolin, efriedma, sdesmalen

Reviewed By: efriedma, sdesmalen

Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71074
2019-12-16 11:52:53 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Kerry McLaughlin 4194ca8e5a Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.

Original commit message:

Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
2019-12-13 10:08:20 +00:00
Cullen Rhodes bbd16b6876 [AArch64][SVE] Remove nxv1f32 and nxv1f64 as legal types
Summary: Also cleans up ZPR register class definition.

Reviewers: sdesmalen, cameron.mcinally, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71351
2019-12-12 09:49:22 +00:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Kerry McLaughlin c0a3ab3655 Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
This reverts commit 3f5bf35f86 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
2019-12-11 13:58:39 +00:00
Andrzej Warzynski 65651f197a [AArch64][SVE] Add DAG combine rules for gather loads and sext/zext
Summary:
These changes allow us to support sign-extending gather loads with the
existing intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*).

Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential revision: https://reviews.llvm.org/D70812
2019-12-11 12:56:18 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
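
A hedged usage sketch (the type mangling is an assumption):

  declare <vscale x 4 x i32> @llvm.aarch64.sve.ldnt1.nxv4i32(<vscale x 4 x i1>, i32*)

  ; Non-temporal predicated load; lowered as a masked load with the
  ; MONonTemporal flag set.
  define <vscale x 4 x i32> @ldnt1_example(<vscale x 4 x i1> %pg, i32* %base) {
    %v = call <vscale x 4 x i32> @llvm.aarch64.sve.ldnt1.nxv4i32(<vscale x 4 x i1> %pg, i32* %base)
    ret <vscale x 4 x i32> %v
  }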

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
Cullen Rhodes 1b9a608c84 [AArch64][SVE] Add wide compare immediate patterns
Summary:
Recognize wide compares where the wide operand is a splat of a scalar
value in the appropriate range and convert to the immediate variant of
the instruction.

Patch by Graham Hunter

Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71009
2019-12-10 10:41:22 +00:00
Eli Friedman f1ddef34f1 [AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors.
The generated sequence with whilelo is unintuitive, but it's the best
I could come up with given the limited number of SVE instructions that
interact with scalar registers. The other sequence I was considering
was something like dup+cmpne, but an extra scalar instruction seems
better than an extra vector instruction.

Differential Revision: https://reviews.llvm.org/D71160
2019-12-09 15:09:33 -08:00
Danilo Carvalho Grael b29916cec3 [AArch64][SVE] Integer reduction instructions pattern/intrinsics.
Added pattern matching/intrinsics for the following SVE instructions:

-- saddv, uaddv
-- smaxv, sminv, umaxv, uminv
-- orv, eorv, andv
2019-12-05 09:59:19 -05:00
Sander de Smalen 79f2422d6a [Aarch64][SVE] Add intrinsics for gather loads (vector + imm)
This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index:
  * @llvm.aarch64.sve.ld1.gather.imm

This intrinsic maps 1-1 to the corresponding SVE instruction (example for half-words):
  * ld1h { z0.d }, p0/z, [z0.d, #16]

Committed on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma

Reviewed By: sdesmalen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70806
2019-12-03 15:19:16 +00:00
Sander de Smalen 8bf31e28d7 [Aarch64][SVE] Add intrinsics for gather loads with 32-bits offsets
This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are:
* unscaled
  * @llvm.aarch64.sve.ld1.gather.sxtw
  * @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
  * @llvm.aarch64.sve.ld1.gather.sxtw.index
  * @llvm.aarch64.sve.ld1.gather.uxtw.index
The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits.

These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]

Committed on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma

Reviewed By: sdesmalen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70782
2019-12-03 14:48:29 +00:00
Sander de Smalen 6e51ceba53 [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets
This patch adds the following intrinsics for gather loads with 64-bit offsets:
      * @llvm.aarch64.sve.ld1.gather (unscaled offset)
      * @llvm.aarch64.sve.ld1.gather.index (scaled offset)

These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
      * ld1h { z0.d }, p0/z, [x0, z0.d]
      * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]

Committing on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70542
2019-12-03 12:55:03 +00:00
Kerry McLaughlin 7483eb656f [AArch64][SVE] Implement shift intrinsics
Summary:
Adds the following intrinsics:
- asr & asrd
- insr
- lsl & lsr

This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic.
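
A hedged sketch of one of the new intrinsics (signature and mangling are
assumptions):

  declare <vscale x 4 x i32> @llvm.aarch64.sve.lsl.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)

  ; Predicated logical shift left of 32-bit elements by per-lane amounts.
  define <vscale x 4 x i32> @lsl_example(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %amt) {
    %r = call <vscale x 4 x i32> @llvm.aarch64.sve.lsl.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %amt)
    ret <vscale x 4 x i32> %r
  }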

Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70437
2019-12-03 11:47:12 +00:00
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using a scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Sander de Smalen 84a0c8e3ae [AArch64][SVE] Spilling/filling of SVE callee-saves.
Implement the spills/fills of callee-saved SVE registers using STR and LDR
instructions.

Also adds the `aarch64_sve_vector_pcs` attribute to specify the
callee-saved registers to be used for functions that return SVE vectors or
take SVE vectors as arguments. The callee-saved registers are vector
registers z8-z23 and predicate registers p4-p15.

The overall frame layout with SVE will be as follows:

   +-------------+
   | stack args  |
   +-------------+
   | Callee Saves|
   |   X29, X30  |
   |-------------| <- FP
   | SVE Callee  | < //////////////
   | saved regs  | < //////////////
   |    z23      | < //////////////
   |     :       | < // SCALABLE //
   |    z8       | < //////////////
   |    p15      | < /// STACK ////
   |     :       | < //////////////
   |    p4       | < //// AREA ////
   +-------------+ < //////////////
   |     :       | < //////////////
   |  SVE locals | < //////////////
   |     :       | < //////////////
   +-------------+
   |/////////////| alignment gap.
   |     :       |
   | Stack objs  |
   |     :       |
   +-------------+ <- SP after call and frame-setup

Reviewers: cameron.mcinally, efriedma, greened, thegameg, ostannard, rengolin

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D68996
2019-11-11 09:03:19 +00:00
Eli Friedman 5df3a87224 [AArch64][X86] Don't assume __powidf2 is available on Windows.
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.

(I think this started showing up recently because we added an
optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013
2019-11-08 12:43:21 -08:00
David Green 2179867ddc [AArch64] Select saturating Neon instructions
This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB
and UQSUB from the existing target independent sadd_sat, uadd_sat,
ssub_sat and usub_sat nodes.

It does not attempt to replace the existing int_aarch64_neon_uqadd
intrinsic nodes as they are apparently used for both scalar and vector,
and need to be legal on scalar types for some of the patterns to work.
The int_aarch64_neon_uqadd on scalar would move the two integers into
floating point registers, perform a Neon uqadd and move the value back.
I don't believe this is good idea for uadd_sat to do the same as the
scalar alternative is simpler (an adds with a csinv). For signed it may
be smaller, but I'm not sure about it being better.

So this just adds some extra patterns for the existing vector
instructions, matching on the _sat nodes.

Differential Revision: https://reviews.llvm.org/D69374
2019-10-31 17:28:36 +00:00
Ehsan Amiri ed7bcb2cb1 [AArch64][SVE] Add patterns for some integer vector instructions
Add pattern matching for SVE vector instructions:

-- add, sub, and, or, xor instructions
-- sqadd, uqadd, sqsub, uqsub target-independent intrinsics
-- bic intrinsics
-- predicated add, sub, subr intrinsics

Patch Review: https://reviews.llvm.org/D69128
Patch authored by: dancgr (Danilo Carvalho Grael)
2019-10-30 21:52:19 -04:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
Kerry McLaughlin da720a38b9 [AArch64][SVE] Implement masked load intrinsics
Summary:
Adds support for codegen of masked loads, with non-extending,
zero-extending and sign-extending variants.
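
Codegen is driven by the generic masked load intrinsic on scalable types,
roughly as follows (a hedged sketch; the alignment argument and the pointer
mangling are illustrative):

  declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>*, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)

  ; Non-extending masked load: inactive lanes take the passthru value.
  define <vscale x 4 x i32> @masked_load(<vscale x 4 x i32>* %p, <vscale x 4 x i1> %mask, <vscale x 4 x i32> %passthru) {
    %v = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* %p, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> %passthru)
    ret <vscale x 4 x i32> %v
  }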

Reviewers: huntergr, rovka, greened, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68877
2019-10-28 10:06:14 +00:00
Graham Hunter 84da2596f9 [AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
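
In IR, a scalable splat is still written with the usual
insertelement/shufflevector idiom (a sketch):

  ; A zeroinitializer shuffle mask broadcasts lane 0 to all lanes;
  ; for scalable types this selects to the new SPLAT_VECTOR node.
  define <vscale x 4 x i32> @splat(i32 %x) {
    %ins = insertelement <vscale x 4 x i32> undef, i32 %x, i32 0
    %splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
    ret <vscale x 4 x i32> %splat
  }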

Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.

At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.

I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.

Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy

Reviewed By: jmolloy

Differential Revision: https://reviews.llvm.org/D47775

llvm-svn: 375222
2019-10-18 11:48:35 +00:00
Kerry McLaughlin 0c7cc383e5 [AArch64][SVE] Implement unpack intrinsics
Summary:
Implements the following intrinsics:
  - int_aarch64_sve_sunpkhi
  - int_aarch64_sve_sunpklo
  - int_aarch64_sve_uunpkhi
  - int_aarch64_sve_uunpklo
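
A hedged sketch of one of these (the widening signature is an assumption):

  declare <vscale x 4 x i32> @llvm.aarch64.sve.sunpkhi.nxv4i32(<vscale x 8 x i16>)

  ; Sign-extend the high half of the 16-bit lanes to 32 bits.
  define <vscale x 4 x i32> @unpack_hi(<vscale x 8 x i16> %a) {
    %r = call <vscale x 4 x i32> @llvm.aarch64.sve.sunpkhi.nxv4i32(<vscale x 8 x i16> %a)
    ret <vscale x 4 x i32> %r
  }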

This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.

This patch includes tests for the Subdivide2Argument type added by D67549

Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka

Reviewed By: greened

Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D67550

llvm-svn: 375210
2019-10-18 09:40:16 +00:00
Graham Hunter b302561b76 [SVE][IR] Scalable Vector size queries and IR instruction support
* Adds a TypeSize struct to represent the known minimum size of a type
  along with a flag to indicate that the runtime size is an integer multiple
  of that size
* Converts existing size query functions from Type.h and DataLayout.h to
  return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
  to uint64_t) so that most existing code 'just works' as if the return
  values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
  supported instructions used with scalable vectors can be constructed
  in IR.

Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen

Reviewed By: rovka, sdesmalen

Differential Revision: https://reviews.llvm.org/D53137

llvm-svn: 374042
2019-10-08 12:53:54 +00:00
Nikola Prica 02682498b8 [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when a parameter
is forwarded through a single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033
2019-10-08 09:43:05 +00:00
Matt Arsenault f24ac13aaa TLI: Remove DAG argument from getRegisterByName
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.

The X86 use of the function seems questionable to me. It checks hasFP
before frame lowering.

llvm-svn: 373292
2019-10-01 01:44:39 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Hans Wennborg 3740ae3b8a Revert r372893 "[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets"
This caused severe compile-time regressions, see PR43455.

> Modern processors predict the targets of an indirect branch regardless of
> the size of any jump table used to glean its target address.  Moreover,
> branch predictors typically use resources limited by the number of actual
> targets that occur at run time.
>
> This patch changes the semantics of the option `-max-jump-table-size` to limit
> the number of different targets instead of the number of entries in a jump
> table.  Thus, it is now renamed to `-max-jump-table-targets`.
>
> Before, when `-max-jump-table-size` was specified, it could happen that
> cluster jump tables could have targets used repeatedly, but each one was
> counted and typically resulted in tables with the same number of entries.
> With this patch, when specifying `-max-jump-table-targets`, tables may have
> different lengths, since the number of unique targets is counted towards the
> limit, but the number of unique targets in tables is the same, but for the
> last one containing the balance of targets.
>
> Differential revision: https://reviews.llvm.org/D60295

llvm-svn: 373060
2019-09-27 09:54:26 +00:00
Evandro Menezes 3bd8ba156b [CodeGen] Replace -max-jump-table-size with -max-jump-table-targets
Modern processors predict the targets of an indirect branch regardless of
the size of any jump table used to glean its target address.  Moreover,
branch predictors typically use resources limited by the number of actual
targets that occur at run time.

This patch changes the semantics of the option `-max-jump-table-size` to limit
the number of different targets instead of the number of entries in a jump
table.  Thus, it is now renamed to `-max-jump-table-targets`.

Before, when `-max-jump-table-size` was specified, it could happen that
cluster jump tables could have targets used repeatedly, but each one was
counted and typically resulted in tables with the same number of entries.
With this patch, when specifying `-max-jump-table-targets`, tables may have
different lengths, since the number of unique targets is counted towards the
limit; the number of unique targets per table stays the same, except for the
last one, which contains the balance of targets.

Differential revision: https://reviews.llvm.org/D60295

llvm-svn: 372893
2019-09-25 16:10:20 +00:00
Florian Hahn 364a23427b [AArch64] Convert neon_ushl and neon_sshl with positive constants to VSHL.
I think we should be able to use shl instead of sshl and ushl for
positive constant shift values, unless I am missing something.

We already have the machinery in place to ensure we only replace
nodes, if the shift value is positive and <= the element width.

This is a generalization of an earlier patch rL372565.
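
For example (a hedged sketch), a ushl by a positive constant splat:

  declare <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32>, <4 x i32>)

  ; With a constant, positive, in-range shift amount this can be
  ; selected as "shl v0.4s, v0.4s, #3" instead of a ushl.
  define <4 x i32> @ushl_const(<4 x i32> %a) {
    %r = call <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32> %a, <4 x i32> <i32 3, i32 3, i32 3, i32 3>)
    ret <4 x i32> %r
  }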

Reviewers: t.p.northover, samparker, dmgreen, anemet

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D67955

llvm-svn: 372824
2019-09-25 08:22:05 +00:00
Florian Hahn 3e2fdbee80 [AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine.
Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl,
if their first operand is extended and the second operand is a constant

Also adds a few tests marked with FIXME, where we can further improve
codegen.

Reviewers: t.p.northover, samparker, dmgreen, anemet

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D62308

llvm-svn: 372565
2019-09-23 09:38:53 +00:00
Benjamin Kramer df4b9a3f4f Hide implementation details in namespaces.
llvm-svn: 372113
2019-09-17 12:56:29 +00:00
Graham Hunter 1a9195d817 [SVE][MVT] Fixed-length vector MVT ranges
  * Reordered MVT simple types to group scalable vector types
    together.
  * New range functions in MachineValueType.h to only iterate over
    the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from
    iterating over scalable types.

Reviewers: sdesmalen, greened

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D66339

llvm-svn: 372099
2019-09-17 10:19:23 +00:00
Kerry McLaughlin e55b3bf40e [SVE][Inline-Asm] Add constraints for SVE predicate registers
Summary:
Adds the following inline asm constraints for SVE:
  - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
  - Upa: SVE predicate register with full range, P0 to P15
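
A hedged sketch of the "Upa" constraint in IR (the asm string and operand
printing are assumptions):

  ; AND of two predicates, with all operands constrained to full-range
  ; SVE predicate registers.
  define <vscale x 16 x i1> @asm_and(<vscale x 16 x i1> %pg, <vscale x 16 x i1> %a, <vscale x 16 x i1> %b) {
    %r = call <vscale x 16 x i1> asm "and $0.b, $1/z, $2.b, $3.b", "=Upa,Upa,Upa,Upa"(<vscale x 16 x i1> %pg, <vscale x 16 x i1> %a, <vscale x 16 x i1> %b)
    ret <vscale x 16 x i1> %r
  }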

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin

Reviewed By: rovka

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66524

llvm-svn: 371967
2019-09-16 09:45:27 +00:00
Tim Northover f1c2892912 AArch64: support arm64_32, an ILP32 slice for watchOS.
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.

llvm-svn: 371722
2019-09-12 10:22:23 +00:00
Guillaume Chatelet ad1cea0dda [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67267

llvm-svn: 371212
2019-09-06 15:03:49 +00:00
Guillaume Chatelet 9fcf066d0c [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67278

llvm-svn: 371210
2019-09-06 14:51:15 +00:00
Guillaume Chatelet 4fc3ad9e13 [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67229

llvm-svn: 371200
2019-09-06 12:48:34 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that take or return alignment as log2; it will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two, but `MachineFunction` and `MachineBasicBlock` were dealing with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
 - `MachineFunction` exposes `align-all-functions`, also interpreted as a power of two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
Kerry McLaughlin 7b5c6b8d86 [SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp
Summary: Adds break to 'x' case in getRegForInlineAsmConstraint added by D66302, fixing the unintentional fallthrough.

Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu

Reviewed By: sdesmalen

Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe, psnobl, llvm-commits, cfe-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67095

llvm-svn: 370769
2019-09-03 15:45:42 +00:00
Kerry McLaughlin da4ef9b4c8 [SVE][Inline-Asm] Support for SVE asm operands
Summary:
Adds the following inline asm constraints for SVE:
  - w: SVE vector register with full range, Z0 to Z31
  - x: Restricted to registers Z0 to Z15 inclusive.
  - y: Restricted to registers Z0 to Z7 inclusive.

This change also adds the "z" modifier to interpret a register as an SVE register.
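
A hedged example of the "w" constraint together with the "z" modifier
(the asm string is illustrative):

  ; Each ${N:z} prints the allocated register as z<N> rather than v<N>.
  define <vscale x 4 x i32> @asm_add(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
    %r = call <vscale x 4 x i32> asm "add ${0:z}.s, ${1:z}.s, ${2:z}.s", "=w,w,w"(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
    ret <vscale x 4 x i32> %r
  }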

Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened

Reviewed By: sdesmalen

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66302

llvm-svn: 370673
2019-09-02 16:12:31 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when returning a complex floating-point value with the lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Hans Wennborg cff90f07cb [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711)
Neither libgcc or compiler-rt are usually used on Windows, so these
functions can't be called.

Differential revision: https://reviews.llvm.org/D66880

llvm-svn: 370204
2019-08-28 13:55:10 +00:00
Tim Northover a7f226f9db AArch64: avoid creating cycle in DAG for post-increment NEON ops.
Inserting a value into Visited has the effect of terminating a search for
predecessors if that node is seen. This is legitimate for the base address, and
acts as a slight performance optimization, but the vector-building node can be
part of a legitimate cycle so we shouldn't stop searching there.

PR43056.

llvm-svn: 370036
2019-08-27 10:21:11 +00:00
Simon Pilgrim c88408cf85 Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI.
llvm-svn: 369751
2019-08-23 12:37:09 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces a MakeLibCallOptions struct, as suggested by @efriedma on D65497.
The struct contains argument flags which will be passed to the makeLibCall function.
The patch should not have any functional changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Eli Friedman b5eb3e1e82 [AArch64] Remove incorrect usage of MONonTemporal.
This has no effect at the moment, but might matter if we try to
implement non-temporal loads in the future.

llvm-svn: 368770
2019-08-13 23:12:14 +00:00
Daniel Sanders 5ae66e56cf [aarch64] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Manual fixups in:
AArch64InstrInfo.cpp - genFusedMultiply() now takes a Register* instead of unsigned*
AArch64LoadStoreOptimizer.cpp - Ternary operator was ambiguous between Register/MCRegister. Settled on Register

Depends on D65919

Reviewers: aemerson

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for full review was: https://reviews.llvm.org/D65962

llvm-svn: 368628
2019-08-12 22:40:53 +00:00
Evandro Menezes a005c1ac4f [AArch64] Expand bcmp() for small block lengths
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support it, unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.

In a proprietary benchmark we see a performance drop of about 12% on PNG
compression before this patch, though it passes all tests.

This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate.  No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.

This problem also exists on ARM.  Once a consensus is reached for AArch64, we
can work to fix ARM as well.

Authors:
- Evandro Menezes (@evandro) <e.menezes@samsung.com>
- Brian Rzycki (@brzycki) <b.rzycki@samsung.com>

Differential revision: https://reviews.llvm.org/D64805

llvm-svn: 367898
2019-08-05 18:09:14 +00:00
Cullen Rhodes 2a48176373 [AArch64] Implement initial SVE calling convention support
Summary:

This patch adds initial support for the SVE calling convention such that
SVE types can be passed as arguments and return values to/from a
subroutine.

The SVE AAPCS states [1]:

    z0-z7 are used to pass scalable vector arguments to a subroutine,
    and to return scalable vector results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in such
    registers, it must ensure that the entire contents of z8-z23 are
    preserved across the call. In other cases it need only preserve the
    low 64 bits of z8-z15, as described in §5.1.2.

    p0-p3 are used to pass scalable predicate arguments to a subroutine
    and to return scalable predicate results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in these
    registers, it must ensure that p4-p15 are preserved across the call.
    In other cases it need not preserve any scalable predicate register
    contents.

SVE predicate and data registers are passed indirectly (i.e. spilled to the
stack and the address passed) if they exceed the registers used for argument
passing defined by the PCS referenced above.  Until SVE stack support is merged
we can't spill SVE registers to the stack, so currently an llvm_unreachable is
used where we will eventually handle this.

[1] https://static.docs.arm.com/100986/0000/100986_0000.pdf

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D65448

llvm-svn: 367859
2019-08-05 13:44:10 +00:00
Florian Hahn e3ea97b049 [AArch64] Skip isZIPMask check for masks with an odd number of elements.
We process 2 elements at a time and expect the number of elements to be
even. Similar to D60690.

Reviewers: dmgreen, samparker, t.p.northover

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D65400

llvm-svn: 367831
2019-08-05 11:12:23 +00:00