Commit Graph

308 Commits

Author SHA1 Message Date
David Green 3c88ff44c5 [AArch64] Remove unsued WideningBaseCost. NFC
The WideningBaseCost is always 0. This removes it to clean up the code.
2022-04-03 22:16:39 +01:00
Vasileios Porpodas 39aa202aff Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit e6ead19b77.
2022-03-23 18:32:17 -07:00
Arthur Eubanks e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f9492.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
Vasileios Porpodas 27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d.
2022-03-22 16:41:55 -07:00
Arthur Eubanks f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d3.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Vasileios Porpodas 79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb0 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Matt Devereau a9e08bc7c1 [AArch64][SVE] InstCombine llvm.aarch64.sve.sel to select
InstCombine llvm.aarch64.sve.sel to select. This allows an existing instCombine
added in 20b0fa91c9 to fire.

Differential Revision: https://reviews.llvm.org/D121792
2022-03-17 16:20:48 +00:00
Florian Hahn aa590e5823
[AArch64] Improve costs for some conversions to fp16.
Currently the cost model under-estimates the cost of certain
FP16 conversions.

This patch updates getCastInstrCost to return more accurate costs for
the cases improved in c2ed9fd054.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D113700
2022-03-11 10:27:39 +00:00
David Green 47f4cd9c3d [AArch64] Update costs for some fp16 converts
This updates the costs for FP16 converts, as some of them were pretty
high.

Differential Revision: https://reviews.llvm.org/D120771
2022-03-03 11:17:24 +00:00
David Green 65c0e45a37 [AArch64] Vector shifts cost 1
The costs of vector shifts was 2 as opposed to 1, as the nodes are
marked custom. Fix this like the others and mark the nodes as cheap.

Differential Revision: https://reviews.llvm.org/D120773
2022-03-03 10:42:57 +00:00
Sander de Smalen 0b41238ae7 [AArch64] Emit TBAA metadata for SVE load/store intrinsics
In Clang we can attach TBAA metadata based on the load/store intrinsics
based on the operation's element type.

This also contains changes to InstCombine where the AArch64-specific
intrinsics are transformed into generic LLVM load/store operations,
to ensure that all metadata is transferred to the new instruction.

There will be some further work after this patch to also emit TBAA
metadata for SVE's gather/scatter- and struct load/store intrinsics.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119319
2022-02-11 09:00:29 +00:00
Nikita Popov 3196ef8ee2 [AArch64TargetTransformInfo] Avoid pointer element type access
Use the element type of the gathered/scattered vector instead.
2022-02-08 15:18:18 +01:00
Florian Hahn 17ebd68ae6
[AArch64] Fix costs of float vector compare/selects pairs.
The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118256
2022-01-31 10:18:29 +00:00
Alban Bridonneau 2feddb37b4 Implement correct cost for SVE bitcasts
We have some bitcasts which we know will be simplified,
so their cost is zero.

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D118019
2022-01-26 14:25:44 +00:00
Matt Devereau cee8b255be [AArch64][SVE] Remove Redundant aarch64.sve.convert.to.svbool
Generated code resulted in redundant aarch64.sve.convert.to.svbool
calls for AArch64 Binary Operations. Narrow the more precise operands
instead of widening the less precise operands

Differential Revision: https://reviews.llvm.org/D116730
2022-01-17 14:35:24 +00:00
David Green 61888d97f6 [AArch64] Basic demand elements for some intrinsics
A lot of neon intrinsics work lane-wise, meaning that non-demanded
elements in and not demanded out. This teaches that to
AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple
single-input truncate intrinsics, which can help remove unnecessary
instructions.

Differential Revision: https://reviews.llvm.org/D117097
2022-01-13 11:53:12 +00:00
David Sherwood ef1ca4d3e9 [AArch64] Fix incorrect use of MVT::getVectorNumElements in AArch64TTIImpl::getVectorInstrCost
If we are inserting into or extracting from a scalable vector we do
not know the number of elements at runtime, so we can only let the
index wrap for fixed-length vectors.

Tests added here:

  Analysis/CostModel/AArch64/sve-insert-extract.ll

Differential Revision: https://reviews.llvm.org/D117099
2022-01-13 09:27:14 +00:00
David Green bc615e436c [AArch64] Update addo and subo costs
Similar to D116732, this adds basic scalar sadd_with_overflow,
uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for
aarch64, which are usually quite efficiently lowered.

Differential Revision: https://reviews.llvm.org/D116734
2022-01-07 16:20:23 +00:00
David Green c65270cf96 [AArch64] Add basic umulo and smulo costs
This adds some AArch64 specific smul_with_overflow and umul_with_overflow
costs, overriding the default costs. The code generation for these mul
with overflow intrinsics is usually better than the default expansion on
AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various
types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll.

Differential Revision: https://reviews.llvm.org/D116732
2022-01-06 17:22:47 +00:00
Matthew Devereau e00f22c1b1 [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.
2021-12-17 15:04:45 +00:00
Matt Devereau fb47725d14 [AArch64][SVE] Instcombine SDIV to ASRD
Instcombine SDIV to ASRD when the third operand of SDIV is a power of 2

Differential Revision: https://reviews.llvm.org/D115448
2021-12-14 15:58:28 +00:00
David Sherwood 8b0448ce5d [AArch64][Analysis] Add on overhead costs for SVE gathers and scatters
This patch adds on an overhead cost for gathers and scatters, which
is a rough estimate based on performance investigations I have
performed on SVE hardware for various micro-benchmarks.

Differential Revision: https://reviews.llvm.org/D115143
2021-12-09 16:02:59 +00:00
Paul Walker 01bc67e449 [SVE][InstCombine] Support more cases where ld1/st1 can be lowered to load/store instructions.
This patch extends the "is all active predicate" check to cover
cases where the predicate is casted but in a way that doesn't
change its "all active" status.

Differential Revision: https://reviews.llvm.org/D115047
2021-12-08 11:01:33 +00:00
Igor Kirillov 08d45e6f4d [AArch64][SVEIntrinsicOpts] Fix: predicated SVE mul/fmul are not commutative
We can not swap multiplicand and multiplier because the sve intrinsics
are predicated. Imagine lanes in vectors having the following values:
         pg = 0
         multiplicand = 1 (from dup)
         multiplier = 2
The resulting value should be 1, but if we swap multiplicand and multiplier it will become 2,
which is incorrect.

Differential Revision: https://reviews.llvm.org/D114577
2021-11-26 12:41:27 +00:00
Rosie Sumpter c2441b6b89 [LoopVectorize] Add vector reduction support for fmuladd intrinsic
Enables LoopVectorize to handle reduction patterns involving the
llvm.fmuladd intrinsic.

Differential Revision: https://reviews.llvm.org/D111555
2021-11-24 08:50:04 +00:00
Matt Devereau f526c600c0 [AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR
InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store
and LD1/ST1 to load/store when a ptrue all predicate pattern operand
is present.

This allows existing IR optimizations such as dead-load removal to
occur.

Differential Revision: https://reviews.llvm.org/D113489
2021-11-16 11:10:23 +00:00
Matt 4a59694ba1 [AArch64][SVE] Combine FADD and FMUL aarch64 intrinsics to FMLA
This is a refinement to the work in
https://reviews.llvm.org/D111638

Fold (fadd p a (fmul p b c)) into (fma p a b c)

Differential Revision: https://reviews.llvm.org/D113095
2021-11-08 12:22:38 +00:00
Peter Waller 7a34145f40 Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store"
This reverts commit 753eba6421.

Contiguous gather => masked load:

  (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1))
  => (masked.load (gep BasePtr IndexBase) Align Mask undef)

Contiguous scatter => masked store:

  (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1))
  => (masked.store Value (gep BasePtr IndexBase) Align Mask)

Tests with <vscale x 2 x double>:

[Gather, Scatter] for each [Positive test (index=1), Negative test
(index=2), Alignment propagation].

Differential Revision: https://reviews.llvm.org/D112076
2021-11-03 13:42:14 +00:00
Peter Waller 753eba6421 Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store"
This reverts commit 1febf42f03, which has
a use-of-uninitialized-memory bug.

See: https://reviews.llvm.org/D112076
2021-11-03 13:39:38 +00:00
Peter Waller 1febf42f03 [AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store
Contiguous gather => masked load:

  (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1))
  => (masked.load (gep BasePtr IndexBase) Align Mask undef)

Contiguous scatter => masked store:

  (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1))
  => (masked.store Value (gep BasePtr IndexBase) Align Mask)

Tests with <vscale x 2 x double>:

[Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation].

Differential Revision: https://reviews.llvm.org/D112076
2021-11-03 11:02:44 +00:00
Matt 895145aacb Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA"
This reverts commit fc28a2f8ce.
2021-11-02 14:56:01 +00:00
Bradley Smith 13faa5f440 [AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types
This adds support for SVE structured loads/stores to the relevant target
hooks, such that we can support these instructions in the InterleavedAccess
pass.

Depends on D112078

Differential Revision: https://reviews.llvm.org/D112303
2021-10-29 09:35:57 +00:00
Matt fc28a2f8ce [AArch64][SVE] Combine predicated FMUL/FADD into FMA
Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand
with one only use and both use the same predicate.

Differential Revision: https://reviews.llvm.org/D111638
2021-10-27 11:41:23 +00:00
David Sherwood 9448cdc900 [SVE][Analysis] Tune the cost model according to the tune-cpu attribute
This patch introduces a new function:

  AArch64Subtarget::getVScaleForTuning

that returns a value for vscale that can be used for tuning the cost
model when using scalable vectors. The VScaleForTuning option in
AArch64Subtarget is initialised according to the following rules:

1. If the user has specified the CPU to tune for we use that, else
2. If the target CPU was specified we use that, else
3. The tuning is set to "generic".

For CPUs of type "generic" I have assumed that vscale=2.

New tests added here:

  Analysis/CostModel/AArch64/sve-gather.ll
  Analysis/CostModel/AArch64/sve-scatter.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D110259
2021-10-21 09:33:50 +01:00
David Sherwood 26b7d9d622 [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-11 09:41:38 +01:00
Matthew Devereau 2ac1999937 [AArch64][SVE] Propagate math flags from intrinsics to instructions
Retain floating-point math flags inside instCombineSVEVectorBinOp
2021-10-05 15:39:13 +01:00
Kazu Hirata c1e32b3fc0 [Target] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-02 12:06:29 -07:00
Matthew Devereau f085a9db8b [AArch64][SVE] Replace fmul, fadd and fsub LLVM IR instrinsics with LLVM IR binary ops
Replacing fmul and fadd instrinsics with their binary ops results
more succinct AArch64 SVE output, e.g.:

4:   65428041 	fmul	z1.h, p0/m, z1.h, z2.h
8:   65408020 	fadd	z0.h, p0/m, z0.h, z1.h
->
4:   65620020   fmla    z0.h, p0/m, z1.h, z2.h
2021-10-01 11:24:46 +01:00
Krasimir Georgiev 685f1bfd0a Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns"
It appears to cause stage2 clang build failures, e.g.,
https://lab.llvm.org/buildbot/#/builders/74/builds/7145.

This reverts commit 1fb37334bd.
2021-10-01 11:39:43 +02:00
David Sherwood 1fb37334bd [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-01 08:41:03 +01:00
Simon Pilgrim 676f2809b5 [CostModel][AArch64] Don't dereference CostTblEntry before null check.
Fix static analysis warning that we check for null Entry after dereferencing it.

I don't think this can actually happen as i8/i16 should legalize to use the i32 path which should return a cost - but I'd rather play it safe that rely on an implicit type legalization.
2021-09-29 16:35:29 +01:00
Usman Nadeem 3b12282b0e [AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set
Differential Revision: https://reviews.llvm.org/D109667

Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea
2021-09-22 20:59:46 -07:00
David Spickett 92c9b28347 Revert "[AArch64][SVE] Teach cost model that masked loads/stores are cheap"
This reverts commit 734708e04f.

Due to build failures on the 2 stage SVE VLS bot.
https://lab.llvm.org/buildbot/#/builders/176/builds/908/steps/11/logs/stdio
2021-09-20 08:45:18 +00:00
Usman Nadeem 757384abff [AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations
zip1(uzp1(A, B), uzp2(A, B)) --> A
    zip2(uzp1(A, B), uzp2(A, B)) --> B

Differential Revision: https://reviews.llvm.org/D109666

Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd
2021-09-17 15:24:46 -07:00
Usman Nadeem ab111e982f Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation""
This reverts commit eee7d225de.
Effectively relanding 98c37247d8
after fixing the failing tests.

Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5
2021-09-10 18:11:24 -07:00
Usman Nadeem eee7d225de Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"
This reverts commit 98c37247d8.
2021-09-10 13:01:48 -07:00
Usman Nadeem 98c37247d8 [AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation
Differential Revision: https://reviews.llvm.org/D109118

Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3
2021-09-10 12:52:14 -07:00
David Sherwood d581d94385 [SVE] Fix the FP arithmetic instruction costs for SVE
Several FP instructions (fadd, fsub, etc.) were incorrectly assigned
a higher cost for SVE because they have custom lowering, however we
know they are legal. This patch explicitly assigns a cost of 2 to
these opcodes.

Tests added here:

  Analysis/CostModel/AArch64/arith-fp-sve.ll

Differential Revision: https://reviews.llvm.org/D108993
2021-09-02 09:55:13 +01:00
Nikita Popov c1b7540645 [TTI] Sink IVDescriptors.h include (NFC)
Forward declare RecurrenceDescriptor and include IVDescritor.h
only in implementation code that actually needs it.
2021-08-30 22:41:58 +02:00
Jun Ma 8c47103491 [AArch64][SVE] Add API for conversion between SVE predicate pattern and element number. NFC
This patch solely moves convert operation between SVE predicate pattern
and element number into two small functions. It's pre-commit patch for optimize
pture with known sve register width.

Differential Revision: https://reviews.llvm.org/D108705
2021-08-27 20:03:48 +08:00
Matthew Devereau 9b830c798e [AArch64][SVE] Teach cost model masked gathers/scatters are cheap
Tell the cost model to use the scalable calculation for non-neon fixed vector.
This results in a cheaper cost for fixed-length SVE masked gathers/scatters
allowing the vectorizor to emit them more frequently.
2021-08-26 11:17:47 +01:00
Jingu Kang 94c4952951 [AArch64] Enable Upper bound unrolling universally
Differential Revision: https://reviews.llvm.org/D105996
2021-08-20 11:25:38 +01:00
Matthew Devereau 734708e04f [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.
2021-08-19 13:01:33 +01:00
David Sherwood 219d4518fc [Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive
For tight loops like this:

  float r = 0;
  for (int i = 0; i < n; i++) {
    r += a[i];
  }

it's better not to vectorise at -O3 using fixed-width ordered reductions
on AArch64 targets. Although the resulting number of instructions in the
generated code ends up being comparable to not vectorising at all, there
may be additional costs on some CPUs, for example perhaps the scheduling
is worse. It makes sense to deter vectorisation in tight loops.

Differential Revision: https://reviews.llvm.org/D108292
2021-08-18 17:01:56 +01:00
Dylan Fleming ef198cd99e [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D106277
2021-08-17 14:42:47 +01:00
Usman Nadeem 5420fc4a27 [AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend
Replace vector unpack operation with a scalar extend operation.
  unpack(splat(X)) --> splat(extend(X))

If we have both, unpkhi and unpklo, for the same vector then we may
save a register in some cases, e.g:
  Hi = unpkhi (splat(X))
  Lo = unpklo(splat(X))
   --> Hi = Lo = splat(extend(X))

Differential Revision: https://reviews.llvm.org/D106929

Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b
2021-08-09 14:58:54 -07:00
Usman Nadeem 85bbc05154 [AArch64][SVE][InstCombine] Move last{a,b} before binop if one operand is a splat value
Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.

Example:
    // If x and/or y is a splat value:
    lastX (binop (x, y)) --> binop(lastX(x), lastX(y))

Differential Revision: https://reviews.llvm.org/D106932

Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0
2021-08-09 14:48:41 -07:00
David Green 649cf4514d [AArch64] Expand the SVE min/max reduction costs to NEON
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.

In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.

Differential Revision: https://reviews.llvm.org/D106239
2021-08-05 23:23:24 +01:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Irina Dobrescu b01417d3c5 [AArch64] Optimise min/max lowering in ISel
Differential Revision: https://reviews.llvm.org/D106561
2021-08-02 13:40:21 +01:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
David Green 38986c6782 [AArch64] Add worst case shuffle costs
This adds some missing single source shuffle costs for AArch64, of i16
and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3
coming from the perfect shuffle tables. The larger vector sizes expand
into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily
chose 8 for the cost to be expensive but not too expensive.

Differential Revision: https://reviews.llvm.org/D106241
2021-07-23 09:01:58 +01:00
David Green c9cebda772 [AArch64] Adjust the cost of integer sum reductions
This changes the cost to (LT.first-1) * cost(add) + 2, where the cost of
an add is assumed to be 1. This brings it inline with the other
reductions.

Differential Revision: https://reviews.llvm.org/D106240
2021-07-22 18:19:54 +01:00
Bradley Smith 191f9fa5d2 [AArch64][SVE] Move instcombine like transforms out of SVEIntrinsicOpts
Instead move them to the instcombine that happens in AArch64TargetTransformInfo.

Differential Revision: https://reviews.llvm.org/D106144
2021-07-20 14:17:30 +00:00
Sander de Smalen eb1a5120b8 [AArch64][SVE][InstCombine] last{a,b} of a splat vector
Replace last{a,b}(splat(X)) with X, irrespective of the predicate.

Patch by/Committing on behalf of: Usman Nadeem (mnadeem)

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105520
2021-07-20 09:44:43 +01:00
Sander de Smalen eac1670739 [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid.
At the moment, <vscale x 1 x eltty> are not yet fully handled by the
code-generator, so to avoid vectorizing loops with that VF, we mark the
cost for these types as invalid.
The reason for not adding a new "TTI::getMinimumScalableVF" is because
the type is supposed to be a type that can be legalized. It partially is,
although the support for these types need some more work.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D103882
2021-07-14 16:44:22 +01:00
David Green 38c9a4068d [TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between paiwise and
non-pairwise reductions.

This also removes some code that was detecting reductions trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484
2021-07-09 11:51:16 +01:00
Kerry McLaughlin a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  int foo(__int128_t* ptr, int N)
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Caroline Concatto a2c5c56055 [AArch64][CostModel] Add cost model for experimental.vector.splice
This patch adds a new  ShuffleKind SK_Splice and then handle the cost in
getShuffleCost, as in experimental.vector.reverse.

Differential Revision: https://reviews.llvm.org/D104630
2021-07-05 14:30:24 +01:00
Sjoerd Meijer ee752134ac [AArch64] Cost-model i8 vector loads/stores
Loads of <4 x i8> vectors were modeled as extremely expensive. And while we
don't have a load instruction that supports this, it isn't that expensive to
create a vector of i8 elements. The codegen for this was fixed/optimised in
D105110. This now tweaks the cost model and enables SLP vectorisation of my
motivating case loadi8.ll.

Differential Revision: https://reviews.llvm.org/D103629
2021-07-05 11:25:10 +01:00
Jun Ma ae5433945f [AArch64][SVEIntrinsicOpts] Convect cntb/h/w/d to vscale intrinsic or constant.
As is mentioned above

Differential Revision: https://reviews.llvm.org/D104852
2021-07-01 10:09:47 +08:00
Rosie Sumpter 0c4651f0a8 [CostModel][AArch64] Improve cost model for vector reduction intrinsics
OR, XOR and AND entries are added to the cost table. An extra cost
is added when vector splitting occurs.

This is done to address the issue of a missed SLP vectorization
opportunity due to unreasonably high costs being attributed to the vector
Or reduction (see: https://bugs.llvm.org/show_bug.cgi?id=44593).

Differential Revision: https://reviews.llvm.org/D104538
2021-06-24 12:02:58 +01:00
Rosie Sumpter d7c219a506 [CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic
Added a case for CTPOP to AArch64TTIImpl::getIntrinsicInstrCost so that
the cost estimate matches the codegen in
test/CodeGen/AArch64/arm64-vpopcnt.ll

Differential Revision: https://reviews.llvm.org/D103952
2021-06-11 11:15:46 +01:00
Simon Pilgrim 5e6bfb661e [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Benjamin Kramer 3dceffd0fd [AArch64] Silence fallthrough warning. NFC.
AArch64TargetTransformInfo.cpp:302:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
  default:
    ^
2021-06-10 17:23:37 +02:00
Irina Dobrescu de79919e9e [AArch64] Add cost tests for bitreverse
This patch includes cost tests for bit reverse as well as some adjustments to the cost model.

Differential Revision: https://reviews.llvm.org/D102755
2021-06-10 14:51:33 +01:00
Kerry McLaughlin 5db52751a5 [CostModel] Return an invalid cost for memory ops with unsupported types
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.

The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102515
2021-06-08 12:07:36 +01:00
Bradley Smith 60c9b5f35c [AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.

Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.

Differential Revision: https://reviews.llvm.org/D103082
2021-06-07 12:21:38 +01:00
Nicholas Guy 3043cbc436 [AArch64] Further enable UnrollAndJam
Due to the dependency on runtime unrolling, UnJ is only
enabled by default on in-order scheduling models,
and if a cpu is specified through -mcpu.

Differential Revision: https://reviews.llvm.org/D103604
2021-06-04 14:18:49 +01:00
Daniil Fukalov e1cb98be2d [TTI] NFC: Change getCostOfKeepingLiveOverCall to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102831
2021-05-21 15:18:12 +03:00
Peter Waller 2d574a1104 [CodeGen][AArch64][SVE] Canonicalize intrinsic rdffr{ => _z}
Follow up to D101357 / 3fa6510f6.
Supersedes D102330.

Goal: Use flags setting rdffrs instead of rdffr + ptest.

Problem: RDFFR_P doesn't have have a flags setting equivalent.

Solution: in instcombine, canonicalize to RDFFR_PP at the IR level, and
rely on RDFFR_PP+PTEST => RDFFRS_PP optimization in
AArch64InstrInfo::optimizePTestInstr.

While here:

* Test that rdffr.z+ptest generates a rdffrs.
* Use update_{test,llc}_checks.py on the tests.
* Use sve attribute on functions.

Differential Revision: https://reviews.llvm.org/D102623
2021-05-20 16:22:50 +00:00
Caroline Concatto 9199b6535d [CostModel][AArch64] Add missing costs for getShuffleCost with scalable vectors
Differential Revision: https://reviews.llvm.org/D102490
2021-05-20 09:08:31 +01:00
Daniil Fukalov 3489c2d7b1 [TTI] NFC: Change getTypeLegalizationCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen, kparzysz

Differential Revision: https://reviews.llvm.org/D101533
2021-04-30 22:51:51 +03:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Bradley Smith 89085bcc86 [AArch64][SVE] Convert svdup(vec, SV_VL1, elm) to insertelement(vec, elm, 0)
By converting the SVE intrinsic to a normal LLVM insertelement we give
the code generator a better chance to remove transitions between GPRs
and VPRs

Co-authored-by: Paul Walker <paul.walker@arm.com>

Depends on D101302

Differential Revision: https://reviews.llvm.org/D101167
2021-04-29 12:17:42 +01:00
Bradley Smith c8f20ed448 [AArch64][SVE] Move convert.{from,to}.svbool optimization into InstCombine
As part of this the ptrue coalescing done in SVEIntrinsicOpts has been
modified to not introduce redundant converts, since the convert removal
will no longer run after that optimisation to clean up.

Differential Revision: https://reviews.llvm.org/D101302
2021-04-29 12:17:42 +01:00
Nicholas Guy 2b6e0c90f9 [AArch64] Enable runtime unrolling for in-order sched models
Differential Revision: https://reviews.llvm.org/D97947
2021-04-27 13:22:10 +01:00
David Sherwood a458b7855e [AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function
When vectorising for AArch64 targets if you specify the SVE attribute
we automatically then treat masked loads and stores as legal. Also,
since we have no cost model for masked memory ops we believe it's
cheap to use the masked load/store intrinsics even for fixed width
vectors. This can lead to poor code quality as the intrinsics will
currently be scalarised in the backend. This patch adds a basic
cost model that marks fixed-width masked memory ops as significantly
more expensive than for scalable vectors.

Tests for the cost model are added here:

  Transforms/LoopVectorize/AArch64/masked-op-cost.ll

Differential Revision: https://reviews.llvm.org/D100745
2021-04-26 11:00:03 +01:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen e0edfa052f [TTI] NFC: Change getAddressComputationCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100561
2021-04-23 16:06:35 +01:00
David Sherwood 57ca65e21e [AArch64] Add instruction costs for FP_TO_UINT and FP_TO_SINT with half types
We were missing some instruction costs when converting vectors of
floating point half types into integers, so I've added those here.
I also manually generated assembly code for each FP->int case and
looked at the number of instructions generated, which meant
adjusting some of the existing costs too.

I've updated an existing test to reflect the new costs:

  Analysis/CostModel/AArch64/sve-fptoi.ll

Differential Revision: https://reviews.llvm.org/D99935
2021-04-21 09:39:45 +01:00
Alexey Bataev 673e2f1b70 [COST][AARCH64] Improve cost of reverse shuffles for AArch64.
Introduced the cost of thre reverse shuffles for AArch64, currently just
copied the costs for PermuteSingleSrc.

Differential Revision: https://reviews.llvm.org/D100871
2021-04-20 13:47:56 -07:00
Joe Ellis c91cd4f3bb [AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts...
when the predicate used by last{a,b} specifies a known vector length.

For example:
  aarch64_sve_lasta(VL1, D) -> extractelement(D, #1)
  aarch64_sve_lastb(VL1, D) -> extractelement(D, #0)

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100476
2021-04-20 10:01:33 +00:00
Florian Hahn acd9cc7495
[AArch64] Use type-legalization cost for code size memop cost.
At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is
CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory
operations on large vectors have a cost of one, even if they will get
expanded to a large number of memory operations in the backend.

This patch updates getMemoryOpCost to return the cost for the type
legalization for both CodeSize and SizeAndLatency. This should more
accurately reflect the number of memory operations required.

I am not sure how latency should properly be included in SizeAndLatency
from the description, but returning the size cost should be clearly more
accurate.

This does not cause any binary changes when building
MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because
large vector memops are not really formed by code emitted from Clang.
But using the C/C++ matrix extension can easily result in code with very
large vector operations directly from Clang, e.g.
https://clang.godbolt.org/z/6xzxcTGvb

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100291
2021-04-15 10:11:05 +01:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00