Commit Graph

1167 Commits

Author SHA1 Message Date
Sjoerd Meijer 51e434fc25 [AArch64] Custom lower <4 x i8> loads
This custom lowers <4 x i8> vector loads using a 32-bit load, followed by 2
SSHLL instructions to extend it to e.g. a <4 x i32> vector. Before, it was
really inefficient and expensive to construct a <4 x i32> for this, as four byte
loads and four moves were used. With this improvement, SLP vectorisation might,
for example, become profitable; see D103629.
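
As a minimal sketch of the kind of IR this affects (a hypothetical test, not
taken from the patch):

```
define <4 x i32> @load_sext_v4i8(<4 x i8>* %ptr) {
  ; Previously: four byte loads plus moves; now: one 32-bit load and two SSHLLs.
  %v = load <4 x i8>, <4 x i8>* %ptr, align 4
  %e = sext <4 x i8> %v to <4 x i32>
  ret <4 x i32> %e
}
```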

Differential Revision: https://reviews.llvm.org/D104782
2021-06-25 09:53:51 +01:00
Martin Storsjö 42f74e8249 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. It also renames the similarly named
methods in the SmallString class; however, these methods don't seem
to be used outside of the llvm subproject, so this doesn't break
building of the rest of the monorepo.
2021-06-25 00:22:01 +03:00
Sander de Smalen d5e14ba88c [GlobalISel] NFC: Change LLT::vector to take ElementCount.
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector

The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
  same number of elements, then use LLT::vector(OtherTy.getElementCount())
  or, if the number of elements is halved/doubled, use .divideCoefficientBy(2)
  or operator*. That is because there is no reason to specifically restrict
  the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
  just use fixed_vector. This will need to be fixed up in the future when
  modifying the algorithm to also work for scalable vectors, and will
  then need additional tests to confirm the behaviour works the same for
  scalable vectors.
* If the test used the `/*Scalable=*/true` flag of LLT::vector, then
  this is replaced by LLT::scalable_vector.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104451
2021-06-24 11:26:12 +01:00
David Spickett e4ecd83fe9 [llvm][AArch64] Handle arrays of struct properly (from IR)
This only applies to FastISel. GlobalISel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you have arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64-bit to 64-bit registers,
but not 64-bit to 32-bit. It can certainly be taught how, but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123
2021-06-16 13:56:01 +00:00
Huihui Zhang 1c096bf09f [SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE.
Currently, Loop Strength Reduction does not handle loops with a scalable stride very well.

Take, for instance, a loop vectorized with the scalable vector type <vscale x 8 x i16>
(refer to the added test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll).

Memory accesses are incremented by "16*vscale", while the induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build the candidate formula,
i.e. "reg(%in) + 2*reg({0,+,(8 * %vscale)})", so that the addrec register reg({0,+,(8*vscale)})
can be reused among the Address and ICmpZero LSRUses to enable optimal solution selection.

This patch allows LSR's getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y"
and pull out "C1 /s C2" as the scaling factor whenever possible. Without this change, LSR
misses candidate formulae with the proper scale factor needed to leverage the target's
scaled-index addressing mode.

Note: This patch doesn't fully fix AArch64's isLegalAddressingMode for scalable
vectors, but it does allow simple valid scales to pass through.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103939
2021-06-14 16:42:34 -07:00
Jingu Kang 08ce52ef5e [AArch64] Improve SAD pattern
Given a vecreduce_add node, detect the below pattern and convert it to the node
sequence with UABDL, [S|U]ABD and UADDLP.

i32 vecreduce_add(
 v16i32 abs(
   v16i32 sub(
    v16i32 [sign|zero]_extend(v16i8 a), v16i32 [sign|zero]_extend(v16i8 b))))
=================>
i32 vecreduce_add(
  v4i32 UADDLP(
    v8i16 add(
      v8i16 zext(
        v8i8 [S|U]ABD low8:v16i8 a, low8:v16i8 b),
      v8i16 zext(
        v8i8 [S|U]ABD high8:v16i8 a, high8:v16i8 b))))
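
A minimal IR sketch of the source pattern (hypothetical function name, not from
the patch):

```
declare <16 x i32> @llvm.abs.v16i32(<16 x i32>, i1)
declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)

define i32 @sad(<16 x i8> %a, <16 x i8> %b) {
  ; vecreduce_add(abs(sub(zext a, zext b))): the SAD idiom matched above.
  %ax = zext <16 x i8> %a to <16 x i32>
  %bx = zext <16 x i8> %b to <16 x i32>
  %sub = sub <16 x i32> %ax, %bx
  %abs = call <16 x i32> @llvm.abs.v16i32(<16 x i32> %sub, i1 true)
  %r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %abs)
  ret i32 %r
}
```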

Differential Revision: https://reviews.llvm.org/D104042
2021-06-14 15:48:51 +01:00
Nikita Popov 1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
David Green 12f53e5392 [AArch64] Remove AArch64ISD::NEG
This NEG node is just a vector negation, easily represented as a SUB
zero. Removing it from the one place it is generated is essentially an
NFC, but can allow some extra folding. The updated tests are now loading
different constant literals, which have already been negated.

Differential Revision: https://reviews.llvm.org/D103703
2021-06-05 19:54:42 +01:00
Bradley Smith a85f5874e2 [AArch64] Remove SETCC of CSEL when the latter's condition can be inverted
setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X

Where X is a condition code setting instruction.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D103256
2021-06-04 15:53:21 +01:00
Irina Dobrescu e971099a9b [AArch64] Optimise bitreverse lowering in ISel
Differential Revision: https://reviews.llvm.org/D103105
2021-06-02 12:51:12 +01:00
Jessica Paquette e7f501b5e7 [GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width
Also add a target hook which allows us to get around custom legalization on
AArch64.

Differential Revision: https://reviews.llvm.org/D99283
2021-06-01 10:56:17 -07:00
Eli Friedman 0b3b0a727a [AArch64][RISCV] Make sure isel correctly honors failure orderings.
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.

At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standard and LLVM now accept arbitrary combinations of
success/failure orderings.
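
For illustration (a hypothetical example, not from the patch), a cmpxchg whose
failure ordering is stronger than its success ordering:

```
define i32 @cmpxchg_acquire_on_failure(i32* %p, i32 %old, i32 %new) {
  ; Success ordering is monotonic, failure ordering is acquire; isel must
  ; honor the acquire semantics on the failure path.
  %pair = cmpxchg i32* %p, i32 %old, i32 %new monotonic acquire
  %val = extractvalue { i32, i1 } %pair, 0
  ret i32 %val
}
```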

This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.

Differential Revision: https://reviews.llvm.org/D103284
2021-05-28 12:47:40 -07:00
Tim Northover 9ff2eb1ea5 SwiftTailCC: teach verifier musttail rules applicable to this CC.
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.

Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
2021-05-28 11:12:00 +01:00
Bradley Smith f3c577ed38 [AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP
Depends on D102607

Differential Revision: https://reviews.llvm.org/D102777
2021-05-25 12:54:55 +01:00
Bradley Smith e40513252a [AArch64][SVE] Add fixed length codegen for FP_ROUND/FP_EXTEND
Depends on D102498

Differential Revision: https://reviews.llvm.org/D102607
2021-05-24 13:02:30 +01:00
Bradley Smith 4bc14be259 [AArch64][SVE] Improve codegen for fixed length vector concat
Differential Revision: https://reviews.llvm.org/D102498
2021-05-24 12:56:02 +01:00
David Truby bf3b6cf920 [llvm][sve] Lowering for VLS MLOAD/MSTORE
This adds custom lowering for the MLOAD and MSTORE ISD nodes when
passed fixed length vectors in SVE. This is done by converting the
vectors to VLA vectors and using the VLA code generation.

Fixed length extending loads and truncating stores currently produce
correct code, but do not use the built in extend/truncate in the
load and store instructions. This will be fixed in a future patch.

Differential Revision: https://reviews.llvm.org/D101834
2021-05-20 10:50:59 +00:00
Andrew Savonichev a647100b43 [AArch64] Combine vector shift instructions in SelectionDAG
bswap.v2i16 + sitofp in LLVM IR generates a sequence of:

  - REV32 + USHR for bswap.v2i16
  - SHL + SSHR + SCVTF for the sext to v2i32 and the int-to-FP conversion

The shift instructions are excessive as noted in PR24820, and they can
be optimized to just SSHR.

Differential Revision: https://reviews.llvm.org/D102333
2021-05-20 10:50:13 +03:00
Eli Friedman 3dd49ec194 [AArch64][SVE] Implement extractelement of i1 vectors.
The implementation just extends the vector to a larger element type, and
extracts from that.  Not fancy, but generates reasonable code.

There was discussion in the review of doing the promotion in
target-independent code, but I'm sticking with this to avoid making
LegalizeDAG infrastructure more complicated.
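
A minimal sketch of the case this implements (hypothetical, not from the patch):

```
define i1 @extract_lane0(<vscale x 16 x i1> %p) {
  ; The i1 vector is extended to a wider element type and the lane is
  ; extracted from that.
  %r = extractelement <vscale x 16 x i1> %p, i64 0
  ret i1 %r
}
```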

Differential Revision: https://reviews.llvm.org/D87651
2021-05-17 14:51:11 -07:00
Irina Dobrescu 50511df32e [AArch64] Lower bitreverse in ISel
Adding lowering support for bitreverse.

Previously, lowering bitreverse would expand it into a series of other
instructions. This patch produces a single rbit instruction instead.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102397
2021-05-17 13:35:27 +01:00
Tim Northover 82a0e808bb IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Jacob Bramley 900c898994 [AArch64] Lower fpto*i.sat intrinsics.
AArch64's fcvt* instructions implement the saturating behaviour that the
fpto*i.sat intrinsics require, in cases where the destination width
matches the saturation width. Lowering them removes a lot of unnecessary
generated code.

Only scalar lowerings are supported for now.
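
A minimal sketch of a matching-width case (hypothetical, not from the patch),
which can now lower to a single saturating fcvtzs:

```
declare i32 @llvm.fptosi.sat.i32.f32(float)

define i32 @to_i32_sat(float %f) {
  %r = call i32 @llvm.fptosi.sat.i32.f32(float %f)
  ret i32 %r
}
```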

Differential Revision: https://reviews.llvm.org/D102353
2021-05-17 10:19:19 +01:00
Bradley Smith 90ffcb1245 [AArch64][SVE] Add unpredicated vector BIC ISD node
Addition of this node allows us to better utilize the different forms of
the SVE BIC instructions, including using the alias to an AND (immediate).

Differential Revision: https://reviews.llvm.org/D101831
2021-05-14 16:12:13 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
Peter Waller 6e6f9a636b [AArch64][SVE] Improve sve.convert.to.svbool lowering
The sve.convert.to.svbool lowering has the effect of widening a logical
<M x i1> vector representing lanes into a physical <16 x i1> vector
representing bits in a predicate register.

In general, if converting to svbool, the contents of lanes in the
physical register might not be known. For sve.convert.to.svbool the new
lanes are specified to be zeroed, requiring 'and' instructions to mask
off the new lanes. For lanes coming from a ptrue or a comparison,
however, they are known to be zero.

CodeGen Before:
  ptrue p0.s, vl16
  ptrue p1.s
  ptrue p2.b
  and   p0.b, p2/z, p0.b, p1.b
  ret

After:
  ptrue	p0.s, vl16
  ret

Differential Revision: https://reviews.llvm.org/D101544
2021-05-12 10:57:25 +01:00
Bradley Smith 635164b95a [AArch64][SVE] Improve SVE codegen for fixed length BITCAST
Expanding a fixed length operation involves wrapping the operation in an
insert/extract subvector pair; as such, when this is done to a bitcast we
end up with an extract_subvector of a bitcast. DAGCombine tries to
convert this into a bitcast of an extract_subvector, which restores the
initial fixed length bitcast, causing an infinite loop of legalization.

As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.

Differential Revision: https://reviews.llvm.org/D101990
2021-05-10 14:43:53 +01:00
Bradley Smith 65c89cd1a6 [AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics
When using predicated intrinsics, if the predicate used is all lanes active,
use an unpredicated form of the instruction; additionally, this allows for
better use of immediate forms.

This only includes instructions where the unpredicated/predicated forms
matched in such a way that instruction selection would not introduce extra
ptrue instructions. This allows us to convert the intrinsics directly to
architecture independent ISD nodes.

Depends on D101062

Differential Revision: https://reviews.llvm.org/D101828
2021-05-10 13:06:02 +01:00
Bradley Smith f8f953c2a6 [AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics
When using predicated arithmetic intrinsics, if the predicate used is all
lanes active, use an unpredicated form of the instruction; additionally,
this allows for better use of immediate forms.

This also includes a new complex isel pattern which allows matching an
all active predicate when the types are different but the predicate is a
superset of the type being used. For example, to allow a b8 ptrue for a
b32 predicate operand.

This only includes instructions where the unpredicated/predicated forms
are mismatched between variants, meaning that the removal of the
predicate is done during instruction selection in order to prevent
spurious re-introductions of ptrue instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101062
2021-05-10 13:05:37 +01:00
Sander de Smalen 407a33889d [AArch64][SVE] Fix isel failure for FP-extending loads
DAGCombiner tries to combine a (fpext (load)) to (fround (extload))
but SVE has no FP-extending loads. By marking these as expand,
the combine no longer happens.

This also fixes a similar issue for fptrunc, where the source type
is not a legal type.

Reviewed By: bsmith, kmclaughlin

Differential Revision: https://reviews.llvm.org/D102053
2021-05-10 11:27:38 +01:00
Jun Ma b3aeb13892 [AArch64][SVE] Remove index_vector node.
Since index_vector is lowered into step_vector in D100816, we can just remove
index_vector and use step_vector for codegen directly.

Differential Revision: https://reviews.llvm.org/D101593
2021-05-10 11:08:58 +08:00
Simon Pilgrim 280aa3415e [DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based on a discussion on D89281, where the AArch64 implementations were being replaced to use funnel shifts.

Any target that has efficient funnel shift lowering can handle the shift parts expansion the same way, avoiding a lot of duplication.

I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AArch64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).

NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.

Differential Revision: https://reviews.llvm.org/D101987
2021-05-07 13:12:30 +01:00
Bradley Smith 9f37980d45 [AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero lanes 1-N
Specifically, this allows us to rely on the lane-zeroing behaviour of
SVE reduce instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101369
2021-05-04 15:05:05 +01:00
David Green 966435daf9 [AArch64] Fold CSEL x, x, cc -> x
This can come up in rare situations, where a csel is created with
identical operands. These can be folded simply to the original value,
allowing the csel to be removed and further simplification to happen.

This patch also removes FCSEL as it is unused, not being produced
anywhere or lowered to anything.

Differential Revision: https://reviews.llvm.org/D101687
2021-05-03 17:34:05 +01:00
LemonBoy 4751cadcca [AArch64] Prevent spilling between ldxr/stxr pairs
Apply the same logic used to check if CMPXCHG nodes should be expanded
at -O0: the register allocator may end up spilling some register in
between the atomic load/store pairs, breaking the atomicity and possibly
stalling the execution.

Fixes PR48017

Reviewed By: efriedman

Differential Revision: https://reviews.llvm.org/D101163
2021-05-01 17:17:05 +02:00
Eli Friedman 6e6ae6c727 [AArch64] Fix lowering for fshl/fshr with SVE types.
These operations don't exist natively, so just let the
target-independent code expand to plain shifts.

The generated sequences could probably be optimized a bit more, but
they seem good enough for now.
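
A minimal sketch of an affected case (hypothetical, not from the patch); this
previously failed to select and is now expanded to plain shifts:

```
declare <vscale x 4 x i32> @llvm.fshl.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>)

define <vscale x 4 x i32> @fshl_sve(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
  %r = call <vscale x 4 x i32> @llvm.fshl.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c)
  ret <vscale x 4 x i32> %r
}
```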

Differential Revision: https://reviews.llvm.org/D101574
2021-04-30 10:51:25 -07:00
Jun Ma b310dd1501 [AArch64][SVE] Lower index_vector to step_vector
As discussed in D100107, this patch first converts index_vector to
step_vector, and converts step_vector back to index_vector after LegalizeDAG.

Differential Revision: https://reviews.llvm.org/D100816
2021-04-30 19:04:39 +08:00
Joe Ellis 1eb81f8309 [AArch64] Add missing UINT_TO_FP promotions for v16i8
Differential Revision: https://reviews.llvm.org/D101042
2021-04-28 08:49:15 +00:00
Sander de Smalen 43ace8b5ce [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Joe Ellis c19c0ad681 [AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides
The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE
previously assumed the operands were full vectors, but this is not
always true. This function would produce bogus code if the division operands
are not full vectors, resulting in miscompiles when dividing 8-bit or
16-bit vectors.

The fix is to perform an extend + div + truncate for non-full vectors,
instead of the usual unpacking and unzipping logic. This is an additive
change which reduces the non-full integer vector divisions to a pattern
recognised by the existing lowering logic.

For future reference, an example of code that would miscompile before
this patch is below:

    int8_t foo(unsigned N, int8_t *a, int8_t *b, int8_t *c) {
        int8_t result = 0;
        for (int i = 0; i < N; ++i) {
            result += (a[i] / b[i]) / c[i];
        }
        return result;
    }

Differential Revision: https://reviews.llvm.org/D100370
2021-04-23 14:55:10 +00:00
David Green c0bf5929ee [AArch64] Improve vector reverse lowering
This improves the lowering of v8i16 and v16i8 vector reverse shuffles.
Instead of going via a generic tbl it uses a rev64; ext pair, as already
happens for v4i32.

Differential Revision: https://reviews.llvm.org/D100882
2021-04-22 21:01:25 +01:00
Joe Ellis 528ee161c9 [AArch64] Block tryCombineToBSL combines for vectors wider than NEON
There are no patterns for the AArch64ISD::BSP ISD node for anything
other than NEON vectors at the moment. As a result, if we hit these
combines for vectors wider than a NEON vector (such as what we might get
with fixed length SVE) we will fail to lower.

This patch simply prevents us from attempting the combines if the input
vector type is too wide.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D100961
2021-04-22 15:09:13 +00:00
Martin Storsjö 8000e1f578 [AArch64] Fix calling windows varargs with floats in fixed args from non-windows functions
When calling Windows functions from a non-Windows function, inspect the
calling convention of the called function, not the caller.

Also remove an unnecessary parameter to AArch64CallLowering
OutgoingArgHandler.

Differential Revision: https://reviews.llvm.org/D100890
2021-04-22 12:02:49 +03:00
Caroline Concatto ca9b7e2e2f [AArch64][SVE] Fix crash with icmp+select
This patch changes the lowering of SELECT_CC from Legal to Expand for scalable
vectors and adds support for scalable vectors in performSelectCombine.

When selecting the nodes to lower, visitSELECT checks whether it is possible to
use SELECT_CC in cases where SETCC is followed by SELECT, i.e. whether SELECT_CC
is legal or custom, and if so replaces SELECT with SELECT_CC. SELECT_CC used to
be legal for scalable vectors, so the node was changed to SELECT_CC, which
crashed the compiler since there is no support for SELECT_CC with scalable
vectors. The compiler now lowers to VSELECT instead of SELECT_CC.

Differential Revision: https://reviews.llvm.org/D100485
2021-04-21 14:16:27 +01:00
Bradley Smith b8b075d8d7 [AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions
Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types,
and lower them to the appropriate SVE instructions.

Additionally now that the MULH nodes are legal, integer divides can be
expanded into a more performant code sequence.

Differential Revision: https://reviews.llvm.org/D100487
2021-04-20 15:18:06 +01:00
Bradley Smith 22c017f0f9 [AArch64][NEON] Match (or (and -a b) (and (a+1) b)) => bit select
With this patch vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100304
2021-04-15 13:52:47 +01:00
Martin Storsjö 5144f730a8 [AArch64] Fix windows vararg functions with floats in the fixed args
On Windows, float arguments are normally passed in float registers
in the calling convention for regular functions. For variable
argument functions, floats are passed in integer registers. This
has already been done correctly for many years.

However, the surprising bit was that floats among the fixed arguments
are also supposed to be passed in integer registers, contrary to regular
functions. (This also seems to be the behaviour on ARM, both on
Windows and on e.g. hardfloat Linux.)

In the calling convention, don't promote shorter floats to f64, but
convert them to integers of the same length. (Floats passed as part of
the actual variable arguments are promoted to double already on the
C/Clang level; the LLVM vararg calling convention doesn't do any
extra promotion of f32 to f64 - this matches how it works on X86 too.)

Technically, this is an ABI break compared to older LLVM versions,
but it fixes compatibility with the official platform ABI. (In practice,
floats among the fixed arguments of variable argument functions are
a pretty rare construct.)

Differential Revision: https://reviews.llvm.org/D100365
2021-04-15 11:02:14 +03:00
David Sherwood 1206313f82 [CodeGen][AArch64] Fix isel crash for truncating FP stores
When attempting to truncate an FP vector and store the result out
to memory, we crashed because we had no pattern for truncating FP
stores. In fact, we don't support these types of stores and the
correct fix is to stop marking these truncating stores as legal.

Tests have been added here:

  CodeGen/AArch64/sve-fptrunc-store.ll
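
The crashing shape was roughly the following (a hypothetical reduction, not the
exact test):

```
define void @fptrunc_store(<vscale x 2 x double> %v, <vscale x 2 x float>* %p) {
  ; A truncating FP store; these are no longer marked legal, so the fptrunc
  ; stays a separate operation.
  %t = fptrunc <vscale x 2 x double> %v to <vscale x 2 x float>
  store <vscale x 2 x float> %t, <vscale x 2 x float>* %p
  ret void
}
```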

Differential Revision: https://reviews.llvm.org/D100025
2021-04-08 13:21:29 +01:00
Jun Ma 274ac9d40e [AArch64][SVE] Lowering sve.dot to DOT node
Differential Revision: https://reviews.llvm.org/D99699
2021-04-02 20:05:17 +08:00
Bradley Smith 2f45e632c0 [AArch64][SVE] Improve codegen for select nodes with fixed types
Additionally, move the existing fixed vselect tests to *-vselect.ll.

Differential Revision: https://reviews.llvm.org/D99418
2021-04-01 15:54:37 +01:00
Bradley Smith 0934fa4f5d [AArch64][SVE] SVE functions should use the SVE calling convention for fast calls
When an SVE function calls another SVE function using the C calling
convention we use the more efficient SVE VectorCall PCS.  However,
for the Fast calling convention we're incorrectly falling back to
the generic AArch64 PCS.

This patch adds the same "can use SVE vector calling convention"
detection used by CallingConv::C to CallingConv::Fast.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D99657
2021-04-01 15:52:08 +01:00
Sander de Smalen 7108b2dec1 [SVE] Fix LoopVectorizer test scalable-call.ll
This marks FSIN and other operations to EXPAND for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00
Florian Hahn 52e015081a
[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector.
Currently the code only checks for integer constants (ConstantSDNode)
and triggers an infinite cycle for single-element floating point
vector constants. We need to check for both FP and integer constants.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D99384
2021-03-31 10:17:36 +01:00
Joe Ellis a7dde4c5f7 [AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98496
2021-03-30 09:37:11 +00:00
Joe Ellis c4d39f64d0 [AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98625
2021-03-30 09:35:44 +00:00
Florian Hahn 482283042f
[AArch64] Remove custom zext/sext legalization code.
Currently performExtendCombine assumes that the src-element bitwidth * 2
is a valid MVT. But this is not the case for i1 and it causes a crash on
the v64i1 test cases added in this patch.

It turns out that this code appears to not be needed; the same patterns are
handled by other code and we end up with the same results, even without the
custom lowering. I also added additional test cases in a50037aaa6.

Let's just remove the unneeded code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99437
2021-03-29 22:22:05 +01:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Florian Hahn eb3d9f2eb6
[SelDag] Add isIntOrFPConstant helper function.
This patch adds a new isIntOrFPConstant helper function to check if an
SDValue is an integer or FP constant. This pattern is used in various
places.

There are also places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D99428
2021-03-28 12:48:58 +01:00
Nashe Mncube ac2a1e9596 [SVE] Suppress vselect warning from incorrect interface call
The VSelectCombine handler within AArch64ISelLowering
uses an interface call which only expects fixed vectors.
This generates a warning when the call is made on a
scalable vector. This change suppresses the warning by using
the ElementCount interface, which supports both fixed and scalable vectors.
I have also added a regression test which recreates the warning.

Differential Revision: https://reviews.llvm.org/D98249
2021-03-24 14:34:34 +00:00
Benjamin Kramer 39e36fff3d [AArch64] Fix unused variable warning 2021-03-23 13:42:14 +01:00
David Sherwood 748ae5281d [IR][SVE] Add new llvm.experimental.stepvector intrinsic
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.
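
For example (a hypothetical use, not from the patch):

```
declare <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()

define <vscale x 4 x i32> @step() {
  ; Returns <0, 1, 2, 3, ...> across however many lanes vscale provides.
  %s = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
  ret <vscale x 4 x i32> %s
}
```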

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
that future patches will introduce various generic DAG
combines, such as

  mul step_vector(1), 2 -> step_vector(2)
  add step_vector(1), step_vector(1) -> step_vector(2)
  shl step_vector(1), 1 -> step_vector(2)
  etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

  llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
  llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

  llvm/test/CodeGen/AArch64/neon-stepvector.ll
  llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
2021-03-23 10:43:35 +00:00
Craig Topper 1066dcb550 [AArch64] Fix LowerMGATHER to return the chain result for floating point gathers.
Found by adding asserts to LegalizeDAG to make sure custom legalized
results had the right types.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D98968
2021-03-19 11:53:46 -07:00
Peter Waller 0d6482a76a [llvm][AArch64][SVE] Lower fixed length vector fabs
Seemingly straightforward.

Differential Revision: https://reviews.llvm.org/D98434
2021-03-18 17:20:08 +00:00
Sjoerd Meijer 90ecb862a0 [AArch64] Rewrite (add, csel) to cinc
Don't rewrite an add instruction with 2 SET_CC operands into a csel
instruction; the resulting instruction sequence would use an extra instruction
and register. Preventing this allows us to match a `(add, csel)` pattern and
rewrite it into a `cinc`.
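
One shape that can now match (a hypothetical example, not from the patch):

```
define i32 @add_cinc(i32 %a, i32 %b) {
  ; b + (a == 0) can now select to a compare followed by a cinc.
  %c = icmp eq i32 %a, 0
  %z = zext i1 %c to i32
  %r = add i32 %b, %z
  ret i32 %r
}
```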

Differential Revision: https://reviews.llvm.org/D98704
2021-03-18 08:49:27 +00:00
Bradley Smith cf0da91ba5 [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.

Differential Revision: https://reviews.llvm.org/D98487
2021-03-17 11:41:22 +00:00
Joe Ellis ff2dd8a212 [AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible
This commit folds sxtw'd or uxtw'd offsets into gather loads where
possible with a DAGCombine optimization.

As an example, the following code:

    #include <arm_sve.h>

    svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) {
      return svld1sw_gather_s64offset_u64(
        pred, base, svextw_s64_x(pred, offsets)
      );
    }

would previously lower to the following assembly:

    sxtw	z0.d, p0/m, z0.d
    ld1sw	{ z0.d }, p0/z, [x0, z0.d]
    ret

but now lowers to:

    ld1sw   { z0.d }, p0/z, [x0, z0.d, sxtw]
    ret

Differential Revision: https://reviews.llvm.org/D97858
2021-03-16 15:09:46 +00:00
Stelios Ioannou ab86edbc88 [AArch64] Implement __rndr, __rndrrs intrinsics
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.

These intrinsics store the random number in their pointer argument and return a status
code indicating whether the generation succeeded. The difference between __rndr and
__rndrrs is that the latter intrinsic reseeds the random number generator.

The instructions set the NZCV flags to indicate the success of the operation,
which we can then read with a CSET.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838

Differential Revision: https://reviews.llvm.org/D98264

Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
2021-03-15 17:51:48 +00:00
Bradley Smith 860ae9d50c [AArch64][SVE] Add fixed/scalable lowering of FMAXIMUM/FMINIMUM ISD nodes
Differential Revision: https://reviews.llvm.org/D98348
2021-03-11 13:37:47 +00:00
David Green 1a808286ef [AArch64] Extend vecreduce -> udot handling to mla reductions
We previously had lowering for:
  vecreduce.add(zext(X)) to vecreduce.add(UDOT(zero, X, one))
This extends that to also handle:
  vecreduce.add(mul(zext(X), zext(Y))) to vecreduce.add(UDOT(zero, X, Y))
It extends the existing code to optionally handle a mul with equal
extends.
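
A minimal sketch of the newly handled pattern (hypothetical, and assuming the
dotprod target feature is available):

```
declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)

define i32 @mla_to_udot(<16 x i8> %a, <16 x i8> %b) {
  ; vecreduce.add(mul(zext a, zext b)) can now become a UDOT reduction.
  %ax = zext <16 x i8> %a to <16 x i32>
  %bx = zext <16 x i8> %b to <16 x i32>
  %m = mul <16 x i32> %ax, %bx
  %r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %m)
  ret i32 %r
}
```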

Differential Revision: https://reviews.llvm.org/D97280
2021-03-10 22:25:12 +00:00
David Green a02f506876 [AArch64] Extend vecreduce -> udot handling to v8i8
https://reviews.llvm.org/D88577 added v16i8 vecreduce to udot/sdot
lowering. This extends that to v8i8 too, generalizing the pattern to
handle the extra types.

Differential Revision: https://reviews.llvm.org/D97279
2021-03-10 21:03:15 +00:00
Cullen Rhodes 2750f3ed31 [IR] Introduce llvm.experimental.vector.splice intrinsic
This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on an immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.

For example:

  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>  ; index
  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count

These intrinsics support both fixed and scalable vectors, where the
former is lowered to a shufflevector to maintain existing behaviour,
although while marked as experimental the recommended way to express
this operation for fixed-width vectors is to use shufflevector. For
scalable vectors where it is not possible to express a shufflevector
mask for this operation, a new ISD node has been implemented.
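
In IR, a call looks like this (a hypothetical use with a scalable type):

```
declare <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, i32)

define <vscale x 4 x i32> @splice_last_one(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
  ; -1: take the trailing element of %a, then fill from the start of %b.
  %r = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 -1)
  ret <vscale x 4 x i32> %r
}
```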

This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker and Cullen Rhodes.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D94708
2021-03-09 10:44:22 +00:00
Akira Hatanaka dca5737945 Move ObjCARCUtil.h back to llvm/Analysis
Instead of adding the header to llvm/IR, just duplicate the marker
string in the auto upgrader.
2021-03-08 16:35:24 -08:00
LemonBoy 8725b24c6d [AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors
Expand the horizontal reduction during the instruction selection phase, but only if the target doesn't support the full fp16 instruction set.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49401

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D97840
2021-03-05 16:09:37 +01:00
David Blaikie a2a55def35 Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering.
This is included from IR files, and IR doesn't/can't depend on Analysis
(because Analysis depends on IR).

Also fix the implementation - don't use non-member static in headers, as
it leads to ODR violations, inaccurate "unused function" warnings, etc.
And fix the header protection macro name (we don't generally include
"LIB" in the names, so far as I can tell).
2021-03-04 16:14:53 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due to clobbering of live
registers. See the discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
David Green 7abf7dd5ef [AArch64] Add combine for add(udot(0, x, y), z) -> udot(z, x, y).
Given a zero input for a udot, an add can be folded in to take the place
of the input, using the addition that the instruction naturally
performs.

Differential Revision: https://reviews.llvm.org/D97188
2021-03-01 12:53:34 +00:00
Fraser Cormack 6718fda6ad [CodeGen] Fix issues with subvector intrinsic index types
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D97459
2021-03-01 10:28:21 +00:00
David Green f51b3de4e8 [AArch64] Introduce UDOT/SDOT DAG nodes
This is used to lower UDOT/SDOT instructions, as opposed to relying on
the intrinsic. Subsequent optimizations will be able to optimize them
more cleanly based on these nodes.
2021-02-23 20:31:01 +00:00
Serge Pavlov 2c4f60e45b [FPEnv][AArch64] Implement lowering of llvm.set.rounding
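
For reference, a minimal use of the intrinsic (a hypothetical example; the mode
argument follows the FLT_ROUNDS numbering, e.g. 0 = toward zero):

```
declare void @llvm.set.rounding(i32)

define void @round_toward_zero() {
  call void @llvm.set.rounding(i32 0)
  ret void
}
```
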
Differential Revision: https://reviews.llvm.org/D96836
2021-02-19 13:16:51 +07:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
Bradley Smith 5b094bfeb3 [AArch64] Allow folding FMUL/FADD into FMA for FP16 types
isFMAFasterThanFMulAndFAdd should return true for FP16 types when
HasFullFP16 is present, since we have the instructions to handle it for
both SVE and NEON. (SVE patterns and tests will follow).

Differential Revision: https://reviews.llvm.org/D96599
2021-02-18 16:51:22 +00:00
Fraser Cormack 0176fecfbc [SVE][CodeGen] Expand SVE MULH[SU] and [SU]MUL_LOHI nodes
This patch fixes a codegen crash introduced in fde2466171, where the
DAGCombiner started generating optimized MULH[SU] or [SU]MUL_LOHI nodes
unless the target opted out. The AArch64 backend cannot currently select
any of these nodes, so ensure that they are not generated in the first
place.

This issue was raised by @huihuiz in D94501.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D96849
2021-02-18 10:06:24 +00:00
Florian Hahn 211147c5ba
[AArch64] Convert CMP/SELECT sign patterns to OR & ASR.
ICMP & SELECT patterns extracting the sign of a value can be simplified
to OR & ASR (see https://alive2.llvm.org/ce/z/Xx4iZ0).

This does not save any instructions in IR, but it is profitable on
AArch64, because we need at least 2 extra instructions to materialize 1
and -1 for the SELECT.

The improvements result in ~5% speedups on loops of the form

    static int sign_of(int x) {
      if (x < 0) return -1;
      return 1;
    }

    void foo(const int *x, int *res, int cnt) {
      for (int i=0;i<cnt;i++)
        res[i] = sign_of(x[i]);
    }
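
The corresponding IR pattern being simplified (a hypothetical reduction):

```
define i32 @sign_of(i32 %x) {
  ; select(x < 0, -1, 1) becomes (x >> 31) | 1 with an arithmetic shift.
  %neg = icmp slt i32 %x, 0
  %r = select i1 %neg, i32 -1, i32 1
  ret i32 %r
}
```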

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D96596
2021-02-16 17:17:34 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds a new intrinsic experimental.vector.reverse that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vector types.
The fixed-width variant relies on shufflevector to maintain existing behaviour;
the scalable variant uses the new ISD node VECTOR_REVERSE.
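
For example (a hypothetical use with a scalable type):

```
declare <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32>)

define <vscale x 4 x i32> @reverse(<vscale x 4 x i32> %v) {
  %r = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> %v)
  ret <vscale x 4 x i32> %r
}
```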

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when building trivial iOS programs; see the comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which were violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- The ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- The ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925). A sketch of this expansion follows the list.

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  call that returns the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.
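
A sketch of the contract pass expansion mentioned above (illustrative;
it assumes the same i64 0 encoding for retainRV, and @foo is a
placeholder callee):
```
declare i8* @foo()
declare i8* @llvm.objc.retainAutoreleasedReturnValue(i8*)

define i8* @caller() {
  ; the retainRV call is now explicit, while the bundle is kept so
  ; the backend can still emit the marker instruction
  %call = call i8* @foo() [ "clang.arc.rv"(i64 0) ]
  %rv = call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %call)
  ret i8* %rv
}
```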

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
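
For context, this hook answers questions like the one posed by the IR
below (a hypothetical snippet; how an align-1 load is actually lowered
is target-dependent):
```
define i32 @load_unaligned(i32* %p) {
  ; whether this stays one 32-bit load or is split into byte loads
  ; depends on allowsMisalignedMemoryAccesses for (i32, Align(1))
  %v = load i32, i32* %p, align 1
  ret i32 %v
}
```
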
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Kerry McLaughlin 9b4fcfaa9e [SVE][CodeGen] Remove performMaskedGatherScatterCombine
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.

This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.
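
A hypothetical IR input where the gather index is narrower than SVE
supports (function names and element counts are illustrative):
```
declare <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*>, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)

define <vscale x 4 x i32> @gather(i32* %base, <vscale x 4 x i16> %idx, <vscale x 4 x i1> %m) {
  ; the i16 indices must be extended to i32 before the gather node
  ; is built; shouldExtendGSIndex lets the target request this
  %ptrs = getelementptr i32, i32* %base, <vscale x 4 x i16> %idx
  %v = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> %ptrs, i32 4, <vscale x 4 x i1> %m, <vscale x 4 x i32> undef)
  ret <vscale x 4 x i32> %v
}
```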

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D94525
2021-02-01 14:10:00 +00:00
Kazu Hirata 7925aa091d [llvm] Populate SmallVector at construction time (NFC) 2021-01-28 22:21:14 -08:00
Richard Smith 925ae8c790 Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV"
This reverts commit 53176c1680, which
introduced a layering violation: LLVM's IR library can't include
headers from Analysis.
2021-01-25 13:53:38 -08:00
Akira Hatanaka 53176c1680 [ObjC][ARC] Annotate calls with attributes instead of emitting retainRV
or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where LLVM breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end annotates calls with attribute "clang.arc.rv"="retain"
  or "clang.arc.rv"="claim", which indicates the call is implicitly
  followed by a marker instruction and a retainRV/claimRV call that
  consumes the call result. This is currently done only when the target
  is arm64 and the optimization level is higher than -O0 (an IR sketch
  follows this list).

- The ARC optimizer temporarily emits retainRV/claimRV calls after the
  annotated calls in the IR and removes the inserted calls after
  processing the function.

- The ARC contract pass emits retainRV/claimRV calls after the annotated
  calls. It doesn't remove the attribute on the call since the backend
  needs it to emit the marker instruction. The retainRV/claimRV calls
  are emitted late in the pipeline to prevent optimization passes from
  transforming the IR in a way that makes it harder for the ARC
  middle-end passes to figure out the def-use relationship between the
  call and the retainRV/claimRV calls (which is the cause of PR31925).

- The function inliner removes the autoreleaseRV call in the callee that
  returns the result if nothing in the callee prevents it from being
  paired up with the calls annotated with "clang.arc.rv"="retain/claim"
  in the caller. If the call is annotated with "claim", a release call
  is inserted since autoreleaseRV+claimRV is equivalent to a release. If
  it cannot find an autoreleaseRV call, it tries to transfer the
  attributes to a function call in the callee. This is important since
  the ARC optimizer can remove the autoreleaseRV call that returns the
  callee result, which makes it impossible to pair it up with the retainRV or
  claimRV call in the caller. If that fails, it simply emits a retain
  call in the IR if the call is annotated with "retain" and does nothing
  if it's annotated with "claim".

- This patch teaches dead argument elimination pass not to change the
  return type of a function if any of the calls to the function are
  annotated with attribute "clang.arc.rv". This is necessary since the
  pass can incorrectly determine nothing in the IR uses the function
  return, which can happen since the front-end no longer explicitly
  emits retainRV/claimRV calls in the IR, and change its return type to
  'void'.
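
A minimal IR sketch of an annotated call (illustrative; @foo is a
placeholder, and the attribute-group form is one way to spell a
call-site string attribute):
```
declare i8* @foo()

define i8* @caller() {
  ; "retain" stands for an implicit retainRV, "claim" for claimRV
  %call = call i8* @foo() #0
  ret i8* %call
}

attributes #0 = { "clang.arc.rv"="retain" }
```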

Future work:

- Use the attribute on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the attributes.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-01-25 11:57:08 -08:00
QingShan Zhang ffc3e800c6 [NFC] [DAGCombine] Correct the result for sqrt even when the iteration count is zero
Previously, the result for sqrt was corrected only when the iteration
count was greater than zero. This doesn't make sense, as the need for the
correction is not strictly tied to the iteration count.
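
For context, the correction matters for inputs like zero: an
estimate-based lowering of a fast sqrt would otherwise produce NaN for
0.0, regardless of how many refinement iterations run. A hypothetical
IR input:
```
declare float @llvm.sqrt.f32(float)

define float @fast_sqrt(float %x) {
  ; with a reciprocal-square-root estimate, 0.0 * rsqrte(0.0) is
  ; 0.0 * inf = NaN unless the result is explicitly corrected to 0.0
  %r = call fast float @llvm.sqrt.f32(float %x)
  ret float %r
}
```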

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Kazu Hirata cfa241680f [llvm] Don't include StringSwitch.h where unnecessary (NFC) 2021-01-21 19:59:48 -08:00
Simon Pilgrim 69bc0990a9 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Add DemandedElts support inside the TRUNCATE analysis.
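
A hypothetical IR shape this helps with: only the elements the shuffle
keeps are demanded through the trunc, so the analysis can now reason
per element instead of giving up on vectors.
```
define <2 x i8> @demanded(<4 x i32> %x) {
  ; only elements 0 and 1 of %t are demanded by the shuffle
  %t = trunc <4 x i32> %x to <4 x i8>
  %s = shufflevector <4 x i8> %t, <4 x i8> undef, <2 x i32> <i32 0, i32 1>
  ret <2 x i8> %s
}
```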

REAPPLIED - this was reverted by @hans in rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724; an additional test case was added in rG0ca81b90d19d.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-21 13:01:34 +00:00
Hans Wennborg a51226057f Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE"
It caused "Vector shift amounts must be in the same as their first arg"
assertion failures in Chromium builds. See the code review for repro
instructions.

> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387

This reverts commit cad4275d69.
2021-01-20 20:06:55 +01:00
Simon Pilgrim cad4275d69 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Amanieu d'Antras 21bfd068b3 [AArch64] Add support for the GNU ILP32 ABI
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.

The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.

There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.
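
A minimal module sketch selecting the new target (the exact vendor/OS
fields in the triple are illustrative; any aarch64[_be]-*-gnu_ilp32
triple matches):
```
target triple = "aarch64-unknown-linux-gnu_ilp32"

define i32 @deref(i32* %p) {
  ; under ILP32, %p is a 32-bit pointer and lives in a w register
  %v = load i32, i32* %p
  ret i32 %v
}
```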

This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D94143
2021-01-20 13:34:47 +00:00