Commit Graph

2148 Commits

Author SHA1 Message Date
Simon Pilgrim 5aa2acc86b [DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper
None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.
2022-02-02 12:04:49 +00:00
David Sherwood daa80339df [CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.

Differential Revision: https://reviews.llvm.org/D117210
2022-02-01 09:50:00 +00:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a consequence, move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Jim Lin d6b0734837 [NFC] Use Register instead of unsigned 2022-01-19 20:17:04 +08:00
Kazu Hirata 2aed08131d [llvm] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-07 00:39:14 -08:00
David Green 2ec3ca7477 [ARM] Extend IsCMPZCSINC to handle CMOV
A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp',
and can be treated the same in IsCMPZCSINC added in D114013. This allows
us to remove the unnecessary CMOV in the same way that we could remove a
CSINC.

Differential Revision: https://reviews.llvm.org/D115188
2021-12-27 14:15:03 +00:00
Kazu Hirata c5cf7d910e [ARM] Use range-based for loops (NFC) 2021-12-20 23:06:47 -08:00
Kazu Hirata de90490060 Revert "[ARM] Use range-based for loops (NFC)"
This reverts commit 93d79cac2e.

This patch seems to break
llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.
2021-12-20 10:51:36 -08:00
Kazu Hirata 93d79cac2e [ARM] Use range-based for loops (NFC) 2021-12-20 00:04:53 -08:00
David Green 4ece4cd77e [ARM] Fold away CMP/CSINC from CMOV
This makes use of the code in D114013 to fold away unnecessary
CMPZ/CSINC starting from a CMOV, in a similar way to how we fold away
CSINV/CSINC/etc

Differential Revision: https://reviews.llvm.org/D115185
2021-12-19 21:53:50 +00:00
David Green 6bd8f114c8 [ARM] Handle splats of constants for MVE qr instruction
Some MVE instructions have qr variants that take a Q and R register,
splatting the R register for each lane. This is usually handled fine for
standard splats as we sink the splat into the loop and combine the
resulting dup into the qr instruction. It does not work for constant
splats though, as we generate a vmovimm or constant pool load instead.

This intercepts that, generating a vdup of the constant instead where we
can turn the result into a qr instruction variant.

Differential Revision: https://reviews.llvm.org/D115242
2021-12-17 09:16:28 +00:00
David Green d43c801d13 [ARM] Peek through And 1 in IsCMPZCSINC
We can be in situations where And 1 zext nodes will not yet have been
removed, preventing us from detecting removable cmpz/csinc patterns. This
peeks through those nodes, allowing us to simplify more code.

Differential Revision: https://reviews.llvm.org/D115176
2021-12-08 15:40:23 +00:00
Ties Stuij 63eb7ff47d [ARM] Implement PAC return address signing mechanism for PACBTI-M
This patch implements PAC return address signing for armv8-m. This patch roughly
accomplishes the following things:

- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly

Some notes:

We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.

For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.

For PACBTI-M, the instruction which computes the return address PAC should use
the SP value before adjustment for the argument register save area (used for
variadic functions and when a parameter is split between stack and register),
but at the same time it should come after the instruction that saves FPCXT when
compiling a CMSE entry function.

This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.

Epilogue emission code adjusted in a similar manner.

PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. A diagnostic message for such
cases will be handled separately by a future ticket.

note on tail calls:

If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during the function prologue/epilogue, which clobbers
the function pointer.

When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.

One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.

Instead, this patch disables indirect tail calls when the called function takes
four or more arguments and return address signing and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112429
2021-12-07 10:15:19 +00:00
Ties Stuij 0fbb17458a [ARM] Implement setjmp BTI placement for PACBTI-M
This patch intends to guard indirect branches performed by longjmp
by inserting BTI instructions after calls to setjmp.

Calls with 'returns-twice' are lowered to a new pseudo-instruction
named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Alexandros Lamprineas
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112427
2021-12-06 11:07:10 +00:00
David Green 255ad73424 [ARM] Make MVE v2i1 predicates legal
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatters, long multiplies and VCTP64
instructions.

This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in a way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.

The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.

Differential Revision: https://reviews.llvm.org/D114449
2021-12-03 14:05:41 +00:00
David Green 9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.
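
As a rough sketch (illustrative IR, not taken from the patch's own tests), the
kind of input that can now become a single fptosi_sat node is a conversion
clamped to the integer type's range:
```
define i32 @clamped_convert(float %x) {
  %c = fptosi float %x to i32
  %lo = call i32 @llvm.smax.i32(i32 %c, i32 -2147483648) ; clamp to INT_MIN
  %hi = call i32 @llvm.smin.i32(i32 %lo, i32 2147483647) ; clamp to INT_MAX
  ret i32 %hi
}
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.smin.i32(i32, i32)
```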

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi has less strict semantics than the
fptosi_sat, with fewer values that need to produce defined behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
David Green 52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi has less strict semantics than the
fptosi_sat, with fewer values that need to produce defined behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
David Green 7d5d063c77 [ARM] Fold away unnecessary CSET/CMPZ
Codegen from expanded vector operations can end up with unnecessary
CMPZ/CSINC, of the form:
  CSXYZ A, B, C1 (CMPZ (CSINC 0, 0, C2, D), 0)

These can be converted to remove the CMPZ and CSINC, depending on the
condition.
  if C1==NE -> CSXYZ A, B, C2, D
  if C1==EQ -> CSXYZ A, B, NOT(C2), D

Differential Revision: https://reviews.llvm.org/D114013
2021-11-27 19:07:16 +00:00
Kazu Hirata 562356d6e3 [Target] Use range-based for loops (NFC) 2021-11-26 08:23:01 -08:00
David Green c76d6dd192 [ARM] Generate VCTP from SETCC
This converts a vector SETCC([0,1,2,..], splat(n), ult) to vctp n, which
can take fewer instructions and prevents the need for constant pool loads.
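
A minimal sketch of the pattern (assuming an MVE target and the usual splat
idiom from the vectorizer; names are illustrative):
```
define <4 x i1> @tail_mask(i32 %n) {
  %ins = insertelement <4 x i32> undef, i32 %n, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  %mask = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %splat
  ret <4 x i1> %mask
}
```
The lane-index vector compared ult against the splatted count should now
select to a vctp.32 of n.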

Differential Revision: https://reviews.llvm.org/D114177
2021-11-26 10:57:14 +00:00
Simon Pilgrim 63b1e58f07 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

Reapplied with a fix for the AArch64 rev patterns to match the ARM fix.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-25 11:14:15 +00:00
Benjamin Kramer d32787230d Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl"
This reverts commit 3cf4a2c620.

It makes llc hang on the following test case.
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 {
entry:
  br label %while.body117.i

while.body117.i:                                  ; preds = %cleanup149.i, %entry
  %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ]
  %0 = load i16, i16* undef, align 2
  %1 = icmp eq i16 undef, -10240
  br i1 %1, label %fail.i, label %cleanup149.i

cleanup149.i:                                     ; preds = %while.body117.i
  %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2
  store i16 %or130.i, i16* %out.6269.i, align 2
  br label %while.body117.i

fail.i:                                           ; preds = %while.body117.i
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #1

attributes #0 = { "target-features"="+neon,+v8a" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
```
2021-11-24 14:42:54 +01:00
Simon Pilgrim 3cf4a2c620 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-24 11:28:35 +00:00
David Green 581f837355 [ARM] Fold (fadd x, (vselect c, y, -1.0)) into (vselect c, (fadd x, y), x)
This is similar to D113574, but as a DAG combine, not tablegen patterns.
Doing the fold as a DAG combine allows the fadd to be folded with a
fmul, finally producing a predicated vfma. It performs the same fold of
fadd(x, vselect(p, y, -0.0)) to (vselect p, (fadd x, y), x), using -0.0 as
the identity value of a fadd.
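
A minimal sketch of the pattern in IR (assuming an MVE target; names and
types are illustrative):
```
define <4 x float> @pred_fadd(<4 x float> %x, <4 x float> %y, <4 x i1> %p) {
  %sel = select <4 x i1> %p, <4 x float> %y,
                <4 x float> <float -0.000000e+00, float -0.000000e+00,
                             float -0.000000e+00, float -0.000000e+00>
  %r = fadd <4 x float> %x, %sel
  ret <4 x float> %r
}
```
After the combine this becomes select(p, fadd(x, y), x), and if %y is itself
an fmul the result can go on to select as a predicated vfma.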

Differential Revision: https://reviews.llvm.org/D113584
2021-11-24 10:41:00 +00:00
Kazu Hirata d45cb1d7ea [llvm] Use range-based for loops (NFC) 2021-11-23 08:54:48 -08:00
Zarko Todorovski 5b8bbbecfa [NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded or removed code comments that contain `sanity check` and `sanity
test`.
2021-11-17 21:59:00 -05:00
Kazu Hirata efa896e5f7 [Target] Use SDNode::uses (NFC) 2021-11-12 21:23:04 -08:00
Ard Biesheuvel 2caf85ad7a [ARM] implement LOAD_STACK_GUARD for remaining targets
Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and
other targets rely on the generic support which may result in spilling of the
stack canary value or address, or may cause it to be kept in a callee save
register across function calls, which means they essentially get spilled as
well, only by the callee when it wants to free up this register.

So let's implement LOAD_STACK_GUARD for other targets as well. This ensures
that the load of the stack canary is rematerialized fully in the epilogue.

This code was split off from

  D112768: [ARM] implement support for TLS register based stack protector

for which it is a prerequisite.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112811
2021-11-08 22:59:15 +01:00
Kazu Hirata 41ef3187e0 [ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC) 2021-11-07 09:53:16 -08:00
Craig Topper 04c184bba7 [TargetLowering] Simplify the interface of expandABS. NFC
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331
2021-10-22 10:22:23 -07:00
Andrew Savonichev dc8a41de34 [ARM] Simplify address calculation for NEON load/store
The patch attempts to optimize a sequence of SIMD loads from the same
base pointer:

    %0 = gep float*, float* base, i32 4
    %1 = bitcast float* %0 to <4 x float>*
    %2 = load <4 x float>, <4 x float>* %1
    ...
    %n1 = gep float*, float* base, i32 N
    %n2 = bitcast float* %n1 to <4 x float>*
    %n3 = load <4 x float>, <4 x float>* %n2

For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16].
However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so
the address is computed before every ld/st instruction:

    add r2, r0, #32
    add r0, r0, #16
    vld1.32 {d18, d19}, [r2]
    vld1.32 {d22, d23}, [r0]

This can be improved by computing the address for the first load, and then
using a post-indexed form of VLD1/VST1 to load the rest:

    add r0, r0, #16
    vld1.32 {d18, d19}, [r0]!
    vld1.32 {d22, d23}, [r0]

In order to do that, the patch adds more patterns to DAGCombine:

  - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1
    and inc2 are constants.

  - (or ptr inc) is now recognized as a pointer increment if ptr is
    sufficiently aligned.

In addition to that, we now search for all possible base updates and
then pick the best one.

Differential Revision: https://reviews.llvm.org/D108988
2021-10-14 15:23:10 +03:00
David Green 860b4479dc [ARM] Be more explicit about disabling CombineBaseUpdate for MVE.
This shouldn't be called for non-neon targets at the moment in either
case, but it is good to be explicit about CombineBaseUpdate being a
NEON function that is not expected to be run under MVE.
2021-10-11 21:51:45 +01:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
Pengxuan Zheng b0045f5595 [ARM] Fix a bug in finding a pair of extracts to create VMOVRRD
D100244 missed a check on the ResNo of the extract's operand 0 when finding a
pair of extracts to combine into a VMOVRRD (extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2)). As a result, it can incorrectly pair an extract(x, n)
with another extract(x:3, n+1) for example. This patch fixes the bug by adding
the proper check on ResNo.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D111188
2021-10-06 10:03:32 -07:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Kazu Hirata c1e32b3fc0 [Target] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-02 12:06:29 -07:00
David Green f9aa8623fe [ARM] Add more MVE intrinsics to sink splats to
This adds a few more unpredicated intrinsics to sink splats to, in order
to create more qr instruction variants. Notably this includes
saddsat/uaddsat but also some of the unpredicated mve intrinsics.

Differential Revision: https://reviews.llvm.org/D110333
2021-09-30 14:41:23 +01:00
David Green 3f90df22f1 [ARM] MVE reverse shuffles.
The vectorizer can sometimes make reverse shuffles from indices that
count down. In MVE, we don't have a 128bit rev instruction, but we can
select this to a VREV64 with some lane movs to swap the two halves.

Ideally this would use VMOVD's, but only gets as far as VMOVS's at the
moment.
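
For reference, a reverse shuffle of this sort looks like (illustrative IR):
```
define <4 x i32> @reverse(<4 x i32> %x) {
  %r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %r
}
```
Under MVE this can now be selected as a VREV64 plus lane moves that swap the
two halves, rather than a generic shuffle expansion.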

Differential Revision: https://reviews.llvm.org/D69510
2021-09-20 13:48:01 +01:00
David Green cb5e3f7959 [ARM] Prevent large integer VQDMULH pattern crashes
Put a limit on the size of constant integers we test when looking for
VQDMULH, to prevent it from crashing on values wider than 64 bits.
2021-09-18 18:47:02 +01:00
David Green a2332d5332 [ARM] Prevent continuous folding of SUBC
In some situations under Thumb1, we could be stuck in an infinite
loop recombining the same instruction. This puts a limit on that, not
combining SUBC with SUBE repeatedly.
2021-09-15 11:23:32 +01:00
Craig Topper 9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecate isNullValue/isAllOnesValue and update in-tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Chris Lattner d51da74889 [CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC. 2021-09-09 10:22:51 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Ben Shi 63ca9371c7 [ARM] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding in DAGCombine if it leads to worse code.
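
A sketch of the trade-off in question (constants chosen only for
illustration); DAGCombine would normally distribute the multiply over the add:
```
define i32 @mul_of_add(i32 %x) {
  %a = add i32 %x, 10
  %m = mul i32 %a, 6
  ret i32 %m
}
```
Distributed, this becomes (x * 6) + 60; whether that is better depends on how
cheaply the constants can be materialised and folded, so the new hook lets the
ARM backend veto the fold when it would lead to worse code.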

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109124
2021-09-07 15:42:43 +08:00
David Green f37e132263 [ARM] Add VFP lowering for fptosi.sat
This extends D107865 to the VFP instructions, lowering llvm.fptosi.sat
and llvm.fptoui.sat to VCVT instructions that inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107866
2021-09-03 18:11:08 +01:00
David Green 9cb8f4d1ad [ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops mean that the value of LR at the point
an instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3        // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638
2021-09-02 13:42:58 +01:00
David Green 49476a4d66 [ARM] Add MVE lowering for fptosi.sat
This adds lowering of the llvm.fptosi.sat and llvm.fptoui.sat intrinsics,
selecting a VCVT instruction which under MVE will inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107865
2021-09-01 22:38:47 +01:00
Nick Desaulniers 5c91b98c5d [ARMISelLowering] avoid emitting libcalls to __mulodi4()
__has_builtin(__builtin_mul_overflow) returns true for 32b ARM targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support allmodconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108842
2021-08-27 15:14:47 -07:00
David Green 605489d593 [ARM] Fix VQDMULH fold for scalar smin
Add a variant of mve-vqdmulh tests that uses min/max intrinsics
directly, including a scalar test that shows it misbehaving for min
intrinsics and a fix for the combine to prevent it from misbehaving.
2021-08-21 16:33:18 +01:00
Arthur Eubanks 46cf82532c [NFC] Replace Function handling of attributes with less confusing calls
To avoid magic constants and confusing indexes.
2021-08-17 21:05:40 -07:00
David Green 9236dea255 [ARM] Create MQQPR and MQQQQPR register classes
Similar to the MQPR register class as the MVE equivalent to QPR, this
adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR
and QQQQPR registers. The MVE MQPR seems to have worked out quite well,
and adding MQQPR and MQQQQPR allows us to specify the number of registers
a little more accurately, calculating register pressure limits a little
better.

Differential Revision: https://reviews.llvm.org/D107463
2021-08-16 22:58:12 +01:00
Simon Pilgrim d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
David Green ae9a346ef8 [ARM] Fix DAG combine loop in reduction distribution
Given a constant operand, the MVE and DAGCombine combines could fight,
each redistributing in the opposite order. Add a guard to the MVE
vecreduce distribution to prevent that.
2021-08-12 16:37:39 +01:00
Simon Pilgrim dbce6a8d9d [ARM] Fold insert_subvector to concat_vectors
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.

I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
2021-08-06 11:21:31 +01:00
David Green 15a1d7e839 [ARM] Switch order of creating VADDV and VMLAV.
It can be beneficial to try the larger VMLAV patterns before VADDV, in
case both may match the same code.
2021-07-31 16:28:52 +01:00
David Green 69cdadddec [ARM] Distribute reductions based on ascending load offset
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.

Differential Revision: https://reviews.llvm.org/D106569
2021-07-30 19:50:07 +01:00
David Green 532d05b714 [ARM] Attempt to distribute reductions
This adds a combine for adds of reductions, distributing them so that
they occur sequentially to enable better use of accumulating VADDVA
instructions. It combines:
  add(X, add(vecreduce(Y), vecreduce(Z))) ->
    add(add(X, vecreduce(Y)), vecreduce(Z))
and
  add(add(A, reduce(B)), add(C, reduce(D))) ->
    add(add(add(A, C), reduce(B)), reduce(D))

These together distribute the add's so that more reductions can be
selected to VADDVA.

Differential Revision: https://reviews.llvm.org/D106532
2021-07-30 14:48:31 +01:00
David Green 4b56306762 [ARM] Turn vecreduce_add(add(x, y)) into vecreduce(x) + vecreduce(y)
Under MVE we can use VADDV/VADDVA's to perform integer add reductions,
so it can be beneficial to use more reductions than summing subvectors
and reducing once. Especially for VMLAV/VMLAVA the mul can be
incorporated into the reduction, producing fewer instructions.
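
As a small sketch (illustrative types), a reduction of the sum of two vectors:
```
define i32 @reduce_add(<4 x i32> %x, <4 x i32> %y) {
  %sum = add <4 x i32> %x, %y
  %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %sum)
  ret i32 %r
}
declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)
```
is now treated as vecreduce_add(x) + vecreduce_add(y), which can select to a
VADDV followed by an accumulating VADDVA.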

Some of the test cases currently get larger due to extra integer adds,
but will be improved in a followup patch.

Differential Revision: https://reviews.llvm.org/D106531
2021-07-30 10:10:41 +01:00
David Green ba42f6a4b5 [ARM] Pass SelectionDAG to methods that dont require DCI. NFC
In these methods DCI is never used, only the DAG from it. Pass the DAG
directly, cleaning up the code a little.
2021-07-21 22:11:09 +01:00
David Green 5561ad8b36 [ARM] Remove PromotedBitwiseVT for NEON types
This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32,
treating them the same as the AArch64 and MVE backends where we just add
the relevant patterns for each legal type. This prevents a lot of
bitcasts from being added to the DAG, which have the potential to make
optimizations more difficult. It does mean adding extra patterns, and
some codegen can change due to the types now being legal, not promoted.

Differential Revision: https://reviews.llvm.org/D105588
2021-07-19 16:36:33 +01:00
David Green eb1e95dbdf [ARM] Extend more reductions during lowering
This relaxes the VMLAV and VADDV reduction recognition code to handle
smaller than legal types, extending them as needed. That was already
handled for some reductions; this extends it to more types in a more
generic way. If a smaller than legal value is found it is extended to
the legal type as needed.

Differential Revision: https://reviews.llvm.org/D106051
2021-07-19 08:58:03 +01:00
David Green ad8e75caa2 [ARM] Fix for matching reductions that are both sext and zext.
Fix a silly mistake that was not making sure that _both_ operands were
the correct extend code.
2021-07-16 23:11:42 +01:00
David Green dad506bd4e [ARM] Expand types handled in VQDMULH recognition
We have a DAG combine for recognizing the sequence of nodes that make up
an MVE VQDMULH, but it currently only handles specifically legal types.
This patch expands that to other power-2 vector types. For smaller than
legal types this means any_extending the type and casting it to a legal
type, using a VQDMULH where we only use some of the lanes. The result is
sign extended back to the original type, to properly set the invalid
lanes. Larger than legal types are split into chunks with extracts and
concat back together.

Differential Revision: https://reviews.llvm.org/D105814
2021-07-15 14:47:53 +01:00
David Green 31b8f40006 [ARM] Move add(VMLALVA(A, X, Y), B) to VMLALVA(add(A, B), X, Y)
For i64 reductions we currently try and convert add(VMLALV(X, Y), B) to
VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we
have an add of an existing VMLALVA, this patch pushes the add up above
the VMLALVA so that it may potentially be simplified further, for
example being folded into another VMLALV.

Differential Revision: https://reviews.llvm.org/D105686
2021-07-14 20:06:49 +01:00
David Green 338314f9c2 [ARM] Lower v16i8 -> i64 VMLA reductions.
MVE does not have a VMLALV instruction that can perform v16i8 -> i64
reductions, like it does for v8i16->i64 and v4i32->i64 reductions. That
means that the pattern to create them will be split up by type
legalization, creating a lot of instructions.

This extends the patterns for matching i64 reductions a little to handle
the v16i8->i64 case. We need to turn them into a pair of v8i16->i64
VMLALVs that each perform half of the reduction and are summed together
(so the latter is a VMLALVA). The order of the lanes does not matter for
the reduction so we generate a MVEEXT for the extension, that will
either be folded into a extending load or can be optimized to a
VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be
improved in a later patch.
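
Roughly, the v16i8 -> i64 multiply-accumulate reduction in question looks like
this in IR (a sketch with assumed names and extends):
```
define i64 @vmlalv_v16i8(<16 x i8> %a, <16 x i8> %b) {
  %ax = sext <16 x i8> %a to <16 x i64>
  %bx = sext <16 x i8> %b to <16 x i64>
  %m = mul <16 x i64> %ax, %bx
  %r = call i64 @llvm.vector.reduce.add.v16i64(<16 x i64> %m)
  ret i64 %r
}
declare i64 @llvm.vector.reduce.add.v16i64(<16 x i64>)
```
With the extended patterns this should now select to a VMLALV of one half plus
a VMLALVA of the other, instead of being split apart by type legalization.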

Differential Revision: https://reviews.llvm.org/D105680
2021-07-14 18:11:32 +01:00
David Green ca78151001 [ARM] Introduce MVEEXT ISel lowering
Similar to D91921 (and D104515) this introduces two MVESEXT and MVEZEXT
nodes that larger-than-legal sext and zext are lowered to. These either
get optimized away or end up becoming a series of stack loads/stores, in
order to perform the extending whilst keeping the order of the lanes
correct. They are generated from v8i16->v8i32, v16i8->v16i16 and
v16i8->v16i32 extends, potentially with an intermediate extend for the
larger v16i8->v16i32 extend. A number of combines have been added for
obvious cases that come up in tests, notably MVEEXT of shuffles. More
may be needed in the future, but this seems to cover most of the cases
that come up in the tests.

Differential Revision: https://reviews.llvm.org/D105090
2021-07-13 07:21:20 +01:00
Daniel Egger 98c2e4115d [ARM] Add lowering of uadd_sat to uq{add|sub}8 and uq{add|sub}16
This follows the lead of https://reviews.llvm.org/D68974 to add lowering
of unsigned saturated addition/subtraction.

Differential Revision: https://reviews.llvm.org/D105413
2021-07-11 15:58:11 +01:00
Krzysztof Parzyszek df88c26f0d [OpaquePtr] Add type parameter to emitLoadLinked
Differential Revision: https://reviews.llvm.org/D105353
2021-07-02 13:07:40 -05:00
David Green 3d48775b89 [ARM] Reassociate BFI
D104868 removed an (incorrect) fold for distributing BFI instructions in
a chain, combining them into a single instruction. BFIs like that are
hard to test, as the patterns are often destroyed before they become
BFIs. But it can come up in places, with chains of BFIs that can be
combined.

This patch adds a replacement, which reassociates BFI instructions with
non-overlapping insertion masks so that low bits are inserted first.
This can end up sorting the nodes so that adjacent inserts are next to
one another, allowing the existing folds to combine into a single BFI.

Differential Revision: https://reviews.llvm.org/D105096
2021-07-01 21:08:13 +01:00
David Green 371ee32e01 [ARM] Fold extract of ARM_BUILD_VECTOR
This adds a small fold for extract (ARM_BUILD_VECTOR) to fold to the
original node. This can help simplify the resulting codegen in some
cases.

Differential Revision: https://reviews.llvm.org/D104860
2021-06-29 11:03:19 +01:00
David Green a1c0f09a89 [ARM] Add an extra fold for f32 extract(vdup(i32))
This adds another small fold for extract of a vdup, between a i32 and a
f32, converting to a BITCAST. This allows some extra folding to happen,
simplifying the resulting code.

Differential Revision: https://reviews.llvm.org/D104857
2021-06-28 08:54:03 +01:00
David Green 41d8149ee9 [ARM] Lower MVETRUNC to stack operations
The MVETRUNC node truncates two wide vectors to a single vector with
narrower elements. This is usually lowered to a series of extract/insert
elements, going via GPR registers. This patch changes that to instead
use a pair of truncating stores and a stack reload. This cuts down the
number of instructions at the expense of some stack space.

Differential Revision: https://reviews.llvm.org/D104515
2021-06-26 22:12:57 +01:00
David Green 5955812927 [ARM] Introduce MVETRUNC ISel lowering
Currently, when encountering store(trunc(..)) where the trunc is double
a legal vector length in MVE, we split the node into two different stores,
each performing half of the trunc from the wider type. This works well
for efficiently lowering wider than legal types, as otherwise the trunc
becomes a series of individual lane moves.
currently one of the first combines attempted, so can happen before any
other combines which might be more preferable.

This patch instead introduces the concept of an MVETRUNC ISel node that
the trunc is initially lowered to, keeping it intact as a single item as
opposed to splitting it up. This allows us to push the store(trunc(..))
combine later, allowing other optimisations to potentially happen on the
trunc first. The store(trunc(..)) splitting can then be done later in
the legalisation period if needed, or else fall back to a buildvector as
before.
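
A minimal example of the store(trunc(..)) shape in question, with v8i32 being
double the legal 128-bit MVE vector width (types chosen for illustration):
```
define void @store_trunc(<8 x i32> %v, <8 x i16>* %p) {
  %t = trunc <8 x i32> %v to <8 x i16>
  store <8 x i16> %t, <8 x i16>* %p, align 2
  ret void
}
```
The trunc is now first lowered to a single MVETRUNC node, and the
store(trunc(..)) split only happens later in legalisation if nothing better
has been done with it first.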

This can also be used in the future to lower to loads/stores, as opposed
to the more expensive lane extracts/inserts. Some extra combines are
added to keep all the existing tests happy.

Differential Revision: https://reviews.llvm.org/D91921
2021-06-26 22:00:26 +01:00
David Green 0f83d37a14 [ARM] MVE vabd
This adds MVE lowering for VABDS/VABDU, using the code ported from
AArch64 in D91937.

Differential Revision: https://reviews.llvm.org/D91938
2021-06-26 19:41:32 +01:00
Amara Emerson f9b3840c3d [ARM] Fix crash in chained BFI combine due to incorrectly RAUW'ing a node.
For a bfi chain like:
a = bfi input, x, y
b = bfi a, x', y'

The previous code was RAUW'ing a with x, mutating the second 'b' bfi, and when
SelectionDAG's CSE code ended up deleting it unexpectedly, bad things happened.
There's no need to RAUW in this case because we can just return our newly
created replacement BFI node. It also looked incorrect because it didn't account
for other users of the 'a' bfi.

Since it seems that chains of more than 2 BFI nodes are hard/impossible to
produce without this combine kicking in at some point, I've removed that
functionality since it had no test coverage.

rdar://79095399

Differential Revision: https://reviews.llvm.org/D104868
2021-06-24 23:35:47 -07:00
Martin Storsjö 42f74e8249 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class; however, these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
2021-06-25 00:22:01 +03:00
Eli Friedman 74909e4b6e Rename MachineMemOperand::getOrdering -> getSuccessOrdering.
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving.  This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.

We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.

Differential Revision: https://reviews.llvm.org/D103338
2021-06-21 16:49:27 -07:00
Eli Friedman bf0d0671a1 [ARM] Make sure we don't transform unaligned store to stm on Thumb1.
This isn't likely to come up in practice; the combination of compiler
flags required to hit this issue should be rare. Found by inspection.
2021-06-21 14:32:42 -07:00
David Spickett e4ecd83fe9 [llvm][AArch64] Handle arrays of struct properly (from IR)
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you have arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123
2021-06-16 13:56:01 +00:00
David Green 0a714eaa51 [ARM] Correct type of setcc results for FP vectors
Under MVE v4f32 and v8f16 vectors should be using v4i1/v8i1 predicates
for the setcc result type, as they have predicated registers for those
types. Setting this correctly prevents some inefficient optimizations
from happening.
2021-06-16 11:11:03 +01:00
Kristina Bessonova f6b9836b09 [ARM][NEON] Combine base address updates for vld1Ndup intrinsics
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D103836
2021-06-13 11:18:32 +02:00
Koutheir Attouchi 789708617d Do not generate calls to the 128-bit function __multi3() on 32-bit ARM
Re-applying this patch after bots failures. Should be fine now.

The function __multi3() is undefined on 32-bit ARM, so a call to it should
never be emitted. Instead, plain instructions need to be generated to
perform 128-bit multiplications.

Differential Revision: https://reviews.llvm.org/D103906
2021-06-11 11:45:21 +01:00
Simon Pilgrim 4eb47e3cd4 [TargetLowering] getABIAlignmentForCallingConv - pass DataLayout by const reference. NFCI.
Avoid unnecessary copies and match every other method in TargetLowering that takes DataLayout as an argument.
2021-06-10 10:55:24 +01:00
Nico Weber 68a1d9a1f5 Revert "Do not generate calls to the 128-bit function __multi3() on 32-bit ARM"
This reverts commit 64e9aa3302.
Breaks check-llvm everywhere, see https://reviews.llvm.org/D103906
2021-06-09 13:21:05 -04:00
Koutheir Attouchi 64e9aa3302 Do not generate calls to the 128-bit function __multi3() on 32-bit ARM
The function __multi3() is undefined on 32-bit ARM, so a call to it
should never be emitted. Instead, plain instructions need to be
generated to perform 128-bit multiplications.

Differential Revision: https://reviews.llvm.org/D103906
2021-06-09 16:21:16 +01:00
David Green d7853bae94 [ARM] Generate VDUP(Const) from constant buildvectors
If we cannot otherwise use a VMOVimm/VMOVFPimm/VMVNimm, fall back to
producing a VDUP(const) as opposed to a constant pool load. This will at
least be smaller codesize and can allow the VDUP to be folded into other
instructions.
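
For example (an illustrative constant, assuming it has no VMOVimm/VMVNimm
encoding):
```
define <4 x i32> @splat_const() {
  ret <4 x i32> <i32 305419896, i32 305419896, i32 305419896, i32 305419896>
}
```
0x12345678 cannot be encoded as a vector immediate, so this previously became
a constant pool load; it can now be materialised into a GPR and VDUP'd instead.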

Differential Revision: https://reviews.llvm.org/D103808
2021-06-08 20:51:33 +01:00
Nikita Popov 1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Michael Benfield 00d19c6704 [various] Remove or use variables which are unused but set.
This is in preparation for the -Wunused-but-set-variable warning.

Differential Revision: https://reviews.llvm.org/D102942
2021-06-01 15:38:48 -07:00
Tim Northover 9ff2eb1ea5 SwiftTailCC: teach verifier musttail rules applicable to this CC.
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.

Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
2021-05-28 11:12:00 +01:00
Tim Northover d88f96dff3 ARM: support mandatory tail calls for tailcc & swifttailcc
This adds support for callee-pop conventions to the ARM backend so that it can
ensure a call marked "tail" is actually a tail call.
2021-05-28 11:10:51 +01:00
David Green 2cf0e52b85 [ARM] Add patterns for vmulh
Now that vmulh can be selected, this adds the MVE patterns to make it
legal and generate instructions.
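
A sketch of IR that forms an ISD::MULHU node (types assumed for MVE; the
widening multiply plus shift and truncate is the usual source of a mulh):
```
define <4 x i32> @mulhu(<4 x i32> %a, <4 x i32> %b) {
  %ax = zext <4 x i32> %a to <4 x i64>
  %bx = zext <4 x i32> %b to <4 x i64>
  %m = mul <4 x i64> %ax, %bx
  %hi = lshr <4 x i64> %m, <i64 32, i64 32, i64 32, i64 32>
  %r = trunc <4 x i64> %hi to <4 x i32>
  ret <4 x i32> %r
}
```
With the new patterns this should now select to a vmulh.u32 rather than
expanding the widened multiply.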

Differential Revision: https://reviews.llvm.org/D88011
2021-05-26 09:22:12 +01:00
Kristina Bessonova 44843e2a04 [ARM][NEON] Combine base address updates for vld1x intrinsics
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102855
2021-05-25 11:06:39 +02:00
David Green 53c42f7700 [ARM] Ensure WLS preheader blocks have branches during memcpy lowering
This makes sure that the blocks created for lowering memcpy to loops end
up with branches, even if they fall through to the successor. Otherwise
IfCvt is getting confused with unanalyzable branches and creating
invalid block layouts.

The extra branches should be removed as the tail predicated loop is
finalized in almost all cases.
2021-05-24 11:26:45 +01:00
David Green 6cc78b9245 [ARM] Fix inline memcpy trip count sequence
The trip count for a memcpy/memset will be n/16 rounded up to the
nearest integer. So (n+15)>>4. The old code was including a BIC too, to
clear one of the bits, which does not seem correct. This removes the
extra BIC.
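
In IR terms the intended trip count computation is simply (a sketch; the real
code builds MachineInstrs):
```
; e.g. n = 33 gives (33 + 15) >> 4 = 3 iterations of a 16-byte-per-iteration loop.
define i32 @memcpy_trip_count(i32 %n) {
  %a = add i32 %n, 15
  %t = lshr i32 %a, 4
  ret i32 %t
}
```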

Note that ideally this would never actually be generated, as in the
creation of a tail predicated loop we will DCE that setup code, letting
the WLSTP perform the trip count calculation. So this doesn't usually
come up in testing (and apparently the ARMLowOverheadLoops pass does not
do any sort of validation on the tripcount). Only if the generation of
the WLSTP fails will it use the incorrect BIC instructions.

Differential Revision: https://reviews.llvm.org/D102629
2021-05-24 11:01:58 +01:00
Kristina Bessonova d59a2a32b9 [ARM][NEON] Combine base address updates for vst1x intrinsics
Differential Revision: https://reviews.llvm.org/D102256
2021-05-19 14:05:55 +02:00
Tim Northover 82a0e808bb IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00