Commit Graph

6374 Commits

Author SHA1 Message Date
David Green 993b203b6a [AArch64] Sink splat(s/zext(..)) to uses
If the Shuffle is a splat and the operand is a zext/sext, sinking the
operand and the s/zext can help create indexed s/umull. This is
especially useful to prevent an i64 mul from being scalarized.
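For reference (hypothetical function, not from the patch), the kind of indexed widening multiply the combine is trying to reach:

```c++
#include <arm_neon.h>
// Sketch: a widening multiply by a splatted scalar. Keeping the zero-extend
// adjacent to the multiply lets ISel pick an indexed umull,
// roughly "umull v0.2d, v0.2s, v1.s[0]", instead of scalarizing the i64 mul.
uint64x2_t mul_by_splat(uint32x2_t a, uint32_t b) {
  return vmull_n_u32(a, b);
}
```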

Differential Revision: https://reviews.llvm.org/D133355
2022-09-13 15:47:41 +01:00
David Spickett 0b8a44388e [llvm][AArch64] Explain why certain registers are reserved on Arm64EC
This extends 4658366d95 to add a note
explaining why the register is reserved.

note: x13 is clobbered by asynchronous signals when using Arm64EC.

I've added testing for w/x registers and v/q/s/d and h floating point
registers.

LLVM will accept, but silently do nothing with, b registers, so they are not
tested here (Clang rejects them, so at least for C you're safe anyway).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D133701
2022-09-13 10:13:06 +00:00
Sylvestre Ledru cd20a18286 Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"
Causing:
https://github.com/llvm/llvm-project/issues/57709

This reverts commit ab56719acd.
2022-09-13 10:53:59 +02:00
David Majnemer ab56719acd [clang, llvm] Add __declspec(safebuffers), support it in CodeView
__declspec(safebuffers) is equivalent to
__attribute__((no_stack_protector)).  This information is recorded in
CodeView.
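A minimal sketch of the two spellings (hypothetical functions; the __declspec form may require -fms-extensions outside MSVC-like targets):

```c++
// Both ask for the stack protector to be omitted for the function, and with
// this change that fact is recorded in CodeView debug info.
__declspec(safebuffers) void msvc_style() {}
__attribute__((no_stack_protector)) void gnu_style() {}
```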

While we are here, add support for strict_gs_check.
2022-09-12 21:15:34 +00:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Sander de Smalen cf72dddaef [AArch64][SME] Add utility class for handling SME attributes.
This patch adds a utility class that will be used in subsequent patches
for parsing the function/callsite attributes and determining whether
changes to PSTATE.SM are needed, or whether a lazy-save mechanism is
required.

It also implements some of the restrictions on the SME attributes
in the IR Verifier pass.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: david-arm, aemerson

Differential Revision: https://reviews.llvm.org/D131570
2022-09-12 12:41:30 +00:00
David Spickett 739b69e655 [LLVM][AArch64] Explain that X19 is used as the frame base pointer register
Fixes #50098

LLVM uses X19 as the frame base pointer, if it needs to. Meaning you
can get warnings if you clobber that with inline asm.

However, it doesn't explain why. The frame base register is not part
of the ABI so it's pretty confusing why you get that warning out of the blue.
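For illustration only (hypothetical function, not from the patch), the kind of inline asm that triggers the warning, and now also the explanation:

```c++
// Sketch: clobbering X19. If LLVM ends up using X19 as the frame base pointer
// for this function, it warns that the clobber list contains a reserved
// register; the new method explains that X19 is the frame base pointer.
void clobber_x19() {
  asm volatile("" ::: "x19");
}
```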

This adds a method to explain a reserved register with X19 as the first one.
The logic is the same as getReservedRegs.

I could have added a return parameter to isASMClobberable and friends
but found that there's a lot of things that call isReservedReg in various
ways.

So while one more method on the pile isn't great design, it is simpler
right now to do it this way and only pay the cost if you are actually using
a reserved register.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D133213
2022-09-12 09:18:09 +00:00
Mingming Liu 8aa800614b [AArch64][CostModel] Detects that {extract,insert}-element at lane 0 has the same cost as the other lane for vector instructions in the IR.
Currently, {extract,insert}-element has zero cost at lane 0 [1]. However, there is a cost (an fmov instruction [2], or ext/ins instructions) to move values from SIMD registers to GPR registers when the element is used explicitly as an integer.

See https://godbolt.org/z/faPE1nTn8, where fmov is generated for the d* register -> x* register conversion.
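For illustration (hypothetical function), a lane-0 extract that still costs an fmov to reach a general-purpose register:

```c++
#include <arm_neon.h>
#include <cstdint>
// Sketch: even extracting lane 0 is not free when the value is consumed as an
// integer; it lowers to a SIMD-to-GPR move such as "fmov x0, d0".
uint64_t lane0_to_gpr(uint64x2_t v) {
  return vgetq_lane_u64(v, 0);
}
```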

Implementation-wise, add a private method `AArch64TTIImpl::getVectorInstrCostHelper` as a helper function. This way, the instruction-based method can share the core logic (e.g.,
returning zero cost if the type is legalized to a scalar).

[1] 2cf320d41e/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (L1853)
[2] 2cf320d41e/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L8150-L8157)

Differential Revision: https://reviews.llvm.org/D128302
2022-09-09 09:47:30 -07:00
zhongyunde 7a81782585 [AArch64][CodeGen]Fold the mov and lsl into ubfiz
Fix the issue exposed by D132322; depends on D132939.
Reviewed By: efriedma, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D132325
2022-09-09 23:50:29 +08:00
Fangrui Song f9b5924975 [AArch64] Fix -Wunused-variable. NFC 2022-09-08 18:27:16 -07:00
zhongyunde b6655333c2 [Peephole] rewrite INSERT_SUBREG to SUBREG_TO_REG if upper bits zero
This is restricted to the 32-bit form of integer instructions, as too many test
cases would otherwise be clobbered by the updated register numbering.
    From %reg = INSERT_SUBREG %reg, %subreg, subidx
    To   %reg:subidx =  SUBREG_TO_REG 0, %subreg, subidx
This aims to fix the redundant mov instruction seen in D132325, as SUBREG_TO_REG should not generate any code.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D132939
2022-09-09 09:00:54 +08:00
David Green 3875c38adf [AArch64] Fix formatting of the Shuffle Cost tables. NFC 2022-09-08 19:54:12 +01:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.
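A minimal sketch of the mechanical replacement (hypothetical array):

```c++
#include <iterator>
// std::size handles C-style arrays since C++17, so the LLVM helper is no
// longer needed.
static const char *const Names[] = {"add", "sub", "mul"};
constexpr auto NumNames = std::size(Names); // was: llvm::array_lengthof(Names)
```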

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
liqinweng 723245bfac [AARCH64][COST] Improve cost of reverse shuffles for AArch64
Update the comments for reverse shuffles and add tests

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132730
2022-09-08 18:55:49 +08:00
Marco Elver 31a548021b [GlobalISel] Propagate PCSections metadata to MachineInstr
Propagate (most) PC sections metadata to MachineInstr when GlobalISel is
doing instruction selection.

This change results in support for architectures using GlobalISel (such
as -O0 with AArch64). Not all instructions may be supported yet, and
requires further target-specific handling (such as done for AArch64
pseudo-atomics). Expanding supported instructions is planned on a
case-by-case basis and new use cases for PC sections metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130886
2022-09-07 11:36:02 +02:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Alexander Shaposhnikov 6a2442e9be [AArch64] Increase AddedComplexity of BIC
This diff adjusts AddedComplexity of BIC to bump its position
in the list of patterns to make LLVM pick it instead of MVN + AND.
MVN + AND requires 2 cycles, as does e.g. MOV + BIC, but the latter
outperforms the former if the instructions producing the operands of
BIC can be issued in parallel.

One may consider the following example:

ldur x15, [x0, #2] # 4 cycles
mvn x10, x15 # 1 cycle (depends on ldur)
and x9, x10, #0x8080808080808080

vs.

ldur x15, [x0, #2] # 4 cycles
mov x9, #0x8080808080808080 # 1 cycle (can be executed in parallel with ldur)
bic x9, x9, x15 # 1 cycle

Test plan: ninja check-all

Differential revision: https://reviews.llvm.org/D133345
2022-09-06 20:31:24 +00:00
Markus Böck f049b2c3fc [MC] Emit Stackmaps before debug info
This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment.

The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections.

This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice of when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionally not a problem, however, as emitting an empty `StackMaps` is a no-op.

Differential Revision: https://reviews.llvm.org/D132708
2022-09-06 20:20:56 +02:00
Guozhi Wei 3cf4ab5447 [AArch64] Add an option to reserve physical registers from RA
This patch adds an option, --reserve-regs-for-regalloc, so we can reserve a list
of physical registers. These registers will not be used by the register
allocator, but can still be used to satisfy ABI requirements, such as passing
arguments to a function call.

Its main purpose is to simulate high register pressure by reserving many
physical registers, which makes it much easier to test and debug register
allocation changes.

Differential Revision: https://reviews.llvm.org/D132717
2022-09-06 17:18:01 +00:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem pass to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion).
The X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.
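For illustration, a minimal sketch of the kind of code this ultimately enables, assuming a Clang whose _BitInt extension accepts widths above 128 bits (hypothetical function):

```c++
// Division wider than 128 bits has no runtime libcall on these targets; the
// ExpandLargeDivRem pass expands it into an in-line expansion instead.
using i256 = _BitInt(256);
i256 div256(i256 a, i256 b) { return a / b; }
```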

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Eli Friedman 2b9cec6244 [ARM64EC 5/?] Fix names of __chkstk and __security_check_cookie.
Part of initial Arm64EC patchset.

Arm64EC code needs to use functions with a different name, to avoid
using the x64 versions.

Differential Revision: https://reviews.llvm.org/D125417
2022-09-05 13:19:54 -07:00
Eli Friedman 5637ec0983 [ARM64EC 4/?] Add LLVM support for varargs calling convention.
Part of patchset to add initial support for ARM64EC.

The ARM64EC calling convention is the same as ARM64 for non-varargs
functions, but for varargs, the convention is significantly different.
Basically, only x0-x3 registers are used for passing arguments, and x4
and x5 describe the address/size of the arguments passed in memory. (See
https://docs.microsoft.com/en-us/windows/uwp/porting/arm64ec-abi for
more details; see
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention for
the x64 calling convention rules, which this convention needs to match.)

Note that this currently doesn't handle i128 arguments correctly; as
noted in review, that's sort of complicated to handle, so I'm leaving it
for a followup.

Differential Revision: https://reviews.llvm.org/D125415
2022-09-05 13:05:48 -07:00
Eli Friedman 4658366d95 [ARM64EC 3/?] Mark reserved registers specific to ARM64EC ABI.
Part of patchset to add initial support for ARM64EC.

I'm not completely sure I understand the reason for this restriction,
but Microsoft documentation says that asynchronous signals clobber these
registers, so we can't ever use them.

As far as I know, none of these registers have any hardcoded meaning, so
reserving them shouldn't have any significant side-effects.

Differential Revision: https://reviews.llvm.org/D125413
2022-09-05 12:59:39 -07:00
Eli Friedman 63335afb4e [ARM64EC 2/?] Add target triple, and allow targeting it.
Part of patchset to add initial support for ARM64EC.

Per discussion on review, using the triple arm64ec-pc-windows-msvc. The
parsing works the same way as Apple's alternate Arm ABI "arm64e".

Differential Revision: https://reviews.llvm.org/D125412
2022-09-05 12:27:10 -07:00
David Green 3c6edc0b2f [AArch64][GlobalISel] Recognise some CCMPri
This is a simple addition to emitConditionalComparison, to match CCMP
with immediates using getIConstantVRegValWithLookThrough, letting it
select the CCMPri variants of the instructions.

Differential Revision: https://reviews.llvm.org/D131073
2022-09-05 19:43:23 +01:00
Kazu Hirata 7d8c2d17eb [llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
2022-09-03 23:27:25 -07:00
Eli Friedman b219a9c0a2 [CostModel][AArch64] Fix ctpop intrinsic cost when NEON is disabled.
If we don't have NEON, we use the generic fallback, which takes 12
instructions. Make sure the costs reflect that.

(On a related note, we could optimize the generic fallback a bit. It
currently uses sequences like lsr+and+add; if we use and+lsr+add
instead, we can fold the lsr into the add.)
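For reference, a plain C++ sketch (hypothetical function) of the classic expansion the generic lowering corresponds to:

```c++
#include <cstdint>
// Bit-twiddling popcount: 2-bit, 4-bit, then 8-bit partial counts, followed by
// a multiply that sums the byte counts into the top byte. This is the sort of
// lsr/and/add sequence whose cost is now reflected when NEON is unavailable.
uint64_t popcount64(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return (x * 0x0101010101010101ULL) >> 56;
}
```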

Differential Revision: https://reviews.llvm.org/D133154
2022-09-02 15:17:55 -07:00
Mingming Liu 242203d254 [AArch64][TTI] Add cost table entry for trunc over vector of integers.
1) Tablegen patterns exist to use 'xtn' and 'uzp1' for trunc [1]. Cost table entries are updated based on the actual number of {xtn, uzp1} instructions generated.
2) Without this, an IR instruction like trunc <8 x i16> %v to <8 x i8> is considered free and might be sunk to other basic blocks. As a result, the sunk 'trunc' is in a different basic block from its (usually not-free) vector operand and misses the chance to be combined during instruction selection. (examples in [2])
3) It's a lot of effort to teach CodeGenPrepare.cpp to sink the operand of trunc without introducing regressions, since the instruction to compute the operand of trunc could be faster (e.g., in throughput) than the instruction corresponding to "trunc (bin-vector-op)". For instance in [3], sinking %1 (as the trunc operand) into bb.1 and bb.2 means replacing 2 xtn with 2 shrn (shrn has a throughput of 1 and only utilizes the V1 pipeline), which is not necessarily good, especially since the ushr result needs to be preserved for the store operation in bb.0. Meanwhile, it's too optimistic (for the CodeGenPrepare pass) to assume machine-cse will always be able to de-dup shrn from various basic blocks into one shrn.

[1] For {v8i16->v8i8, v4i32->v4i16, v2i64->v2i32}, 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4472).
    For concat (trunc, trunc) -> uzip1, 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L5428-L5437)
[2] examples
    - trunc(umin(X, 255)) -> UQXTRN v8i8 (and other {u,s}x{min,max} pattern for v8i16 operands) from 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4515-L4528)
    - trunc (AArch64vlshr v8i16, imm) -> SHRNv8i8 (same missed for SHRNv2i32) from 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L6743-L6748)
[3]
    ---
    ; instruction latency / throughput / pipeline on `neoverse-n1`
    bb.0:
      %1 = lshr <8 x i16> %10, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4>   ; ushr, latency 2, throughput 1, pipeline V1
      %2 = trunc <8 x i16> %1 to <8 x i8>  ; xtn, latency 2, throughput 2, pipeline V
      store <8 x i8> %2, ptr %addr
      br i1 %cond, label %bb.1, label %bb.2

    bb.1:
      %4 = trunc <8 x i16> %1 to <8 x i8> ; xtn

    bb.2:
      %5 = trunc <8 x i16> %1 to <8 x i8> ; xtn
    ---

Differential Revision: https://reviews.llvm.org/D132784
2022-09-02 10:06:55 -07:00
Fangrui Song 1b726f0a4c [AArch64InstPrinter] Add some `<reg:...>` for llvm-mc --mdis output 2022-09-01 21:34:56 -07:00
Antonio Frighetto 4e99079774 [AArch64InstPrinter] Introduce immediate markup tags emission
AArch64 assembly syntax emission now leverages markup tags for immediates, if enabled.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D129871
2022-09-01 20:58:42 -07:00
Hassnaa Hamdi a6d9c944df [AArch64 - SVE]: Use SVE to lower reduce.fadd.
Differential Revision: https://reviews.llvm.org/D132573

Skip custom lowering for v1f64 so that it is expanded instead, because it has only one lane.

Differential Revision: https://reviews.llvm.org/D132959
2022-08-31 12:31:06 +00:00
Stephen Long 40999cbd93 [SVE] Fix SVEDup0 matching -0.0f
Because of D128669, CPY is being used to zero active lanes even in the case of -0.0f. This patch checks for floating point positive zero. That way SVEDup0 won't match -0.0f.

Fixes https://github.com/llvm/llvm-project/issues/57428

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D132880
2022-08-30 11:07:17 -07:00
Tomas Matheson 050dad57f7 [AArch64][GISel] constrain regclass for 128->64 copy
When selecting G_EXTRACT to COPY for extracting a 64-bit GPR from
a 128-bit register pair (XSeqPair) we know enough to constrain the
destination register class to gpr64. Without this it may have only
a register bank and some copy elimination code would assert while
assuming that a register class existed.

The register class has to be set explicitly because we might hit the
COPY -> COPY case where register class can't be inferred.

This would cause the following to crash in selection, where the store
is commented (otherwise the store constrains the register class):

  define dso_local i128 @load_atomic_i128_unordered(i128* %p) {
    %pair = cmpxchg i128* %p, i128 0, i128 0 acquire acquire
    %val = extractvalue { i128, i1 } %pair, 0
    ; store i128 %val, i128* %p
    ret i128 %val
  }

Differential Revision: https://reviews.llvm.org/D132665
2022-08-30 11:02:51 +01:00
Kazu Hirata 2833760c57 [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 17:35:09 -07:00
Kazu Hirata 33b9304435 Use llvm::is_contained (NFC) 2022-08-27 21:21:00 -07:00
Kazu Hirata a33ef8f2b7 Use llvm::all_equal (NFC) 2022-08-27 09:53:10 -07:00
Paul Walker 3bb228729f [CostModel][SVE] Correct cost model of SK_Splice shuffles for <vscale x 1 x Ty> vector types.
AArch64TTIImpl::getSpliceCost() is now used more aggressively and
LNT (MultiSource/Benchmarks/mafft) exposed a failure case for
<vscale x 1 x i1>. I've tested other element types and whilst they
can be costed they cannot be code generated, so this patch returns
InstructionCost::getInvalid() for all cases.
2022-08-26 16:06:01 +01:00
Paul Walker 11b4dce7d3 [SVE] Lower fixed-length floating point loads and stores to integer variants.
There's no advantage to emitting floating point scalable accesses,
whereas by lowering them to integer variants we can benefit from
several combines that seek to replace explicit extends/truncates
with extending/truncating accesses.

Differential Revision: https://reviews.llvm.org/D132393
2022-08-26 11:10:23 +01:00
David Majnemer bd28bd59a3 [clang-cl] /kernel should toggle bit 30 in @feat.00
The linker is supposed to detect when an object with /kernel is linked
with another object which is not compiled with /kernel. The linker
detects this by checking bit 30 in @feat.00.
2022-08-25 14:17:26 +00:00
Matt Devereau 30b045aba6 [AArch64][SVE] Extend LD1RQ ISel patterns to cover missing addressing modes
Add some missing patterns for ld1rq's scalar + scalar addressing mode.
Also, adds the scalar + imm and scalar + scalar addressing modes for
the patterns added in https://reviews.llvm.org/D130010

Differential Revision: https://reviews.llvm.org/D130993
2022-08-25 13:07:37 +00:00
zhongyunde 3c8f327ce9 [AArch64] Fix sched model for tsv110
This makes three changes:
1. Split the Load/Store resources into two, Ld0St and Ld1,
   since only one of them is capable of stores.
2. Integer ADD and SUB instructions have different latencies
   and processor resource usage (pipelines) when they have a shift of
   zero vs. non-zero; refer to D8043.
3. Correct the throughput of the scalar DIV instruction.

Reviewed By: dmgreen, bryanpkc

Differential Revision: https://reviews.llvm.org/D132529
2022-08-25 19:20:07 +08:00
Usman Nadeem 46768052e0 [AArch64][DAGCombine] Fix a bug in performBuildVectorCombine where it could produce an invalid EXTRACT_SUBVECTOR
EXTRACT_SUBVECTOR requires that Idx be a constant multiple of ResultType's
known minimum vector length.

Something like this will produce an invalid extract_subvector:

t1: v4i16 = .....
t2: i32 = extract_vector_elt t1, Constant:i64<1>
t3: i32 = extract_vector_elt t1, Constant:i64<2>
t4: v2i32 = BUILD_VECTOR t2, t3
// produces
t5: v2i32 = extract_subvector t...., Constant:i64<1>

Differential Revision: https://reviews.llvm.org/D132517

Change-Id: I7a5acf054edee3e89c0f85a28d8869256403ce08
2022-08-24 16:24:19 -07:00
Sami Tolvanen cff5bef948 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Relands 67504c9549 with a fix for
32-bit builds.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 22:41:38 +00:00
Sami Tolvanen a79060e275 Revert "KCFI sanitizer"
This reverts commit 67504c9549 as using
PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.
2022-08-24 19:30:13 +00:00
Sami Tolvanen 67504c9549 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 18:52:42 +00:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Alex Richardson 38107171ed [RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI
This commit moves the information on whether a register is constant into
the Tablegen files to allow generating the implementaiton of
isConstantPhysReg(). I've marked isConstantPhysReg() as final in this
generated file to ensure that changes are made to tablegen instead of
overriding this function, but if that turns out to be too restrictive,
we can remove the qualifier.

This should be pretty much NFC, but I did notice that e.g. the AMDGPU
generated file also includes the LO16/HI16 registers now.

The new isConstant flag will also be used by D131958 to ensure that
constant registers are marked as call-preserved.

Differential Revision: https://reviews.llvm.org/D131962
2022-08-24 14:16:20 +00:00
Hassnaa Hamdi d8f63382e8 AArch64 SVE
Add SVE patterns to make use of the predicated smin, umin, smax, and umax
instructions, and add the sve-min-max-pred.ll test file for the new patterns.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D132122
2022-08-24 11:09:22 +00:00
Jakub Kuderski 6fa87ec10f [ADT] Deprecate is_splat and replace all uses with all_equal
See the discussion thread for more details:
https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692
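A minimal sketch of the rename at a call site (hypothetical helper):

```c++
#include "llvm/ADT/STLExtras.h"
#include <vector>
// all_equal replaces is_splat at call sites; the thread above covers the
// empty-range semantics that motivated the rename.
bool allSame(const std::vector<int> &Values) {
  return llvm::all_equal(Values); // was: llvm::is_splat(Values)
}
```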

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D132335
2022-08-23 11:36:27 -04:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Lucas Prates d1922c9862 [AArch64] Fix list of features for Cortex-X1C
This patch fixes the list of subtarget features enabled for the
Cortex-X1C processor, including the following:

* Fix incorrect version used for FeatureRCPC:
  * Use FEAT_LRCPC2 instead of FEAT_LRCPC.
* Add missing v8.4-A features included in the TRM:
  * Flag Manipulation Instructions - FeatureFlagM (FEAT_FlagM)
  * Large System Extension 2 - FeatureLSE2 (FEAT_LSE2)

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D132120
2022-08-23 11:35:23 +01:00
Philip Reames 104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possibly affected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
Philip Reames 478cf94378 [X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc]
This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties.  This change adds some accessor methods and uses them to simplify backend code.  The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.
2022-08-22 13:37:59 -07:00
David Green 0cf9e47f27 [AArch64] Add SK_Splice fixed-width costs
A fixed length SK_Splice shuffle vector is lowered to a Ext under
AArch64, which should have a cost of 1.

Differential Revision: https://reviews.llvm.org/D132299
2022-08-22 12:44:57 +01:00
wanglian 53bc7d5f08 [AArch64][NFC] Replace setOperationAction and AddPromotedToType
with setOperationPromotedToType.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132213
2022-08-22 09:59:47 +08:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata 8b1b0d1d81 Revert "Use std::is_same_v instead of std::is_same (NFC)"
This reverts commit c5da37e42d.

This patch seems to break builds with some versions of MSVC.
2022-08-20 23:00:39 -07:00
Kazu Hirata c5da37e42d Use std::is_same_v instead of std::is_same (NFC) 2022-08-20 22:36:26 -07:00
Kazu Hirata ec5eab7e87 Use range-based for loops (NFC) 2022-08-20 21:18:32 -07:00
Kazu Hirata 258531b7ac Remove redundant initialization of Optional (NFC) 2022-08-20 21:18:28 -07:00
Kazu Hirata 06b551c944 Use llvm::is_contained (NFC) 2022-08-20 21:18:27 -07:00
Mingming Liu 945a306501 [AArch64] Change aarch64_neon_pmull{,64} intrinsic ISel through a new
SDNode.

How:
1) Add AArch64ISD::PMULL SDNode, and extend aarch64_neon_pmull intrinsic
   tablegen pattern for this SDNode.
2) For aarch64_neon_pmull64, canonicalize i64 operands to v1i64 vectors
   during legalization.
3) For {aarch64_neon_pmull, aarch64_neon_pmull64}, combine intrinsic to
   SDNode.

Why:
1) Adding the SDNode makes it easier to canonicalize i64 inputs (required by
   aarch64_neon_pmull64) to vector inputs. Vector inputs carry lane
   information, which helps dag-combiner to combine nodes (e.g. rewrite to a
   better node to prepare for instruction selection) and instruction-selection
   to emit instructions that use higher-half inputs in place
   (i.e., no need to move lane 1 content to lane 0).
2) Using the SDNode for aarch64_neon_pmull64 is NFC, yet without this we
   have to move the definition of {PMULLv1i64, PMULLv2i64} out of its
   current group of records without gains.

Test cases are commented with what is being tested in
`aarch64-pmull2.ll` and `pmull-ldr-merge.ll` under directory
`llvm/test/CodeGen/AArch64`.
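For context, a sketch of the user-facing intrinsic involved (hypothetical function; assumes the AES/crypto target feature):

```c++
#include <arm_neon.h>
// vmull_p64 maps to aarch64_neon_pmull64; its two i64 operands are the ones
// canonicalized to v1i64 so that lane information survives into ISel.
poly128_t mul_p64(poly64_t a, poly64_t b) {
  return vmull_p64(a, b);
}
```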

Differential Revision: https://reviews.llvm.org/D131047
2022-08-19 13:17:13 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Archibald Elliott 270c179afd [AArch64][GISel] Lower llvm.prefetch
This change adds support for lowering llvm.prefetch directly using
GlobalISel. Currently, llvm.prefetch falls back to SelectionDAG.
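For illustration (hypothetical function), the source-level form that reaches llvm.prefetch:

```c++
// __builtin_prefetch lowers to the llvm.prefetch intrinsic; with this change
// it is selected directly through GlobalISel rather than falling back to
// SelectionDAG.
void prefetch_line(const void *p) {
  __builtin_prefetch(p, /*rw=*/0, /*locality=*/3);
}
```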

This Change:
- Adds an AArch64-specific G_PREFETCH generic instruction, to be used
  where AArch64ISD::PREFETCH is used in SelectionDAG.
- Adds the GINodeEquiv so patterns are translated over to GlobalISel
  automatically.
- Corrects the AArch64Prefetch patterns to use a target immediate, which
  is needed to get the patterns to translate across correctly.
- Translates the SelectionDAG legalisation of the prefetch intrinsic
  into the corresponding GlobalISel legalisation.

Differential Revision: https://reviews.llvm.org/D132043
2022-08-19 09:11:18 +01:00
Alexander Shaposhnikov fba0367e03 [AArch64] Enable AdrpAdd fusion for neoverse-n1
AdrpAdd fusion is already enabled for "generic" and it helps
the linker apply relocation relaxations for more adrp+add pairs.
This patch enables it for -mtune=neoverse-n1.

Differential revision: https://reviews.llvm.org/D132075
2022-08-19 00:27:32 +00:00
Karl Meakin 71f0ec242f [AArch64] Add `foldCSELOfCSEL` combine.
This time more conservative.

Differential Revision: https://reviews.llvm.org/D125504
2022-08-19 01:04:29 +01:00
Florian Hahn b8709a9d03
[LV] Support fixed order recurrences.
If the incoming previous value of a fixed-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous, all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.

At the moment, fixed-order recurrences are modeled as a chain of
first-order recurrences.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D119661
2022-08-18 19:15:52 +01:00
Paul Walker 96c8d615d6 [SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices.
Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based
indices to be truncated when such truncation is lossless. This can
enable the use of 32bit gather/scatter indices thus making it less
likely to have to split a gather/scatter in two.

Depends on D125194

Differential Revision: https://reviews.llvm.org/D130533
2022-08-18 18:00:53 +01:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had the last two InstructionCost-related members, `getTypeLegalizationCost()`
and `getScalingFactorCost()`, but all other costs are processed in TTI.

E.g. it is not convenient to use other TTI members in these two functions
when they are overridden in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Vladislav Dzhidzhoev 4b57939583 [AArch64][GlobalISel] Fallback to generic lowering of G_CTPOP
Use generic lowering of G_CTPOP for s32 and s64 scalars when
noimplicitfloat is specified.

Differential Revision: https://reviews.llvm.org/D131454
2022-08-17 21:10:27 +03:00
OverMighty 232953f996 [AArch64] Add pattern for SQDML*Lv1i32_indexed
There was no pattern to fold into these instructions. This patch adds
the pattern obtained from the following ACLE intrinsics so that they
generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and
sqadd/sqsub instructions:
 - vqdmlalh_s16, vqdmlslh_s16
 - vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16,
   vqdmlslh_laneq_s16 (when the lane index is 0)

It also modifies the result of the existing pattern for the latter, when
the lane index is not 0, to use the v1i32_indexed instructions instead
of the v4i16_indexed ones.

Fixes #49997.

Differential Revision: https://reviews.llvm.org/D131700
2022-08-17 12:00:47 +01:00
Vitaly Buka 16fecdfa70 Revert "[AArch64] Add `foldCSELOfCSEl` DAG combine"
Breaks ubsan on buildbot, details in D125504

This reverts commit 6f9423ef06.
2022-08-16 20:29:37 -07:00
Karl Meakin 6f9423ef06 [AArch64] Add `foldCSELOfCSEl` DAG combine
Differential Revision: https://reviews.llvm.org/D125504
2022-08-16 12:49:11 +01:00
Zain Jaffal 7155ed4289
[AArch64] Add support for 256-bit non temporal loads
Currently all temporal loads are mapped to `LDP` or `LDR`. This patch will map all non-temporal 256-bit loads to `LDNP`. Future patches should address other non-temporal loads.

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D131773
2022-08-16 12:19:36 +01:00
Vitaly Buka e0e960923f [AArch64] Fix signed integer overflow in CSINC case
Follow-up to D131815, which overflows on different
values.
2022-08-15 15:04:20 -07:00
Vitaly Buka f1596952f9 [AArch64] Fix signed integer overflow in CSINC case
https://lab.llvm.org/staging/#/builders/224/builds/2/steps/16/logs/stdio

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D131815
2022-08-13 13:12:09 -07:00
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
Florian Hahn c2af37dcdb
Revert "[AArch64][GlobalISel] Recognise some CCMPri"
This reverts commit 38c2366b3f.

This patch seems to break bootstrapping LLVM with `-fglobal-isel -O3`
on AArch64 hardware. Without the revert, there are 500+ test
failures for the `check-llvm-codegen-x86` target.
2022-08-13 17:44:41 +01:00
David Green a9e9dd9a3a [AArch64] Add bf16 select handling
A bfloat select operation will currently crash, but is allowed from C.
This adds handling for the operation, turning it into a FCSELHrrr if
fullfp16 is present, or converting it to a FCSELSrrr if not. The
FCSELSrrr is created via using INSERT_SUBREG/EXTRACT_SUBREG to convert
the bf16 to a f32 and using the f32 pattern for FCSELSrrr. (I originally
attempted to do this via a tablegen pattern, but it appears that the
nzcv glue is placed onto the wrong node, causing it to be forgotten and
incorrect scheduling to be emitted).

The FCSELSrrr can also be used for fp16 selects when +fullfp16 is not
present, which helps avoid an unnecessary promotion to f32.

Differential Revision: https://reviews.llvm.org/D131253
2022-08-11 14:20:36 +01:00
David Truby b1b9c39629 [AArch64][SVE] Use SVE for VLS fcopysign for wide vectors
Currently fcopysign for VLS vectors lowers through NEON even when the
vector width is wider than a NEON vector, causing bad codegen as the
vectors are split. This patch causes SVE to be used for these vectors
instead, giving much better codegen on wide VLS vectors.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D128642
2022-08-10 10:17:19 +00:00
Dinar Temirbulatov cab6cd6834 [AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding.
After D121595 was committed, I noticed regressions associated with
vectorisation of small trip count loops by tail folding with scalable vectors.
As a solution for those issues I propose to introduce a minimal trip count
threshold value.

Differential Revision: https://reviews.llvm.org/D130755
2022-08-09 22:10:17 +01:00
Archibald Elliott b20fe2c25b [docs][AArch64] Label Features with Arm ARM Names
This patch adds the names of the Arm Architecture Reference Manual (ARM)
features to the corresponding Subtarget Features in the AArch64 backend
and target parser.

The aim of this is to make it clearer what architectural features a
subtarget feature might enable (so, which features a CPU must provide to
support that subtarget feature), and so make it easier to add new CPUs
in the future.

Differential Revision: https://reviews.llvm.org/D131257
2022-08-09 18:45:50 +01:00
Luo, Yuanke aaf6c7b05c [globalisel] Select register bank for DBG_VALUE
The register operand of DBG_VALUE is not selected to a proper register
bank in either AArch64 or X86. This would cause getRegClass to crash after
global ISel. After discussion, we think the MIR should assume all
virtual registers have a proper register class set after global ISel,
so this patch fixes that gap for DBG_VALUE on AArch64 and X86.

Differential Revision: https://reviews.llvm.org/D129037
2022-08-09 13:11:51 +08:00
Yuta Mukai 3f561996bf [AArch64] Fix and add A64FX scheduling resource/latency info
1. Missing instruction information (FTSSEL, FMSB, PFIRST and RDFFR)
   is added and CompleteModel is set to one.

2. Information for pseudo SVE instructions is added. Those
   instructions are present at the time of scheduling.

3. Resource and latency information for SVE instructions is modified
   to be more accurate.
   For example, the description for CMPEQ, which consumes one cycle
   each of unit FLA and PPR, is as follows.
```
Previous:
  def A64FXGI01 : ProcResGroup<[A64FXIPFLA, A64FXIPPR]>;
  def A64FXWrite_4Cyc_GI01 : SchedWriteRes<[A64FXGI01]> {...
Modified:
  def A64FXGI0 : ProcResGroup<[A64FXIPFLA]>;
  def A64FXGI1 : ProcResGroup<[A64FXIPPR]>;
  def A64FXWrite_CMP : SchedWriteRes<[A64FXGI0, A64FXGI1]> {...
```

Reference: A64FX Microarchitecture Manual (Table 16-3)
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.7.pdf

Reviewed By: dmgreen, kawashima-fj

Differential Revision: https://reviews.llvm.org/D131165
2022-08-09 10:53:40 +09:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
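A minimal sketch of the replacement in a switch (hypothetical enum and function):

```c++
enum Kind { KindA, KindB };
void dispatch(Kind K) {
  switch (K) {
  case KindA:
    // ... handle A, then deliberately fall through ...
    [[fallthrough]]; // was: LLVM_FALLTHROUGH
  case KindB:
    // ... handle B ...
    break;
  }
}
```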
2022-08-08 11:24:15 -07:00
Cullen Rhodes a6dec9f5b2 [AArch64][SVE] Add patterns to select masked FP arith
Add patterns to select predicated instructions when lowering:

  fadd(a, select(mask, b, splat(0)))
  fsub(a, select(mask, b, splat(0)))

'fadd' is unsafe unless the no-signed-zeros fast-math flag is set, since

  -0.0 + 0.0 = 0.0

changes the sign. Alive2: https://alive2.llvm.org/ce/z/wbhJh_

Also adds FMA patterns for:

  fadd(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)
  fsub(a, select(mask, mul(b, c), splat(0))) -> fmls(a, mask, b, c)

These patterns require the 'contract' fast-math flag to be set, and the
fadd 'nsz' as above.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130564
2022-08-08 08:44:13 +00:00
Kazu Hirata d0ec61c9ff [Target] Remove unused forward declarations (NFC) 2022-08-07 00:16:16 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Paul Walker 0533c39a76 [SVE] Expand DUPM patterns to handle all integer vector types.
NOTE: i8 vector splats are ignored because the immediate range of
DUP already has full coverage.

Differential Revision: https://reviews.llvm.org/D131078
2022-08-05 16:00:08 +00:00
David Green 38c2366b3f [AArch64][GlobalISel] Recognise some CCMPri
This is a simple addition to emitConditionalComparison, to match CCMP
with immediates using getIConstantVRegValWithLookThrough, letting it
select the CCMPri variants of the instructions.

Differential Revision: https://reviews.llvm.org/D131073
2022-08-05 07:48:42 +01:00
Mingming Liu bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Vladislav Dzhidzhoev 71aecbb75c [AArch64] Treat x18 as callee-saved in functions with Windows calling convention on Darwin
rGcf97e0ec42b8 makes $x18 be treated as callee-saved in functions with
Windows calling convention on non-Windows OSes.

Here we mark $x18 as callee-saved for functions with Windows calling
convention on Darwin, as well as on other non-Windows platforms, in
order to prevent some miscompilations (like miscompilation of
win64cc-darwin-backup-x18.ll).

Since getCalleeSavedRegs doesn't return x18 in list of callee-saved
registers, assignCalleeSavedSpillSlots and determineCalleeSaves
consider different sets of registers as callee-saved. It causes an
error:
```
Assertion failed: ((!HasCalleeSavedStackSize || getCalleeSavedStackSize() == Size) && "Invalid size calculated for callee saves"), function getCalleeSavedStackSize, file
AArch64MachineFunctionInfo.h, line 292.
```

Differential Revision: https://reviews.llvm.org/D130676
2022-08-02 20:33:42 +03:00
David Green 1206f72e31 [AArch64] Fold Mul(And(Srl(X, 15), 0x10001), 0xffff) to CMLTz
This folds a v4i32 Mul(And(Srl(X, 15), 0x10001), 0xffff) into a v8i16
CMLTz instruction. The Srl and And extract the top bit (whether the
input is negative) and the Mul sets all values in the i16 half to all
1/0 depending on if that top bit was set. This is equivalent to a v8i16
CMLTz instruction. The same applies to other sizes with equivalent
constants.
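A scalar sketch of the equivalence for one i32 lane holding two i16 halves (hypothetical function, not from the patch):

```c++
#include <cstdint>
// (x >> 15) & 0x10001 extracts the sign bit of each i16 half; multiplying by
// 0xffff turns each set sign bit into an all-ones i16 half, which is exactly
// what a per-i16 "compare less than zero" produces.
uint32_t cmltz_equiv(uint32_t x) {
  return ((x >> 15) & 0x10001u) * 0xffffu;
}
```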

Differential Revision: https://reviews.llvm.org/D130874
2022-08-02 13:01:59 +01:00
David Sherwood 4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
2022-08-02 09:52:33 +01:00
Vasileios Porpodas f669030373 [TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2.
2xi64 is the legalized type for wide reductions (like 16xi64) and setting the
cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable.

The few performance measurements that I did on an aarch64 machine confirm that
these patterns are actually faster when vectorized.

Differential Revision: https://reviews.llvm.org/D130740
2022-08-01 13:03:14 -07:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Kazu Hirata 12b29900a1 Use any_of (NFC) 2022-07-30 10:35:56 -07:00
Matt Devereau a8b726ac65 [AArch64][SVE] Change DupLane128Combine Index comparison to 0
IdxInsert == IdxDupLane is incorrect. IdxInsert is the starting element number,
whereas IdxDupLane is the index of a quadword.
2022-07-29 14:31:00 +00:00
David Sherwood 487fa6f8c3 [AArch64][DAGCombine] Add performBuildVectorCombine 'extract_elt ~> anyext'
A build vector of two extracted elements is equivalent to an extract
subvector where the inner vector is any-extended to the
extract_vector_elt VT, because extract_vector_elt has the effect of an
any-extend.

  (build_vector (extract_elt_i16_to_i32 vec Idx+0) (extract_elt_i16_to_i32 vec Idx+1))
  => (extract_subvector (anyext_i16_to_i32 vec) Idx)

Depends on D130697

Differential Revision: https://reviews.llvm.org/D130698
2022-07-29 09:51:09 +01:00