Commit Graph

5955 Commits

Author SHA1 Message Date
Jessica Paquette 8aedb435db [GlobalISel] Combine abs(undef) -> 0
SDAG does this, GISel doesn't.

See https://gcc.godbolt.org/z/sqjMx3Tfv

More context:
https://github.com/llvm/llvm-project/issues/57256
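
A minimal IR illustration of the combine (a hypothetical test of mine, not
taken from the patch):

```
; The G_ABS built for this call now folds to zero when its operand is
; undef, matching SDAG's existing behavior.
declare i32 @llvm.abs.i32(i32, i1)

define i32 @abs_undef() {
  %r = call i32 @llvm.abs.i32(i32 undef, i1 false)
  ret i32 %r ; combines to returning 0 (mov w0, wzr)
}
```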

Differential Revision: https://reviews.llvm.org/D135021
2022-10-01 15:16:32 -07:00
Jessica Paquette 424c69190c Update missing test after 24553df57d 2022-10-01 14:00:01 -07:00
Jessica Paquette 24553df57d [GlobalISel] Combine `undef / X -> 0` and `undef % X -> 0`
This fixes the `urem_undef_lhs` case in the following:

https://gcc.godbolt.org/z/Wo9x7o679

Also see https://github.com/llvm/llvm-project/issues/57256 for more related
bugs.

This is equivalent to the undef bits in `simplifyDivRem` in the DAGCombiner.
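
A minimal IR sketch of the two folds (the first function name mirrors the test
case mentioned above; the committed tests are in the review):

```
define i32 @urem_undef_lhs(i32 %x) {
  %r = urem i32 undef, %x
  ret i32 %r ; combines to: ret i32 0
}

define i32 @udiv_undef_lhs(i32 %x) {
  %r = udiv i32 undef, %x
  ret i32 %r ; combines to: ret i32 0
}
```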

Differential Revision: https://reviews.llvm.org/D135020
2022-10-01 13:48:55 -07:00
zhongyunde 4a549be9c3 [AArch64] Lower multiplication by a negative constant to shl+sub+shl
Change the cost model to lower a = b * C, where C = -(2^n - 2^m), to
            lsl     w8, w0, m
            sub     w0, w8, w0, lsl n
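
For instance (my worked example, not the patch's test), C = -14 = -(2^4 - 2^1),
so n = 4 and m = 1:

```
define i32 @mul_m14(i32 %b) {
  %r = mul i32 %b, -14
  ret i32 %r
}
; expected lowering (sketch):
;   lsl w8, w0, #1          ; b << m
;   sub w0, w8, w0, lsl #4  ; (b << m) - (b << n) = b * (2^m - 2^n) = b * -14
```
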
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134934
2022-10-01 21:27:42 +08:00
Nilanjana Basu 85714ff0bd [AArch64] Unit test for trunc lowering from i64 to i8
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134840
2022-09-30 13:03:12 -07:00
Peter Collingbourne 0caa9d4b1e AArch64: Don't use RETA[AB] when ShadowCallStack is enabled.
When returning from a function with both SCS and PAC-RET enabled, we need to
authenticate the return address from the stack and then load from the SCS,
but this was happening in the reverse order when RETA[AB] were being used.
Fix it by disabling the use of RETA[AB] when SCS is enabled.

Fixes pr58072.

Differential Revision: https://reviews.llvm.org/D134931
2022-09-30 12:33:23 -07:00
Zain Jaffal 137faa4a84
[AArch64] Add additional tests for SMULL instruction selection
Add tests where the operands are switched and where the top bit of the operand is set to 1.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134867
2022-09-30 12:32:26 +01:00
Zain Jaffal 661403b85c
[AArch64] Add support for 128-bit non temporal loads.
Building on the work done in `D131773`, this adds support for 128-bit loads.
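
A minimal sketch of the case now handled (using the usual `!nontemporal`
metadata; the exact codegen is in the review's tests):

```
define <4 x i32> @nt_load_128(ptr %p) {
  ; a 128-bit non-temporal load can now be selected with ldnp
  ; instead of being treated as a regular load or split further
  %v = load <4 x i32>, ptr %p, align 16, !nontemporal !0
  ret <4 x i32> %v
}
!0 = !{i32 1}
```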

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D132559
2022-09-30 11:04:04 +01:00
Amara Emerson 7653586d88 [AArch64][GlobalISel] Implement another combine for shufflevector->AArch64 G_EXT.
This is a port of an existing optimization in AArch64 ISelLowering, handling
a case when the same input vector can be used for both ext inputs.
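
A hypothetical example of the kind of shuffle this covers: a single-source
rotation, where the same register can feed both EXT operands:

```
define <8 x i8> @shuffle_rot(<8 x i8> %a) {
  %s = shufflevector <8 x i8> %a, <8 x i8> poison,
       <8 x i32> <i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2>
  ret <8 x i8> %s ; expected: ext v0.8b, v0.8b, v0.8b, #3
}
```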

Differential Revision: https://reviews.llvm.org/D134891
2022-09-29 22:53:24 +01:00
zhongyunde 62a51c357c [AArch64] Lower multiplication by a constant int to shl+sub+shl
Decomposing the constant 14 was split out from D132322.
Change the cost model to lower a = b * C, where C = 2^n - 2^m, to
        lsl     w8, w0, n
        sub     w0, w8, w0, lsl m
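
For example (mine, not the patch's test), C = 14 = 2^4 - 2^1, so n = 4 and
m = 1:

```
define i32 @mul_14(i32 %b) {
  %r = mul i32 %b, 14
  ret i32 %r
}
; expected lowering (sketch):
;   lsl w8, w0, #4          ; b << n
;   sub w0, w8, w0, lsl #1  ; (b << n) - (b << m) = b * (2^n - 2^m) = b * 14
```
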
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134706
2022-09-30 01:31:06 +08:00
Amara Emerson a1aa0390cb [AArch64][GlobalISel] Update shuffle->ext test before patch. 2022-09-29 17:20:34 +01:00
Jessica Paquette 1eb49bbab6 [GlobalISel][CallLowering] Use hasRetAttr for return flags on CallBases
Given something like this:

```
declare signext i16 @signext_callee()
define i32 @caller() {
  %res = call i16 @signext_callee()
  ...
}
```

CallLowering would miss that signext_callee's return value is sign-extended,
because the attribute isn't on the call itself.

Use hasRetAttr on the CallBase to allow us to catch this.

(This now inserts G_ASSERT_SEXT/G_ASSERT_ZEXT like in the original review.)

Differential Revision: https://reviews.llvm.org/D86228
2022-09-28 19:38:24 -07:00
Jessica Paquette 95dabac7a5 [AArch64][GlobalISel] Make G_PTRTOINT only legal for s64 + p0
A few issues:

  1. There was no legalizer test for G_PTRTOINT
  2. Same clamping issue as in many other opcodes
  3. AArch64 pointers can only be 64b, so in reality we always have to trunc or
     extend for any result size other than 64 bits anyway.

This seems to actually produce more correct selection for narrow types as well.
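
A hedged illustration of the narrow case:

```
define i32 @p2i_narrow(ptr %p) {
  ; G_PTRTOINT is now only legal as p0 -> s64; a 32-bit result like this
  ; is legalized as a 64-bit conversion followed by a truncate.
  %i = ptrtoint ptr %p to i32
  ret i32 %i
}
```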

Differential Revision: https://reviews.llvm.org/D107588
2022-09-28 16:20:24 -07:00
Jessica Paquette a7aaafde2e [AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN
This is intended to be equivalent to the s32 + s64 cases in
AArch64TargetLowering::LowerFCOPYSIGN.

Widen everything and then use G_BIT + a mask to handle the actual copysign
operation. Then, narrow back down to s32/s64.
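
For reference, the operation being legalized, with the strategy sketched in
comments (my paraphrase of the approach, not the committed test):

```
declare float @llvm.copysign.f32(float, float)

define float @copysign_f32(float %a, float %b) {
  ; Legalization sketch: insert %a and %b into vector registers
  ; (G_INSERT_VECTOR_ELT into an undef vector), apply G_BIT with a
  ; 0x80000000 sign-bit mask to copy %b's sign into %a, then extract
  ; the scalar result back out.
  %r = call float @llvm.copysign.f32(float %a, float %b)
  ret float %r
}
```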

I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable
pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces
reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.)
If there's a better way to do this then I'm happy to change it.

We also have a couple of codegen deficiencies in how we emit vector constants
right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff)

Differential Revision: https://reviews.llvm.org/D108725
2022-09-28 16:03:22 -07:00
Florian Mayer 0401dc2913 [MTE] [HWASan] unify isInterestingAlloca
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D134779
2022-09-28 15:52:34 -07:00
Jessica Paquette 4957ee6529 [AArch64][GlobalISel] Add a target-specific G_BIT opcode.
This is necessary for custom-legalizing G_FCOPYSIGN.

This is equivalent to the BIT instruction (bitwise insert if true).
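
For reference, BIT's per-bit semantics as I read the architecture manual (the
mask operand selects which destination bits are replaced):

```
; BIT vd, vn, vm:
;   vd = (vd & ~vm) | (vn & vm)
```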

Add selection testcases for imported patterns.

Differential Revision: https://reviews.llvm.org/D108714
2022-09-28 15:48:35 -07:00
Matt Devereau 0a4771a7e8 [AArch64][SVE] Expand gather index to 32 bits instead of 64 bits
For gathers which load 8- or 16-bit data and then use that data
as an index, the index can be extended to 32 bits instead of
64 bits.

Differential Revision: https://reviews.llvm.org/D130692
2022-09-28 14:42:12 +00:00
Florian Hahn 2d3c260362
[AArch64] Break non-temporal loads over 256 bits into 256-bit loads and a smaller load
Currently, non-temporal loads wider than 256 bits are broken up inefficiently. For example, `v17i32` gets broken into 2 128-bit loads. It is
better if we can use 256-bit loads instead.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133421
2022-09-28 15:20:26 +01:00
Cullen Rhodes 3918ef07c4 [AArch64][SVE] Remove redundant ptest after match/nmatch
These instructions are flag-setting, so the ptest is redundant; the
TableGen class wasn't setting the element size for the predicate, causing
the checks in AArch64InstrInfo::optimizePTestInstr to fail.
2022-09-28 08:23:23 +00:00
Cullen Rhodes e36ffdf42e [AArch64][SVE] Precommit tests for redundant ptest after match/nmatch 2022-09-28 08:23:23 +00:00
Zain Jaffal 4fbbfd2530
[AArch64] Add tests for selecting SMULL instruction where the operand is zero extended and the top bit value is 0
This covers the case where we can convert a zext instruction to a sext and then select SMULL.

Differential Revision: https://reviews.llvm.org/D134643
2022-09-27 19:43:43 +01:00
David Green 401481daac [AArch64] Remove incorrect zero element insert-bitcast patterns
These two patterns are not working as intended, as shown in D134022.
They need to insert the value into the new register, not overwrite it.
2022-09-27 17:08:17 +01:00
David Sherwood fbb119412f [AArch64] Add Neoverse V2 CPU support
Adds support for the Neoverse V2 CPU to the AArch64 backend.

Differential Revision: https://reviews.llvm.org/D134352
2022-09-27 07:56:08 +00:00
David Green bebc96956b [AArch64] Enable FeatureFuseAdrpAdd for all Arm cpus
The commit D120104 enabled FeatureFuseAdrpAdd for -mcpu=generic,
allowing the linker to relax adrp;add pairs where possible. D132075
extended that to neoverse-n1; this patch extends it to all other Cortex
and Neoverse CPUs for the same reasons.

Differential Revision: https://reviews.llvm.org/D134521
2022-09-26 09:55:10 +01:00
Amaury Séchet dc947ba3f5 Autogenerate a couple of tests in the AMDGPU backend. NFC 2022-09-25 14:12:49 +00:00
Caroline Concatto 5431bf27bd [AArch64]Remove svget/svset/svcreate from llvm
This patch removes the AArch64 intrinsics svget/svset/svcreate from llvm.
It also implements the InstCombine for vector.extract that used to be in svget.

Depends on: D131547

Differential Revision: https://reviews.llvm.org/D131548
2022-09-23 10:48:43 +01:00
Philip Reames b9c4733079 [DAG] Move one-use add of splat to base of scatter/gather
This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from the vector domain to the scalar domain.

The motivation is to improve the lowering of scatter/gather operations fed by complex geps.
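
A hedged IR sketch of the shape this targets (names are mine; the transform
itself runs on the DAG):

```
declare <4 x float> @llvm.masked.gather.v4f32.v4p0(<4 x ptr>, i32, <4 x i1>, <4 x float>)

define <4 x float> @gather_splat_base(ptr %base, i64 %off, <4 x i64> %idx, <4 x i1> %m) {
  %h = insertelement <4 x i64> poison, i64 %off, i64 0
  %splat = shufflevector <4 x i64> %h, <4 x i64> poison, <4 x i32> zeroinitializer
  ; The one-use add of a splat below can be reassociated onto the scalar
  ; base address rather than staying in the vector index.
  %sum = add <4 x i64> %splat, %idx
  %ptrs = getelementptr float, ptr %base, <4 x i64> %sum
  %r = call <4 x float> @llvm.masked.gather.v4f32.v4p0(<4 x ptr> %ptrs, i32 4, <4 x i1> %m, <4 x float> poison)
  ret <4 x float> %r
}
```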

Differential Revision: https://reviews.llvm.org/D134472
2022-09-22 18:45:12 -07:00
Sanjay Patel ef7d61d67c [AArch64] add tests for vector load combining; NFC
More coverage for D133584
2022-09-22 11:43:37 -04:00
Florian Hahn ac434afed8
[AArch64] Try to fold shuffle (tbl2, tbl2) to tbl4.
shuffle (tbl2, tbl2) can be folded into a single tbl4 if the mask for
the selected elements is constant.
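
A hypothetical sketch of the input pattern (the intrinsic names are the
standard AArch64 NEON ones; the committed tests live in the review):

```
declare <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8>, <16 x i8>, <8 x i8>)

define <8 x i8> @shuffle_of_tbl2(<16 x i8> %a, <16 x i8> %b,
                                 <16 x i8> %c, <16 x i8> %d) {
  %t1 = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> %a, <16 x i8> %b,
           <8 x i8> <i8 0, i8 2, i8 4, i8 6, i8 8, i8 10, i8 12, i8 14>)
  %t2 = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> %c, <16 x i8> %d,
           <8 x i8> <i8 1, i8 3, i8 5, i8 7, i8 9, i8 11, i8 13, i8 15>)
  ; With constant tbl masks and a constant shuffle mask, the whole thing
  ; can fold to a single tbl4 with a combined mask.
  %s = shufflevector <8 x i8> %t1, <8 x i8> %t2,
       <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
  ret <8 x i8> %s
}
```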

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133491
2022-09-21 19:15:56 +01:00
Sanjay Patel ddd27a3d39 [AArch64] add tests for fadd -> fma combines; NFC
The transform to create a final fma was added with:
D132837 / c98a46fee6

These tests are intended to show the minimal fast-math-flags
necessary to enable the fold: currently only the final fadd
needs to have "reassoc".
2022-09-21 09:00:11 -04:00
Alex Richardson b84be9f2f1 Add all constant physical registers to callee preserved masks
This allows MachineCopyPropagation to eliminate copies of constant registers
such as zero registers. They were previously not being eliminated as the
check for MO.clobbersPhysReg(AvailSrc) would return true for constant
registers such as MIPS $zero.

To avoid having to manually add the zero registers to all CalleeSavedRegs
instantiations in tablegen, I instead added a new isConstant bit to the
Register class and set it for MIPS, RISC-V, and AArch64 zero registers.
RegisterInfoEmitter.cpp looks at this flag and adds all constant registers
to the preserved register mask.

This may also benefit other passes but so far I have only seen differences
in MachineCopyPropagation. In the future it might make sense to generate
`isConstantPhysReg()` from this information.

Original source: 8588d8b814

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D131958
2022-09-21 12:50:12 +00:00
Alex Richardson 963287acbf Add a baseline test for D131958
This test shows that the save of MIPS $zero to a callee-saved register
is not elided by the machine-cp pass.

Differential Revision: https://reviews.llvm.org/D131957
2022-09-21 12:50:12 +00:00
David Green 4f78e022ee [AArch64] Lower scalar sqxtn intrinsics to use fp registers
The llvm.aarch64.neon.scalar.sqxtn.i32.i64 intrinsics take and return
integer types, but operate on fp registers. This can create some
inefficiencies in their lowering, where the registers are converted to
fp a little too late. This patch adds lowering for the intrinsics,
creating bitcasts to/from fp types to allow nicer folding later when the
instructions are selected, especially around insert/extracts.
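
The intrinsic in question, with the expected effect sketched in comments (the
assembly is my expectation, not copied from the tests):

```
declare i32 @llvm.aarch64.neon.scalar.sqxtn.i32.i64(i64)

define i32 @scalar_sqxtn(i64 %x) {
  %r = call i32 @llvm.aarch64.neon.scalar.sqxtn.i32.i64(i64 %x)
  ret i32 %r
  ; selects to something like:
  ;   fmov d0, x0 ; sqxtn s0, d0 ; fmov w0, s0
  ; and the fmovs can fold away when surrounded by other fp operations.
}
```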

Differential Revision: https://reviews.llvm.org/D134024
2022-09-21 10:46:43 +01:00
David Green 9a20596f48 [AArch64] Insert/Extract of bitcast patterns
This adds some quick tablegen patterns for vector_insert(bitcast(..))
and bitcast(vector_extract(..)), allowing us to avoid a round-trip
through GPRs.

Differential Revision: https://reviews.llvm.org/D134022
2022-09-21 09:54:17 +01:00
David Green cb375e8c1f [AArch64] Enable LSLFast for modern OoO cpus
This patch enables the LSLFast feature for Cortex-A76, Cortex-A77,
Cortex-A78, Cortex-A78C, Cortex-A710, Cortex-X1, Cortex-X2, Neoverse N1,
Neoverse N2, Neoverse V1 and the Neoverse 512TB pseudo-cpu, in-line with
the software optimization guides for those CPUs.

Differential Revision: https://reviews.llvm.org/D134273
2022-09-20 17:09:14 +01:00
Amara Emerson 78833a43e8 [GlobalISel][Legalizer] Fix lowerSelect() not sign-extending the mask value.
I'm not sure why the SEXT_INREG was gated on a bitwidth check of the mask
vs element size.

This fixes a miscompile in chromium's skia library.

Differential Revision: https://reviews.llvm.org/D134236
2022-09-20 16:40:34 +01:00
Caroline Concatto d32b8fdbdb [LLVM][AArch64] Replace aarch64.sve.ld by aarch64.sve.ldN.sret
This patch removes the intrinsic aarch64.sve.ldN from tablegen in favour of
using aarch64.sve.ldN.sret.

Depends on: D133023

Differential Revision: https://reviews.llvm.org/D133025
2022-09-20 13:15:07 +01:00
Caroline Concatto 2aad9093ac [AArch64][NFC] Correctly rename mangling name for ldN.sret
Remove the predicate type and pointer type from the function name, because:
* The predicate type in the name (nxvNi1) can be deduced from the overloaded
element count (nxvNEltTy).
* The pointer type (p0EltTy) can be deduced from the overloaded element type.
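
For illustration only (hypothetical manglings following the scheme described
above, not copied from the patch):

```
; before: predicate and pointer types spelled out in the name
;   @llvm.aarch64.sve.ld2.sret.nxv4i32.nxv4i1.p0i32(...)
; after: only the overloaded element type remains
;   @llvm.aarch64.sve.ld2.sret.nxv4i32(...)
```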

Differential Revision: https://reviews.llvm.org/D133023
2022-09-20 09:56:44 +01:00
David Green 908b3b6ccb [AArch64] Use fast-math-flags in isAssociativeAndCommutative
Previously this only used the UnsafeFPMath option; it now looks for the
fast-math flags on the instructions, using the same flags as other
backends.
2022-09-19 11:34:00 +01:00
Sander de Smalen bed214cf0f [AArch64][SME] Add intrinsics for enabling/disabling ZA.
This adds the intrinsics:
* void @llvm.aarch64.sme.za.enable() -> smstart za
* void @llvm.aarch64.sme.za.disable()  -> smstop za

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D133894
2022-09-17 16:41:42 +00:00
Sander de Smalen 5fae000f36 [AArch64][SME] Disable tail-call optimization when streaming mode change or lazy-save may be required.
When a streaming mode change is (or may be) required for a call, it will
need to restore the original mode after the call, which prevents the use of
tail-call optimization. The same holds true for a call that requires the lazy-save
mechanism to be set up before the call, and possibly restored after.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131579
2022-09-17 16:15:07 +00:00
Vladislav Dzhidzhoev 6cf11f4462 [GlobalISel][DebugInfo] Salvage trivially dead instructions
Use salvageDebugInfo for instructions erased as trivially dead in
GlobalISel.

It would be helpful to implement support for G_PTR_ADD and G_FRAME_INDEX
in salvageDebugInfo in the future in order to preserve more variable
locations.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D133986
2022-09-17 03:54:55 +03:00
Sander de Smalen bd4935c175 [AArch64][SME] Implement ABI for calls from streaming-compatible functions.
When a function is streaming-compatible and calls a function with a normal or streaming
interface, it may need to enable/disable streaming mode before the call, and
needs to restore PSTATE.SM after the call.

This patch implements this with a Pseudo node that gets expanded to a
conditional branch and smstart/smstop node.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131578
2022-09-16 14:48:37 +00:00
Liqiang Tao 2e37557fde StackProtector: ensure stack checks are inserted before the tail call
The IR stack protector pass should insert stack checks before all tail
calls, not only musttail calls, so that the attributes `sspreq` and
`tail call`, which are emitted by opt, can both be honored by llc.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D133860
2022-09-16 22:24:46 +08:00
Sander de Smalen b00c36c295 [AArch64][SME] Implement ABI for calls to/from streaming functions.
This patch implements the ABI for calls from:

  Normal -> Streaming
  Normal -> Streaming-compatible
  Streaming -> Normal
  Streaming -> Streaming-compatible
  Streaming -> Streaming

The compiler inserts SMSTART/SMSTOP instructions before and after the call,
depending on the required transition.
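
A hedged sketch of one transition (the attribute name follows the SME design
in D131562; treat it as illustrative):

```
declare void @streaming_callee() "aarch64_pstate_sm_enabled"

define void @normal_caller() {
  ; Normal -> Streaming: codegen wraps the call roughly as
  ;   smstart sm ; bl streaming_callee ; smstop sm
  call void @streaming_callee()
  ret void
}
```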

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131576
2022-09-16 14:07:47 +00:00
Florian Hahn 6b86b481e3
[AArch64] Use tbl for truncating vector FPtoUI conversions.
On AArch64, doing the vector truncate separately after the fptoui
conversion allows it to be lowered more efficiently using tbl.4, building on
D133495.

https://alive2.llvm.org/ce/z/T538CC
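
A hedged sketch of the shape this targets (my example; the committed tests are
in the review):

```
define <16 x i8> @fptoui_v16f32_to_v16i8(<16 x float> %f) {
  ; now lowered as fcvtzu to <16 x i32> followed by a tbl.4-based
  ; truncate to <16 x i8>
  %c = fptoui <16 x float> %f to <16 x i8>
  ret <16 x i8> %c
}
```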

Depends on D133495

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133496
2022-09-16 14:57:43 +01:00
Florian Hahn 8491d01cc3
[AArch64] Lower vector trunc using tbl.
Similar to using tbl to lower vector ZExts, tbl4 can be used to lower
vector truncates.

The initial version supports i32->i8 conversions.

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133495
2022-09-16 12:42:49 +01:00
Nikita Popov b4309800e9 [CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692)
Callee-saved registers must be preserved, so -fzero-call-used-regs
should not be zeroing them. The previous implementation only skipped
zeroing callee-saved registers that were saved and restored inside the
function, but all of them need to be preserved.

Fixes https://github.com/llvm/llvm-project/issues/57692.

Differential Revision: https://reviews.llvm.org/D133946
2022-09-16 11:52:29 +02:00
Florian Hahn 9f2c39418b
[AArch64] Add tests with 2 x tbl2 for v8i8 and nonconst masks.
Extra tests for D133491.
2022-09-16 10:25:27 +01:00
Florian Hahn 5871f18827
[AArch64] Lower extending uitofp using tbl.
On AArch64, doing the zero-extend separately first allows it to be lowered
more efficiently using tbl, building on D120571.

https://alive2.llvm.org/ce/z/8Je595
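
A hedged sketch of the shape this targets (mine, not the committed test):

```
define <8 x float> @uitofp_v8i8_to_v8f32(<8 x i8> %x) {
  ; the implied zero-extend is now done separately via tbl, leaving a
  ; plain <8 x i32> -> <8 x float> ucvtf
  %f = uitofp <8 x i8> %x to <8 x float>
  ret <8 x float> %f
}
```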

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133494
2022-09-16 10:20:25 +01:00