Commit Graph

69035 Commits

Author SHA1 Message Date
David Green 5e1a9d319d [ARM] Add lowering for bf16 neon vtrn, vzup and vuzp.
These go via Dag2Dag, which are better based on element sizes not the
exact element types.
2022-10-02 15:34:37 +01:00
David Green f2fde99461 [ARM] More bf16 shuffle handling, including perfect shuffles. 2022-10-02 14:31:51 +01:00
David Green 8193f0d1d2 [ARM] Add tablegen patterns for bf16 vrev 2022-10-02 13:42:14 +01:00
David Green 58369c8631 [ARM] Add tablegen patterns for bf16 vext
This adds missing tablegen patterns for VEXT, identical to the fp16
patterns as they only use baseline Neon operations.
Part of fixing #57770.
2022-10-02 12:45:58 +01:00
Craig Topper 5bbc5eb55f [RISCV] Use _TIED form of VWADD(U)_WX/VWSUB(U)_WX to avoid early clobber.
One of the sources is the same size as the destination so that source
doesn't have an overlap with the destination register. By using the _TIED
form we avoid an early clobber contraint for that source.

This matches what was already done for instrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.
2022-10-01 16:34:39 -07:00
Craig Topper 85db4f10e3 [RISCV] Minor tablegen formatting cleanup. NFC 2022-10-01 15:59:25 -07:00
zhongyunde 4a549be9c3 [AArch64] Lower multiplication by a negative constant to shl+sub+shl
Change the costmodel to lower a = b * C where C = -(2^n - 2^m) to
            lsl     w8, w0, m
            sub     w0, w8, w0, lsl n
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134934
2022-10-01 21:27:42 +08:00
Filipp Zhinkin 945a1468c9 [ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr
Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions.

Related issue: https://github.com/llvm/llvm-project/issues/57122

Reviewed By: efriedma, samtebbs

Differential Revision: https://reviews.llvm.org/D131786
2022-10-01 12:41:37 +03:00
Carl Ritson a35013bec6 [AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the
same SGPR can cause incorrect execution of subsequent SALU reads
of the SGPR.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D134151
2022-10-01 16:21:24 +09:00
Craig Topper 9273f860c0 [RISCV] Prevent performCombineVMergeAndVOps from creating cycles in the DAG.
If True has a Chain result, the other operands of the vmerge may
depend on it through that Chain. We need to ensure it isn't a
predecessor of those operands.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D134980
2022-09-30 20:01:45 -07:00
Craig Topper de0de294eb [RISCV] Update cost of vector roundeven to match round which uses the same sequence but a different FRM value.
Reviewed By: reames, eopXD

Differential Revision: https://reviews.llvm.org/D134978
2022-09-30 20:01:35 -07:00
Yeting Kuo cefb7aab61 [VP][RISCV] Add vp.copysign and RISC-V support.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134935
2022-10-01 10:19:10 +08:00
Matthias Braun 189900eb14 X86: Stop assigning register costs for longer encodings.
This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`.
This was previously done because instruction encoding require a REX
prefix when using them resulting in longer instruction encodings. I
found that this regresses the quality of the register allocation as the
costs impose an ordering on eviction candidates. I also feel that there
is a bit of an impedance mismatch as the actual costs occure when
encoding instructions using those registers, but the order of VReg
assignments is not primarily ordered by number of Defs+Uses.

I did extensive measurements with the llvm-test-suite wiht SPEC2006 +
SPEC2017 included, internal services showed similar patterns. Generally
there are a log of improvements but also a lot of regression. But on
average the allocation quality seems to improve at a small code size
regression.

Results for measuring static and dynamic instruction counts:

Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
    Spills+FoldedSpills   -5.6%
    Reloads+FoldedReloads -4.2%
    Copies                -0.1%

Static / LLVM Statistics:
    regalloc.NumSpills    mean -1.6%, geomean -2.8%
    regalloc.NumReloads   mean -1.7%, geomean -3.1%
    size..text            mean +0.4%, geomean +0.4%

Static / LLVM Statistics:
    mean -2.2%, geomean -3.1%) regalloc.NumSpills
    mean -2.6%, geomean -3.9%) regalloc.NumReloads
    mean +0.6%, geomean +0.6%) size..text

Static / LLVM Statistics:
    regalloc.NumSpills   mean -3.0%
    regalloc.NumReloads  mean -3.3%
    size..text           mean +0.3%, geomean +0.3%

Differential Revision: https://reviews.llvm.org/D133902
2022-09-30 16:01:33 -07:00
Peter Collingbourne 0caa9d4b1e AArch64: Don't use RETA[AB] when ShadowCallStack is enabled.
When returning from a function with both SCS and PAC-RET enabled, we need to
authenticate the return address from the stack and then load from the SCS,
but this was happening in the reverse order when RETA[AB] were being used.
Fix it by disabling the use of RETA[AB] when SCS is enabled.

Fixes pr58072.

Differential Revision: https://reviews.llvm.org/D134931
2022-09-30 12:33:23 -07:00
Xiang Li a80a888de5 [DirectX backend] Support global ctor for DXILBitcodeWriter.
1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType.
   This will allow use GlobalVariable/Function as value.
2. Save target type for global ctors for Constant.
3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D133283
2022-09-30 11:27:23 -07:00
Florian Hahn fe49ba84d3
[AArch64] Reflow comment in AArch64IselLowering.cpp (NFC). 2022-09-30 17:17:04 +01:00
Zain Jaffal fca8730793
[AArch64] Refactor opcode selection for LowerMUL (NFC)
Move the logic for selecting `NewOpc` out of `LowerMUL`

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134875
2022-09-30 16:48:02 +01:00
Philip Reames 1e3c179519 [RISCV] Address post commit review comments from D134881 2022-09-30 08:31:40 -07:00
Saleem Abdulrasool 519a73111b RISCV: adjust relocation emission
Simplify and make the pair-wise relocation more precise.  If either of
the symbol references are textual, the relocation must be delayed.  If
the difference is across sections, delay it as well which partially
matches the behaviour of gas.  We unfortunately do not handle the case
where the difference references a symbol that is not yet defined.  In
such a case, we simply fail to resolve the difference, which should
hopefully not be too onerous (particularly since no other target
supports cross-section references and it is not clear if this was
intentional on the part of RISCV).

Differential Revision: https://reviews.llvm.org/D132262
Reviewed By: @MaskRay
2022-09-30 15:28:48 +00:00
Philip Reames 2b5960028e [RISCV] Branchless lowering for select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z | y) )
This code is directly ported from the X86 backend which applies the same rewrite (along with several others). Planning on looking more closely at the other branchless variants from x86 to see if any are worth porting in future changes.

Motivation here is the coremark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload.

Differential Revision: https://reviews.llvm.org/D134881
2022-09-30 08:24:32 -07:00
Ray Wang 4c786c9747 [RISCV] Remove some unused var decl. NFC
Differential Revision: https://reviews.llvm.org/D134707
2022-09-30 08:08:15 -07:00
Pierre van Houtryve d8258508d4 [AMDGPU][GISel] Update `isCanonicalized`
Recognize more opcodes in the function.
Fixes some regressions introduced in D134857 for fdiv.f16 too.

Depends on D134857

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D134862
2022-09-30 14:13:35 +00:00
Pierre van Houtryve 9a67a6b72a [AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR
Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal.
Also removes RegBankInfo's scalarization of small BUILD_VECTORs,
replacing it with InstructionSelector logic instead.

This allows for V2S16 BUILD_VECTOR instructions to survive
all the way to ISel so we can select FMA/MAD_MIX instructions
in D134354.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134433
2022-09-30 14:04:53 +00:00
Ivan Kosarev a964099ce5 [AMDGPU][SetWavePriority] Fix dealing with MBBInfo records.
Happened earlier than I anticipated. :)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134726
2022-09-30 14:27:50 +01:00
Zain Jaffal 661403b85c
[AArch64] Add support for 128-bit non temporal loads.
Adding to the work done in `D131773` here we add support to 128-bit loads.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D132559
2022-09-30 11:04:04 +01:00
gonglingqin 853a1b7236 [LoongArch] Clean up redundant code introduced by conflict resolution. NFC 2022-09-30 16:45:21 +08:00
Yeting Kuo 1cc02b05b7 [SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC.
Using this helper makes work about neutral elements more easier. Although I only
find one case now, I think it will have more chance to be used since so many
combine works are related to neutral elements.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133866
2022-09-30 11:29:11 +08:00
Amara Emerson 7653586d88 [AArch64][GlobalISel] Implement another combine for shufflevector->AArch64 G_EXT.
This is a port of an existing optimization in AArch64 ISelLowering, handling
a case when the same input vector can be used for both ext inputs.

Differential Revision: https://reviews.llvm.org/D134891
2022-09-29 22:53:24 +01:00
Philip Reames 900364fccf [RISCV] Minor code motion in InsertVSETVLI [nfc] 2022-09-29 14:01:57 -07:00
Bjorn Pettersson 0513b0305a [X86] Avoid miscompile in combineOr (X86ISelLowering.cpp)
In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrite
a "(0 - SetCC) | C" pattern into something simpler given that a LEA
can be used. Another requirement is that C has some specific value,
for example 1 or 7. When checking those requirements the code used a
32-bit unsigned variable to store the value of C. So for a 64-bit OR
this could miscompile in case any of the 32 most significant bits in
C were non zero.

This patch adds fixes the bug by using a large enough type for the
C value.

The faulty code seem to have been introduced by commit 9bceb8981d
(D131358).

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134892
2022-09-29 21:24:31 +02:00
zhongyunde 4d15e7b21b [AArch64] Lower multiplication by a constant (NFC)
Refactor according https://reviews.llvm.org/D134706#inline-1298952
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134848
2022-09-30 01:37:28 +08:00
zhongyunde 62a51c357c [AArch64] Lower multiplication by a constant int to shl+sub+shl
Decompose the const 14 can be separated from D132322
Change the costmodel to lower a = b * C where C = 2^n - 2^m to
        lsl     w8, w0, n
        sub     w0, w8, w0, lsl m
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134706
2022-09-30 01:31:06 +08:00
Chris Bieneman 5d4dd53570 Revert "[DirectX backend] Support global ctor for DXILBitcodeWriter."
This reverts commit 26129766df.

The reverted commit broke in-tree unit tests for the DirectX backend.
2022-09-29 11:58:27 -05:00
Dmitry Preobrazhensky 485c539391 [AMDGPU][MC][GFX11] Disable non-null src0 for s_waitcnt_*cnt
Differential Revision: https://reviews.llvm.org/D134809
2022-09-29 19:56:03 +03:00
David Green 4c4e544cd8 [ARM] Add an option for disabling omitting DLS.
Useful for testing, this option disables when `DLS lr, lr` gets removed.
2022-09-29 17:42:45 +01:00
Philip Reames 02bfe2de7c [RISCV] Adjust vector immediate store materialization cost
This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement.

Differential Revision: https://reviews.llvm.org/D134746
2022-09-29 07:37:13 -07:00
eopXD 02a982829c [RISCV] Add lowering for llvm.roundeven
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134785
2022-09-29 06:08:14 -07:00
Fangrui Song 04a65d62a0 Revert D134638 "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC"
This reverts commit b7baddc755.

Broke CodeGen/X86/callbr-asm-kill.mir
We shall pay attention when adding new constraints.
2022-09-29 00:54:56 -07:00
Weining Lu b7baddc755 [Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC
k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.

m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.

ZB: An address that is held in a general-purpose register. The offset
is zero.

ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.

Differential Revision: https://reviews.llvm.org/D134638
2022-09-29 15:02:08 +08:00
Abinav Puthan Purayil 3759398b4b [AMDGPU] Report minimum scratch size in code object v5 and later by default
This change sets
-amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size}
options to zero by default for code object v5 and later. The runtime is
expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit
in the kernel descriptor is set.

Differential Revision: https://reviews.llvm.org/D128346
2022-09-29 09:52:45 +05:30
gonglingqin dc3c5a78f2 [LoongArch] Add fp_to_sint support for soft floating point
Differential Revision: https://reviews.llvm.org/D134692
2022-09-29 10:25:35 +08:00
WANG Xuerui 3155f6c508 [LoongArch] Expand llvm.stacksave and llvm.stackrestore
As in commit bfb00d4c1c ("[RISCV] Allow lowering of dynamic_stackalloc, stacksave, stackrestore").

Differential Revision: https://reviews.llvm.org/D134435
2022-09-29 09:07:44 +08:00
wanglei 036b170c24 [LoongArch] Produce a R_LARCH_32_PCREL relocation
LoongArchELFObjectWriter::getRelocType check IsPCRel for FK_Data_4
(which we produce a R_LARCH_32_PCREL relocation for if IsPCRel).

R_LARCH_32_PCREL is required for FDE relocation.

Differential Revision: https://reviews.llvm.org/D134715
2022-09-29 09:04:44 +08:00
wanglei 7b1bdfbeb0 [LoongArch] Override TargetSubtargetInfo::getSelectionDAGInfo
The target selection DAG lowering information is needed for
SelectionDAGBuilder to lower a call like memcmp into an optimized
form.

Differential Revision: https://reviews.llvm.org/D134712
2022-09-29 08:46:53 +08:00
Jessica Paquette 95dabac7a5 [AArch64][GlobalISel] Make G_PTRTOINT only legal for s64 + p0
A few issues:

  1. There was no legalizer test for G_PTRTOINT
  2. Same clamping issue as in many other opcodes
  3. AArch64 pointers can only be 64b, so in reality we always have to trunc or
     extend with any size other than p0 anyway.

This seems to actually produce more correct selection for narrow types as well.

Differential Revision: https://reviews.llvm.org/D107588
2022-09-28 16:20:24 -07:00
Jessica Paquette a7aaafde2e [AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN
This is intended to be equivalent to the s32 + s64 cases in
AArch64TargetLowering::LowerFCOPYSIGN.

Widen everything and then use G_BIT + a mask to handle the actual copysign
operation. Then, narrow back down to s32/s64.

I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable
pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces
reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.)
If there's a better way to do this then I'm happy to change it.

We also have a couple codegen deficiencies with how we emit vector constants
right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff)

Differential Revision: https://reviews.llvm.org/D108725
2022-09-28 16:03:22 -07:00
Florian Mayer 0401dc2913 [MTE] [HWASan] unify isInterestingAlloca
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D134779
2022-09-28 15:52:34 -07:00
Jessica Paquette 4957ee6529 [AArch64][GlobalISel] Add a target-specific G_BIT opcode.
This is necessary for custom-legalizing G_FCOPYSIGN.

This is equivalent to the BIT instruction (bitwise insert if true).

Add selection testcases for imported patterns.

Differential Revision: https://reviews.llvm.org/D108714
2022-09-28 15:48:35 -07:00
Xiang Li 26129766df [DirectX backend] Support global ctor for DXILBitcodeWriter.
1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType.
   This will allow use GlobalVariable/Function as value.
2. Save target type for global ctors for Constant.
3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D133283
2022-09-28 13:23:56 -07:00
Stanislav Mekhanoshin 5a3fe9a039 [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-28 13:13:40 -07:00