Commit Graph

69244 Commits

Author SHA1 Message Date
Filipp Zhinkin ef774bec63 [AArch64] Support SETCCCARRY lowering
Support SETCCCARRY lowering to SBCS instruction.

Related issue: https://github.com/llvm/llvm-project/issues/44629

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135302
2022-10-14 22:29:31 +03:00
Craig Topper 1fab0ac559 [RISCV] Rename ReadVIALUCV->ReadVICALUV to match WriteVICALUV. NFC 2022-10-14 12:11:55 -07:00
Krzysztof Parzyszek e8375e3042 [Hexagon] Use IRBuilderBase in function parameters
This will allow using builders with different folders.
2022-10-14 12:10:59 -07:00
Krzysztof Parzyszek 7f4ce3f1eb [Hexagon] Introduce PS_vsplat[ir][bhw] pseudo instructions
HVX v60 only has splats that take a 32-bit word as input, while v62+
has splats that take 8- or 16-bit value. This makes writing output
patterns that need to use a splat annoying, because the entire output
pattern needs to be replicated for various versions of HVX.
To avoid this, the patterns will always use the pseudos, and then the
pseudos will be handled using a post-ISel hook.
2022-10-14 12:03:13 -07:00
Chris Bieneman 911d2dc230 [NFC] [HLSL] Move common metadata to LLVMFrontend
This change pulls some code from the DirectX backend into a new
LLVMFrontendHLSL library to share utility data structures between the
HLSL code generation in Clang and the backend in LLVM.

This is a small refactoring as a first start to get code into the
right structure and get the library built and dependencies correct.

Fixes #58000 (https://github.com/llvm/llvm-project/issues/58000)

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135110
2022-10-14 13:40:04 -05:00
Craig Topper 44f0b13494 [RISCV] Correct RISCVTTIImpl::getRegUsageForType for vectors of pointers.
getPrimitiveSizeInBits returns 0 for pointers, we need to query
the size via DataLayout instead.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135976
2022-10-14 11:34:12 -07:00
Chris Bieneman e530a1188e [DX] Add pass to pretty-print DXIL metadata in asm
When DXC prints IR output it adds a bunch of IR comments in a header
that describe the DXIL metadata in a more human-readable format. This
pass will serve that purpose for LLVM by printing out ahead of the IR
printer.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135802
2022-10-14 13:32:59 -05:00
Caroline Concatto 60e2aad109 [AArch64]Change printVectorList to print SVE vector range
This patch has the prefered disassembly changed for SVE vector list.
For instance, instead of printing this assembly:
  ld4d { z1.d, z2.d, z3.d, z4.d }, p0/z, [x0]
it will print this:
  ld4d { z1.d-z4.d }, p0/z, [x0]

Differential Revision: https://reviews.llvm.org/D135952
2022-10-14 18:59:56 +01:00
David Green de6dfbbb30 [ARM] Fix for MVE i128 vector icmp costs.
We were hitting an assert as the legalied type needn't be a vector.

Fixes #58364
2022-10-14 18:49:25 +01:00
Hassnaa Hamdi 2c72d90ecc [AArch64-SVE]: Force generating code compatible to streaming mode.
Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width vectors;
to generate code that is compatible to streaming mode.

Differential Revision: https://reviews.llvm.org/D133433
2022-10-14 17:46:56 +00:00
Dmitry Preobrazhensky bf96703fb3 [AMDGPU][MC][GFX8+] Correct v_cndmask modifiers
Correct v_cndmask_b32 to support abs/neg modifiers in dpp/sdwa/e64 variants.
Correct v_cndmask_b16 for proper disassembly of abs/neg modifiers in e64_dpp variants.

Differential Revision: https://reviews.llvm.org/D135900
2022-10-14 19:37:27 +03:00
Sander de Smalen 02df03c5b7 [AArch64][SME] Add support for arm_locally_streaming functions.
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.

Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131582
2022-10-14 13:47:53 +00:00
chenglin.bi 85e41fcaac [AArch64] Select to CCMN when the CCMP's second operator is negative constant
CCMP/CCMN's second operator support const from 0 to 31. When the CCMP's second operator is in the range [-31, -1] we can replace it with CCMN to avoid extra mov.

Fix: #57034

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135939
2022-10-14 21:41:25 +08:00
Martin Storsjö f309f095e7 Revert "[AArch64] Fix aligning the stack after calling __chkstk"
This reverts commit 50e0aced45.

This could accidentally start producing invalid code in some
cases (in particular, if compiling with -mstack-alignment=16, which
one could expect to be a no-op for a target where the stack always
is aligned to 16 bytes anyway).
2022-10-14 11:55:59 +03:00
Benjamin Kramer 08dc847f33 Add missing `override`s after aad013de41 2022-10-14 10:38:32 +02:00
gonglingqin e632bb6543 [LoongArch] Add codegen support for atomicrmw umin/umax operation on LA64
Furthermore, use `beqz $rd, .BB` instead of `beq $rd, $zero, .BB`.

Differential Revision: https://reviews.llvm.org/D135525
2022-10-14 15:24:43 +08:00
Leon Clark 6370bc2435 Add f16 nearbyint support.
Enable lowering of FNEARBYINT for f16 and extend existing tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135124
2022-10-14 08:05:24 +01:00
Xiang1 Zhang d0269dd059 [AArch64][BuildErrorFix] Add compatible classifyGlobalFunctionReference 2022-10-14 12:03:50 +08:00
Xiang1 Zhang aad013de41 [InlineAsm][bugfix] Correct function addressing in inline asm
In Linux PIC model, there are 4 cases about value/label addressing:
Case 1: Function call or Label jmp inside the module.
Case 2: Data access (such as global variable, static variable) inside the module.
Case 3: Function call or Label jmp outside the module.
Case 4: Data access (such as global variable) outside the module.

Due to current llvm inline asm architecture designed to not "recognize" the asm
code, there are quite troubles for us to treat mem addressing differently for
same value/adress used in different instuctions.
For example, in pic model, call a func may in plt way or direclty pc-related,
but lea/mov a function adress may use got.

This patch fix/refine the case 1 and case 2 in inline asm.
Due to currently inline asm didn't support jmp the outsider lable, this patch
mainly focus on fix the function call addressing bugs in inline asm.

Reviewed By: Pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D133914
2022-10-14 09:47:26 +08:00
Nemanja Ivanovic 0d253bbd33 [PowerPC] Change CRNOT to a code gen single operand instruction
Inputs to crnor can come from operands with chains so
if it is being used simply to negate such an operand,
the repeated input cannot be CSE'd. This patch just
adds a code-gen only instruction for this that takes
a single input and duplicates it in the encoding of
the underlying crnor.

Differential revision: https://reviews.llvm.org/D133577
2022-10-13 20:09:44 -05:00
Xiang Li 1fc78d66ea [DirectX backend] [NFC] Change Resources::write to const.
Change Resources::write to const.
Also fix parameter name.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D135705
2022-10-13 12:38:12 -07:00
Jakub Chlanda 8407fdbd69 [NVPTX] Support neg{.ftz} for f16 and f16x2
Differential Revision: https://reviews.llvm.org/D135428
2022-10-13 10:48:33 -07:00
Craig Topper e68b0d5875 [RISCV] Match (select C, -1, X)->(or -C, X) during lowerSelect
Same with (select C, X, -1), (select C, 0, X), and (select C, X, 0).

There's a DAGCombine after we turn the select into select_cc, but
that may introduce a setcc that didn't previously exist. We could
add more DAGCombines to remove the extra setcc, but this seemed lower
effort.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135833
2022-10-13 09:06:12 -07:00
Nemanja Ivanovic a77a70fa3c [PowerPC] Stash GPR to VSR if emergency spill slot is not reachable
When removing frame indices on PowerPC, we need to scavenge
a GPR to materialize a large constant if the stack offset
for the spill/reload cannot be reached by a D-Form
instruction. However, in a perfect storm of conditions,
we may not have GPR's available to scavenge, thereby
requiring an emergency spill. If such an emergency
spill also needs to be spilled to a location with a
large offset, it would itself require register scavenging
thereby creating an infinite loop.

This patch detects when the scavenger cannot scavenge
a register and the spill/reload is to a location with
a large offset. It then stashes a GPR into a VSR so
that it can use the GPR to materialize the constant
(rather than scavenging a GPR).

Fixes: https://github.com/llvm/llvm-project/issues/52894

Differential revision: https://reviews.llvm.org/D124841
2022-10-13 09:06:37 -05:00
Anton Sidorenko 4431e705cc [NFC] Use forward decl of MachineCombinerPattern enum to reduce dependencies
Differential Revision: https://reviews.llvm.org/D135776
2022-10-13 14:56:14 +01:00
Simon Pilgrim fa9c12ed96 [X86] Attempt to combine binary shuffles where both operands come from the same larger vector
Allows us to use combineX86ShuffleChainWithExtract to combine targetshuffle(low_subvector(x),high_subvector(x)) -> low_subvector(targetshuffle(x)) style patterns

This is currently very limited (it must have a v2i64/v2f64 result), but while triaging I noticed we might be able to extend this to allow more types for targets with suitable variable cross lane shuffle support.

Fixes #58339
2022-10-13 14:34:11 +01:00
WANG Xuerui f017e92c1c [LoongArch] Add support for llvm.trap and llvm.debugtrap
Similar to D69390 for RISCV, use a guaranteed non-existing insn for
llvm.trap and the break insn for llvm.debugtrap.

Differential Revision: https://reviews.llvm.org/D134365
2022-10-13 19:27:47 +08:00
WANG Xuerui 4e2dfd3589 [LoongArch] Updates for the LoongArch ELF psABI v2.01 revision
The e_flags of existing object files are all 0x3 which happens to be
compatible. From this commit on, all LoongArch objects produced with
upstream LLVM will be of object file ABI v1, which is already supported
by binutils' master branch (to be released as 2.40), and is allowed by
the same binutils version to interlink with v0 objects so the existing
distributions have time to migrate.

Differential Revision: https://reviews.llvm.org/D134601
2022-10-13 19:12:26 +08:00
Sheng 62fc58a61d [AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases
To achieve this, we need this observation:

`uzp1` is just a `xtn` that operates on two registers

For example, given the following register with type v2i64:

LSB_______MSB

x0 x1	x2 x3

Applying xtn on it we get:

x0	x2

This is equivalent to bitcast it to v4i32, and then applying uzp1 on it:

x0	x1	x2	x3
   |
  uzp1
   v
x0	x2	<value from other register>

We can transform xtn to uzp1 by this observation, and vice versa.

This observation only works on little endian target. Big endian target has
a problem: the uzp1 cannot be replaced by xtn since there is a discrepancy
in the behavior of uzp1 between the little endian and big endian.

To illustrate, take the following for example:

LSB____________________MSB

x0	x1	x2	x3

On little endian, uzp1 grabs x0 and x2, which is right; on big endian, it
grabs x3 and x1, which doesn't match what I saw on the document. But, since
I'm new to AArch64, take my word with a pinch of salt. This bevavior is
observed on gdb, maybe there's issue in the order of the value printed by it ?

Whatever the reason is, the execution result given by qemu just doesn't match.
So I disable this on big endian target temporarily until we find the crux.

Fixes #57502

Reviewed By: dmgreen, mingmingl

Co-authored-by: Mingming Liu <mingmingl@google.com>

Differential Revision: https://reviews.llvm.org/D133850
2022-10-13 19:08:33 +08:00
Caroline Concatto 3ee96a26d5 [AArch64] Add SME 2 target feature for Armv8-A and Armv9-A 2022 Architecture Extension
First patch in a series adding MC layer support for Scalable Matrix
Extension 2 (SME2).

This patch adds the following feature:
  sme2

The 2022 Architecture Extension release adds other feature flags(eg.:sme2.1),
that will be in follow-up patches.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D135448
2022-10-13 11:28:08 +01:00
Archibald Elliott 7d15212b8c [ARM] Support fp16/bf16 using w constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "w"
constraint, which means "VFP floating-point registers d0-d31" - fp16 and
bf16 values are stored in S registers (which alias the D registers).

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 'w' constraint.

Differential Revision: https://reviews.llvm.org/D135662
2022-10-13 10:32:06 +01:00
Martin Storsjö cbd8464595 [MC] [Win64EH] Check that ARM64 prologs and epilogs have the right matching number of instructions
This matches what was done for the ARM implementation (where getting
the instruction sizes right is even more tricky, and hence needed
tighter testing).

This will allow catching any future cases where prologs and epilogs
don't match the instructions within them.

Differential Revision: https://reviews.llvm.org/D131394
2022-10-13 09:47:39 +03:00
Martin Storsjö 24303e3ad2 [AArch64] Use encodeLogicalImmediate for forming the immediate to an AND. NFC.
Differential Revision: https://reviews.llvm.org/D135817
2022-10-13 09:47:38 +03:00
Martin Storsjö 19e2b403b4 [AArch64] Remove dead code for inserting SEH opcodes for realignment. NFC.
If the stack is realigned, we've emitted a frame pointer and
already terminated the SEH prologue, making this dead code since
a07787c9a5.

The immediate to this SEH opcode was entirely bogus - we don't
know how many bytes the AND operation adjusts the SP, and by
doing "NumBytes & andMaskEncoded" (where andMaskEncoded was the
immediate bitpattern for the AND instruction), the immediate to the
opcode was total gibberish.

This hasn't had any practical effect, since the original stack
pointer always was restored from the frame pointer afterwards anyway.

Differential Revision: https://reviews.llvm.org/D135815
2022-10-13 09:47:38 +03:00
Martin Storsjö 50e0aced45 [AArch64] Fix aligning the stack after calling __chkstk
Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).

This fixes https://github.com/llvm/llvm-project/issues/56182.

Differential Revision: https://reviews.llvm.org/D135687
2022-10-13 09:47:38 +03:00
Matt Arsenault 838fd611b7 AMDGPU: Fix assertion on <1 x i16> vectors
Fixes issue 58331.
2022-10-12 17:25:24 -07:00
Matt Arsenault 575eed3dac AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs
If inline asm has a VGPR def, it must have come from a VGPR write
somewhere inside the asm. This should be further extended to all
read after write hazards.
2022-10-12 17:25:24 -07:00
Krzysztof Parzyszek 79632163db [Hexagon] Switch vunpackub->op->vpackeb pattern to vzb/vshuffeb
V6_vzb and V6_vshuffeb can use any 2 resources in a packet, while
V6_vunpackub/V6_vpackeb both need a shift resource.

Also, add patterns for shifting vectors of i8.
2022-10-12 15:31:28 -07:00
Philip Reames 1c41d0cb62 [RISCV] Use branchless form for selects with 0 in either arm
Continuing the theme of adding branchless lowerings for simple selects, this time handle the 0 arm case. This is very common for various umin idioms, etc..

Differential Revision: https://reviews.llvm.org/D135600
2022-10-12 13:51:52 -07:00
Krzysztof Parzyszek dca7e451ee [Hexagon] Handle packing of even/odd 32-bit words
This is a workaround until perfect shuffle generation is improved.
2022-10-12 13:00:14 -07:00
Krzysztof Parzyszek 2d8d2bec70 [Hexagon] Implement TLI::isExtractSubvectorCheap hook 2022-10-12 12:48:56 -07:00
Martin Storsjö bd3fa31887 [AArch64] Generate SEH info for PAC instructions
Without this, unwinding through functions that does use PAC
would fail, if PAC actually was active.

Differential Revision: https://reviews.llvm.org/D135103
2022-10-12 22:21:03 +03:00
Martin Storsjö 918f6f581d [AArch64] [SEH] Rename pac_sign_return_address to pac_sign_lr
This new opcode was initially documented as "pac_sign_return_address"
in https://github.com/MicrosoftDocs/cpp-docs/pull/4202, but was
soon afterwards renamed into "pac_sign_lr" in
https://github.com/MicrosoftDocs/cpp-docs/pull/4209, as the other
name was unwieldy, and there were no other external references to
that name anywhere.

Rename our external .seh assembler directive - it hasn't been merged
for very long yet, so there's probably no external use to account for.

Rename all other internal references to the opcode similarly.

Differential Revision: https://reviews.llvm.org/D135762
2022-10-12 22:19:59 +03:00
gonglingqin ec2640bf3a [LoongArch] Handle missing CondCodes
Support SETLE/SETEQ and expand SETGE/SETNE/SETGT

Differential Revision: https://reviews.llvm.org/D135511
2022-10-12 21:26:57 +08:00
gonglingqin b1d7a95e4e [LoongArch] Add earlyclobber of destination register to atomic instructions
If the AM* atomic memory access instruction has the same register number as
rd and rj, the execution will trigger an Instruction Non-defined Exception.
If the AM* atomic memory access instruction has the same register number as
rd and rk, the execution result is uncertain.

Reference: https://github.com/loongson/LoongArch-Documentation

Differential Revision: https://reviews.llvm.org/D135641
2022-10-12 21:09:21 +08:00
Luo, Yuanke f885c08034 Don't widen shuffle element with AVX512
Fix crash issue of D129537 and reopen it.

Currently the X86 shuffle lowering would widen the element type for
shuffle if the mask element value is adjacent. For below example

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

Compiler would transform the shuffle to
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the oppotunity to let ISel select mask instruction when
avx512 is enabled.

This patch is to prevent the tranform when avx512 feature is enabled.
Thank Simon for the idea.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130830
2022-10-12 19:18:10 +08:00
David Green 1e723b7ab3 Revert "[AArch64] Add support for 128-bit non temporal loads."
This reverts commit 661403b85c as the
custom lowering of loads prevents expanding unaligned loads with
strict-align.
2022-10-12 11:11:32 +01:00
Martin Storsjö a07787c9a5 [AArch64] Exclude instructions after setting the FP from SEH prologues
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.

This allows reverting the functional parts of
2f7fbf8376 (but fixing inconsistent
duplicate setting of HasWinCFI).

Differential Revision: https://reviews.llvm.org/D135686
2022-10-12 12:36:21 +03:00
Cullen Rhodes 388cacb341 [AArch64][SVE] Add instcombine for PTEST_ANY(X=OP(PG,...), X) -> PTEST_ANY(PG, X))
Given this is an OR reduction the two are equivalent and later
optimizations (AArch64InstrInfo::optimizePTestInstr) may rewrite the
sequence to use the flag-setting variant of instruction X, to remove the
PTEST altogether.

Reviewed By: paulwalker-arm, bsmith

Differential Revision: https://reviews.llvm.org/D134946
2022-10-12 09:14:08 +00:00
Cullen Rhodes a17fcb2230 [AArch64][SVE] Fix BRKNS bug in optimizePTestInstr
The BRKNS instruction is unlike the other instructions that set flags
since it has an all active implicit predicate, so the existing

  PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B)

in AArch64InstrInfo::optimizePTestInstr is incorrect, however

  PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B)

is correct.

Spotted by @paulwalker-arm in D134946.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135655
2022-10-12 08:34:41 +00:00
Martin Storsjö c43bff64e9 [AArch64] Add support for the SEH opcode for return address signing
This was documented upstream in
https://github.com/MicrosoftDocs/cpp-docs/pull/4202.

Differential Revision: https://reviews.llvm.org/D135276
2022-10-12 11:07:11 +03:00
chenglin.bi 41f5bbe18b [AArch64][Windows] Check sret attribute also for inreg attribute
Fix the issue: #57684

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135512
2022-10-12 09:58:50 +08:00
Yeting Kuo 2749b942e9 [RISCV] Add isel patterns for vmacc, vnmsac.
The patch selects VSELECT/VP_MERGE_VL which uses fmadd/fnmsub as true operand
and the adden of the fmadd/fnmsub as false operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135330
2022-10-12 09:19:01 +08:00
Craig Topper 1bdf21d55c [RISCV] Use mask/tail agnostic if tied source is IMPLICIT_DEF regardless of the policy operand.
If the source is implicit_def, the register allocator won't have
any constraint on what register it picks for the destination. This
doesn't give the user much control of what register is being used.

So in my mind that means the only reason to honor the policy operand
is to control what policy is used in vsetvli to maybe avoid a vtype
change. Given the other optimizations we do on the policy field, I
don't think allowing the user this control is reliable.

Therefore, I think we should use agnostic policies if the source is
undef.

This should give better performance on some CPUs for VP intrinsics where
there is no merge operand and the backend adds IMPLICIT_DEF to the instruction.

Differential Revision: https://reviews.llvm.org/D135396
2022-10-11 16:40:16 -07:00
Craig Topper 902c1b3c2f [RISCV] Remove unused SchedClass WriteVFCvtFToFV. NFC
This isn't bound to any instruction. From the section comment it
was for single-width F-to-F conversions, but those don't exist.
2022-10-11 16:26:06 -07:00
Chris Bieneman 2b2afb2529 [DX] Add analysis and printer for shader flags
This adds infrastructural pieces for an analysis to compute the DXIL
shader flags. In this state the analysis can compute two fairly
straightforward feature flags for use of double-precision floating
point values and the DX 11.1 extended double support.

This patch does conflict with D135190, conflicts will be resolved prior
to merging.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135393

# Conflicts:
#	llvm/lib/Target/DirectX/CMakeLists.txt
#	llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
2022-10-11 14:27:05 -05:00
Abinav Puthan Purayil 3d9f011a9c [AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later
Unfortunately, we have a broken handling of this in the runtime of rocm
5.3. The runtime is expected to handle this correctly when v5 becomes
the default.

Differential Revision: https://reviews.llvm.org/D134714
2022-10-11 23:28:43 +05:30
Joe Nash 3648fc5b42 [AMDGPU] Make disassembler convertFMAanyK call more generic
Make support more generic to support future instructions.
Currently NFC.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D135678
2022-10-11 11:22:25 -04:00
Krzysztof Parzyszek cb6804104f [Hexagon] Remove unused function, NFC 2022-10-11 08:05:22 -07:00
Luke Drummond 940fa35ece [NVPTX] Fix a segfault for bitcasted calls with byval params
`getFunctionParamOptimizedAlign` was being passed a null function
argument when getting the callee of a bitcasted function symbol. This is
because `CallBase::getCalledFunction` does not look through bitcasts.

There is already code to handle this case in
`NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into
an NVPTX util.

The alignment computation now gracefully handles computing alignment of
virtual functions with a check for null.
2022-10-11 15:12:25 +01:00
Weining Lu 42b70793a1 Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC"
Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html

k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.

m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.

ZB: An address that is held in a general-purpose register. The offset
is zero.

ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.

Note:
The INLINEASM SDNode flags in below tests are updated because the new
introduced enum `Constraint_k` is added before `Constraint_m`.
  llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/X86/callbr-asm-kill.mir

This patch passes `ninja check-all` on a X86 machine with all official
targets and the LoongArch target enabled.

Differential Revision: https://reviews.llvm.org/D134638
2022-10-11 19:51:48 +08:00
Dmitry Preobrazhensky 4e62d02db9 [AMDGPU][MC] Correct image_gather4h
Correct encoding of image_gather4h for GFX9; disable this instruction for SI, CI and VI.

Differential Revision: https://reviews.llvm.org/D135605
2022-10-11 14:41:27 +03:00
Martin Storsjö 018ac7847b [AArch64] Add SEH_Nop opcodes for BTI hints
These are harmless for the unwinder - the unwinder doesn't need to
handle them for being able to unwind correctly.

Only add the opcodes when the branch target is in a SEH prologue;
for jumptables e.g. within a function, we shouldn't add any SEH
opcodes.

Differential Revision: https://reviews.llvm.org/D135277
2022-10-11 14:32:01 +03:00
wanglei 64c42a4d70 [LoongArch] Define getSetCCResultType for setting vector setCC type
To avoid trigger "No default SetCC type for vectors!" Assertion.

Differential Revision: https://reviews.llvm.org/D135527
2022-10-11 19:05:14 +08:00
wanglei d1b526fb95 [LoongArch] Add codegen support of GlobalTLSAddress lowering
There are static and dynamic TLS address lowering in DAG stage according
to different TLS models.

TLS address will be lowered to pseudo instruction and then expanded by
the `LoongArch Pre-RA pseudo instruction expansion` pass.

Differential Revision: https://reviews.llvm.org/D134713
2022-10-11 18:10:13 +08:00
Nikita Popov ac47db6aca [Attributes] Return Optional from getAllocSizeArgs() (NFC)
As suggested on D135572, return Optional<> from getAllocSizeArgs()
rather than the peculiar pair(0, 0) sentinel.

The method on Attribute itself does not return Optional, because
the attribute must exist in that case.
2022-10-11 11:05:21 +02:00
Michal Paszkowski 7a3c9a85c5 [SPIRV] Fix call lowering of "anonymous" functions
The patch fixes lowering of anonymous functions, removes file/linkage
info for builtin call demangling, and adds relevant test demonstrating
a fixed problem.

Differential Revision: https://reviews.llvm.org/D135390
2022-10-11 00:06:29 +02:00
David Green deb8f8ab17 [ARM] Add errors for MVE exclusive registers.
These instructions already had errors for operands that could not share
the same register:
  VCMUL, VMULL, VQDMULL.
This extends that to a few others:
  VREV64, VQDMULLqr, VCADD and VHCADD.
Only the i32 types require the error.

Differential Revision: https://reviews.llvm.org/D135560
2022-10-10 22:06:35 +01:00
Joe Nash 8a7d4993b7 [AMDGPU] Fix True16 patterns for cmp on GFX11
These patterns should have a True16 version and a non-true16 version.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135609
2022-10-10 16:41:06 -04:00
Joe Nash ebb258d3b0 [AMDGPU] Make V_SAT_PK_U8_I16 a True16 Instruction
The return type is two u8 packed into a 16 bit VGPR, so this instruction
should be True16.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D135478
2022-10-10 10:33:49 -04:00
Peter Rong 2343ad755d [AArch64] Add index check before lowerInterleavedStore() uses ShuffleVectorInst's mask
This commit fixes https://github.com/llvm/llvm-project/issues/57326.

Currently we would take a Mask out and directly use it by doing
auto Mask = SVI->getShuffleMask();
However, if the mask is undef, this Mask is not initialized. It might be
a vector of -1 or random integers.
This would cause an Out-of-bound read later when trying to find a
StartMask.

This change checks if all indices in the Mask is in the allowed range,
and fixes the out-of-bound accesses.

Differential Revision: https://reviews.llvm.org/D132634
2022-10-10 12:52:31 +01:00
Matt Devereau d48e63074f [AArch64][SVE] Fix AArch64_SVE_VectorCall calling convention
This fixes the case where callees with SVE arguments outside of the z0-z7
range were incorrectly deduced as SVE calling convention functions
2022-10-10 10:25:29 +00:00
LiaoChunyu a835b92e6c [RISCV] Use hasAllWUsers to recover XORI/ORI
reference 0fbe71e91f.

Also add testcase for addi.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135538
2022-10-10 14:16:50 +08:00
Mingming Liu 159fb378f7 [AArch64] Swap 'lsl(val1,small-shmt)' to right hand side for AND(lsl(val1,small-shmt), lsl(val2,large-shmt))
On many aarch64 processors (Cortex A78, Neoverse N1/N2/V1, etc), ADD with LSL shift (shift-amount <= 4) has smaller latency and higher
throughput than ADD with larger shift (shift-amunt > 4). This is at least no-op for the rest of the processors.

Differential Revision: https://reviews.llvm.org/D135208
2022-10-09 17:26:54 -07:00
Yeting Kuo 7329dc0cc3 [RISCV][NFC] Fix unused variable warning.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135365
2022-10-09 21:39:30 +08:00
gonglingqin 5593d36356 [LoongArch] Expand fptrunc store from f64 to f32
Differential Revision: https://reviews.llvm.org/D135510
2022-10-09 17:55:42 +08:00
Ting Wang bc5e969ca1 [PowerPC] Add vector pair calling convention for AIX
This is AIX part of update after https://reviews.llvm.org/D117225

Fixed the issue that AIX64 with vector pair enabled saw redundant
spill/reload of callee saved vector registers.

Based on original patch by: Kai Luo

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D133466
2022-10-09 01:23:18 -04:00
WANG Xuerui 31327c29fb [LoongArch] Don't merge FrameIndex accesses into [F]{LD,ST}X
Otherwise eliminateFrameIndex cannot figure out how to fixup the stack
offset with its stateless logic, because there wouldn't be an immediate
slot for it to trivially write to, and it may not be easy to transform
the surrounding code to make it work.

This fixes a fairly common crash when compiling moderately complex code with
Clang.

Differential Revision: https://reviews.llvm.org/D135251
2022-10-09 13:04:21 +08:00
Freddy Ye 566c277c64 [X86] Remove AVX512VP2INTERSECT from Sapphire Rapids.
For more details, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D135509
2022-10-08 14:54:03 +08:00
gonglingqin f4ccb577f8 [LoongArch] Do not assert value type in isFPImmLegal
This patch fixes the failure of llvm/test/CodeGen/Generic/vector.ll and
CodeGen/PowerPC/2007-11-19-VectorSplitting.ll for a LoongArch native build.

Differential Revision: https://reviews.llvm.org/D134798
2022-10-08 14:50:48 +08:00
Craig Topper f749b2d9a5 [RISCV] Fix incorrect parenthese placement in comment. NFC 2022-10-07 17:16:38 -07:00
Craig Topper 9f67047cf0 [VP][RISCV] Add vp.smax/smin/umax/umin intrinsics
Differential Revision: https://reviews.llvm.org/D135418
2022-10-07 17:14:31 -07:00
Krzysztof Parzyszek 09d84e0ad8 [Hexagon] Implement helper to get intrinsic for instruction opcode
There are intrinsics for most scalar instructions and almost all HVX
instructions. What's somewhat painful is that there are two intrinsics
for each HVX instruction: one for 64- and one for 128-byte mode.
Instead of checking the current codegen settings every time, this
function would simply return the right intrinsic.
2022-10-07 15:56:06 -07:00
Jessica Paquette 42cb2f8b12 [GlobalISel] Mark mi_match as nodiscard
Typically when you match something, you want to check the result.

Fix a couple warnings in the AMDGPUPostLegalizerCombiner which appear as a
result of this.

Differential Revision: https://reviews.llvm.org/D135491
2022-10-07 15:47:05 -07:00
Artem Belevich 9a01cca660 Add support for CUDA-11.8 and sm_{87,89,90} GPUs.
Differential Revision: https://reviews.llvm.org/D135306
2022-10-07 13:59:28 -07:00
Matt Arsenault 74ef03d38a AMDGPU: Update SlotIndexes independently of LiveIntervals
Apparently StackColoring depends on SlotIndexes, but not
LiveIntervals. If regalloc fast were manually requested, LiveIntervals
would be dropped before SILowerSGPRSpills but not SlotIndexes.

SILowerSGPRSpills preserved SlotIndexes, but only through
LiveIntervals. As a result, SILowerSGPRSpills was incorrectly
reporting it preserved SlotIndexes. Start updating these directly,
instead of depending on LiveIntervals also being available.
2022-10-07 13:15:15 -07:00
Krzysztof Parzyszek d184045d36 [Hexagon] Formatting changes, NFC 2022-10-07 09:13:51 -07:00
Krzysztof Parzyszek e492cdc358 [Hexagon] Add couple of helper functions in HexagonVectorCombine
1. `length(value/type)`: return the number of elements in the vector
   input,
2. `getHvxTy(elem_type)`: return the HVX vector type with the element
   type provided.

These will help write things more succintly.
2022-10-07 09:10:08 -07:00
Krzysztof Parzyszek 06019b8e55 [Hexagon] Add default parameter to HexagonVectorCombine::getIntTy, NFC 2022-10-07 08:52:19 -07:00
Krzysztof Parzyszek d376b2667a [Hexagon] Make HexagonSubtarget::isHVXVectorType take EVT instead of MVT
EVT can be created for any Type, and so this function can now be used to
check if given Type, as-is, is an HVX type (as opposed to a type that may
be subject to legalization to an HVX type).
2022-10-07 08:42:39 -07:00
Kazu Hirata 7f90597be6 [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:800:17:
  error: unused variable 'DST_IDX' [-Werror,-Wunused-variable]
2022-10-07 08:27:02 -07:00
Krzysztof Parzyszek 2216d8f6b8 [Hexagon] Replace llvm::Optional with std::optional, NFC 2022-10-07 08:23:39 -07:00
Krzysztof Parzyszek 473210ae90 [Hexagon] Constify member refererence, NFC 2022-10-07 08:23:39 -07:00
Dmitry Preobrazhensky 8f8e4e3b38 [AMDGPU][MC][GFX11] Correct v_fmac_.*_e64_dpp
Differential Revision: https://reviews.llvm.org/D134961
2022-10-07 16:21:55 +03:00
Dmitry Preobrazhensky 1d1c7555e2 [AMDGPU][GFX11][NFC] Refactor VOPD handling in codegen
Differential Revision: https://reviews.llvm.org/D135084
2022-10-07 16:13:05 +03:00
Dmitry Preobrazhensky fd7b0eeaf6 [AMDGPU][MC][GFX11] Add VOPD VGPR bank access validation
Differential Revision: https://reviews.llvm.org/D134960
2022-10-07 15:52:59 +03:00
zhongyunde 75358f060c [AArch64] Lower multiplication by a constant int to madd
Lower a = b * C -1 into madd
  a) instcombine change b * C -1 --> b * C + (-1)
  b) machine-combine change b * C + (-1) --> madd

Assembler will transform the neg immedate of sub to add, see https://gcc.godbolt.org/z/cTcxePPf4
Fixes AArch64 part of https://github.com/llvm/llvm-project/issues/57255.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134336
2022-10-07 19:33:47 +08:00
eopXD dbc681c98e [VP][RISCV] Add vp.roundtozero and its RISC-V support
The scalar instruction of this is `llvm.trunc`. However the naming of
ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this as
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So adding
`vp.roundtozero` that will look similar to `vp.roundeven`.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D135233
2022-10-07 02:15:23 -07:00
Xiang Li 220185552f [DirectX backend] Add analysis to collect DXILResources
Now only DXILTranslateMetadata uses DXILResources, so DXILResourceWrapper is only used by DXILTranslateMetadata.
Once we add lower for createHandle, DXILResourceWrapper will be used in more passes.
Also we can add resource index allocation in DXILResourceWrapper.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D135190
2022-10-06 19:34:29 -07:00
Craig Topper 3b20765cf7 [RISCV] Use mask agnostic policy for isel patterns where the merge operand is IMPLICIT_DEF.
I tend to think we should ignore the policy bit in vsetvli insertion
if the tied operand is IMPLICIT_DEF. But that raises questions about
what the policy operand on RVV intrinsics means if you also pass
vundefined().

This change at least fixes some cases. I'll post a separate patch
for vsetvli insertion for discussion.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135386
2022-10-06 15:44:39 -07:00