Commit Graph

65876 Commits

Author SHA1 Message Date
Chenbing.Zheng 6d6c44a3f3 [RISCV] Add support for matching vwmulsu from fixed vectors
According to riscv-v-spec-1.0, widening signed(vs2)-unsigned integer multiply
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar

It is worth noting that the signed operand is only vs2.
For vwmulsu.vv, we can swap the two operands and do not care which one is the sign extension,
but for vwmulsu.vx the sign-extended operand cannot be the vector extended from the scalar (rs1).
I specifically added two functions ending with _swap to the test case.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118215
2022-01-28 02:33:30 +00:00
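A minimal scalar C++ sketch of one vwmulsu element from the change above (an illustrative assumption, not the LLVM lowering itself): vs2 is treated as signed, vs1 as unsigned, and the product is produced at twice the element width.

    #include <cstdint>

    // One SEW=16 element of a widening signed-by-unsigned multiply: vs2 is
    // sign-extended, vs1 is zero-extended, and the product is 32 bits wide.
    int32_t vwmulsu_elt(int16_t vs2, uint16_t vs1) {
      // Both extended factors fit in int32_t, and every possible product
      // (-32768*65535 .. 32767*65535) stays within the int32_t range.
      return static_cast<int32_t>(vs2) * static_cast<int32_t>(vs1);
    }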
Jonas Paulsson 9ca9fee6e8 [SystemZ] Don't shrink 64-bit FP constants.
Return false from ShouldShrinkFPConstant(), so that these constants are stored
in their full size in the constant pool, even if they could have been shrunk
and used with an extending load.

This is better since LD is faster than LDE, and it also enables reg/mem opcodes.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D117927
2022-01-27 16:14:53 -06:00
Jonas Paulsson f541a5048a [SystemZ] Implement orderFrameObjects().
By reordering the objects on the stack frame after looking at the users, a
better utilization of displacement operands results. This means fewer Load
Address instructions are needed to access these objects.

This is important for very large functions where otherwise small changes
could cause many more (or fewer) accesses to go out of range.

Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D115690
2022-01-27 16:09:19 -06:00
Craig Topper 70e1cc6792 [RISCV] Prefer vmslt.vx v0, v8, zero over vmsle.vi v0, v8, -1.
At least when starting from a vmslt.vx intrinsic or ISD::SETLT. We
don't handle the case where the user used vmsle.vx intrinsic with -1.
2022-01-27 11:48:27 -08:00
David Green 82973edfb7 [ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics
Since its introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.

Fixes #53120 and #50905 and #51761.

Differential Revision: https://reviews.llvm.org/D117592
2022-01-27 19:19:46 +00:00
Simon Pilgrim 9103b73fe0 [X86] Fold MOVMSK(CONCAT(X,Y)) -> MOVMSK(AND/OR(X,Y)) for all_of/any_of patterns
Makes it easier for later folds and avoids unnecessary 256-bit ops (especially on AVX1-only targets where we miss a lot of integer instructions)
2022-01-27 18:28:09 +00:00
Jay Foad 4b133cee80 [AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them
later.

Differential Revision: https://reviews.llvm.org/D118368
2022-01-27 18:20:46 +00:00
Jay Foad 94a4594c54 [AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions
Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list and then later
refusing to merge instructions of different AGPR-ness.

Differential Revision: https://reviews.llvm.org/D118367
2022-01-27 18:20:46 +00:00
Jay Foad 8a52fef1e0 [AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.
Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and STM.

Differential Revision: https://reviews.llvm.org/D118366
2022-01-27 18:20:46 +00:00
Matt Arsenault 33b45ee44b AMDGPU: Handle addrspacecast of constant 32-bit to flat
I accidentally made this work on the GlobalISel path, and there's no
real reason not to handle this.
2022-01-27 11:01:44 -05:00
Sander de Smalen af1c8f0d14 [AArch64][SVE] Folds VSELECT if the predicate is all active.
This adds the following changes:

* Fold: vselect(<all active predicate>, x, y) => x
* Extend isAllActivePredicate to take vscale_range into account, e.g.
  isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true.
  isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true.

Differential Revision: https://reviews.llvm.org/D118147
2022-01-27 15:58:56 +00:00
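A simplified C++ model of the vscale_range reasoning above (an assumption for illustration, not the actual AArch64 helper): a fixed-length ptrue pattern covers every lane of an nxv16i1 predicate exactly when the pattern length is at least 16 * vscale.

    // For nxv16i1 the runtime lane count is 16 * vscale, so e.g. a VL16
    // pattern with vscale == 1, or a VL32 pattern with vscale == 2, is
    // all-active.
    bool isAllActiveVLPattern(unsigned PatternLen, unsigned VScale) {
      unsigned LaneCount = 16 * VScale; // lanes in an nxv16i1 predicate
      return PatternLen >= LaneCount;
    }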
Matt Arsenault aa88b65392 AMDGPU/GlobalISel: Fix assert on invalid cond code for llvm.amdgcn.icmp 2022-01-27 10:34:06 -05:00
Matt Arsenault f482e86980 AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders
I don't think this is actually defined for mesa, but this is what we
were doing on the DAG path.
2022-01-27 10:20:52 -05:00
Simon Pilgrim ccda0f2226 [X86][SSE] Add combineBitOpWithShift for BITOP(SHIFT(X,Z),SHIFT(Y,Z)) -> SHIFT(BITOP(X,Y),Z) vector folds
InstCombine performs this more generally with SimplifyUsingDistributiveLaws, but we don't need anything that complex here - this is mainly to fix up cases where logic ops get created late on during lowering, often in conjunction with sext/zext ops for type legalization.

https://alive2.llvm.org/ce/z/gGpY5v
2022-01-27 14:54:41 +00:00
Jay Foad 185cb8e82c [AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access
Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening instructions happens
to be a swizzled access.

This moves the check for swizzled accesses out of checkAndPrepareMerge
into collectMergeableInsts where I think it makes more sense.

Differential Revision: https://reviews.llvm.org/D118267
2022-01-27 14:40:58 +00:00
Simon Pilgrim 389ae775e4 [X86] Fold TESTZ(OR(LO(X),HI(X)),OR(LO(Y),HI(Y))) -> TESTZ(X,Y)
Helps fix a number of poor codegen cases for allof(cmp()) with 256-bit vectors on AVX1
2022-01-27 13:20:36 +00:00
Sander de Smalen dafd1f29da [AArch64][SVE] Avoid using ptrue for unpredicated predicate AND.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118146
2022-01-27 13:00:23 +00:00
Sander de Smalen 417a75c6d0 [AArch64][SVE] Avoid using ptrue for ptest in VECREDUCE_OR.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118145
2022-01-27 11:44:49 +00:00
Sander de Smalen c9da81d997 [AArch64][SVE] Implement missing lowering for extract_subvector for predicates.
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D118057
2022-01-27 11:01:11 +00:00
Sander de Smalen d58757e522 [AArch64][SVE] Implement PFALSE with explicit AArch64ISD node.
The ISel patterns for PFALSE help recognise the instructions as being
free of side-effects, which helps MachineCSE remove redundant
PFALSE instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118054
2022-01-27 10:30:13 +00:00
Jay Foad 95857a7058 [AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile
SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so there is no
need to check for them again when scanning the instructions between two
pairing candidates in a block.

Differential Revision: https://reviews.llvm.org/D118266
2022-01-27 10:14:53 +00:00
Nikita Popov fc72f3a168 [BTFDebug] Avoid pointer element type access
Use the global value type instead.
2022-01-27 10:30:21 +01:00
Fraser Cormack 84e85e025e [SelectionDAG][VP] Provide expansion for VP_MERGE
This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.

This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118058
2022-01-27 09:00:41 +00:00
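A scalar C++ model of the expansion described above (an illustrative assumption, not the SelectionDAG code): the mask is combined with a lane-index-below-EVL test, and the result drives an ordinary full-length select.

    #include <cstddef>

    // out[i] takes on_true[i] only when the lane is below EVL and the original
    // mask bit is set; every other lane, including those past EVL, takes
    // on_false[i].
    void vp_merge_expand(const bool *mask, const int *on_true,
                         const int *on_false, int *out, size_t evl, size_t vl) {
      for (size_t i = 0; i < vl; ++i) {
        bool lane_active = (i < evl) && mask[i]; // mask AND (step_vector < EVL)
        out[i] = lane_active ? on_true[i] : on_false[i];
      }
    }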
Wu Xinlong 6a4d3f37b5 [RISCV] fix dead code
Fix the dead code mentioned in https://reviews.llvm.org/D98136.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118323
2022-01-27 16:00:01 +08:00
Zi Xuan Wu 4ad517e6b0 [CSKY] Add floating operation support including float and double
CSKY arch has multiple FPU instruction versions such as FPU, FPUv2 and FPUv3 to implement floating operations.
For now, we only support FPUv2 and FPUv3.

It includes the encoding, asm parsing of instructions and codegen of DAG nodes.
2022-01-27 15:54:04 +08:00
Wu Xinlong 615d71d9a3 [RISCV][CodeGen] Implement IR Intrinsic support for K extension
This revision implements IR Intrinsic support for RISCV Scalar Crypto extension according to the specification of version [[ https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0-scalar | 1.0]]
Co-author:@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102310
2022-01-27 15:53:35 +08:00
Ting Wang 6f25cb8685 [PowerPC] Add the Power10 XS[MAX|MIN]CQP instruction
Add the Power 10 instruction XS[MAX|MIN]CQP.

Reviewed By: shchenz, amyk

Differential Revision: https://reviews.llvm.org/D118036
2022-01-26 23:00:43 -05:00
Craig Topper b3bec6e453 [RISCV] Use vnsrl.wx with x0 instead of vnsrl.vi for truncate.
This matches what the spec uses for the vncvt.x.x.w assembly
pseudoinstruction.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D118295
2022-01-26 18:38:13 -08:00
Stanislav Mekhanoshin 409c4436f9 [AMDGPU] Validate dst and src2 non-overlapping restriction in asm
Differential Revision: https://reviews.llvm.org/D118089
2022-01-26 15:14:06 -08:00
Stanislav Mekhanoshin dbf278b984 [AMDGPU] Prevent aliasing of SrcC and Dst in MAI
From the MAI spec: it is OK for Src_C and vDst to be the exact same VGPRs,
or for Src_C and vDst to be completely separate. The case where Src_C and
vDst partially overlap should be avoided, as a new value could be written to
the accumulator input before it gets read.

Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.

This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.

Fixes: SWDEV-318900

Differential Revision: https://reviews.llvm.org/D117844
2022-01-26 14:48:20 -08:00
Craig Topper f487a76430 [RISCV] Add hasStdExtZbp() to hasAndNotCompare. 2022-01-26 13:54:05 -08:00
Matt Arsenault 045be6ff36 AMDGPU/GlobalISel: Fold wave address into mubuf addressing modes 2022-01-26 15:25:26 -05:00
Matt Arsenault 09fc311af7 AMDGPU/GlobalISel: Mostly fix BFI patterns
Most importantly, fixes constant bus errors in the 64-bit cases. It's
surprising to me these were even passing the selection test using
SReg_* sources. Also fixes pattern matching in the 32-bit cases, with
simple operands.

These patterns aren't working in a few cases, like with mixed SGPR
inputs. The patterns aren't looking through the SGPR->VGPR copies like
they need to. The vector cases also have some unmerges of build_vector
which are obscuring the inputs.
2022-01-26 15:06:50 -05:00
Craig Topper b3d94b199c [RISCV] Remove references to 'B' extension from AssemblerPredicate and SubtargetFeature strings.
For Zba/Zbb/Zbc/Zbs I've removed the 'B' completely and used the
extension names as presented at the start of Chapter 1 of the
1.0.0 Bitmanipulation spec.

For the unratified extensions, I've replaced 'B' with 'Zb' and
otherwise left them unchanged.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D117822
2022-01-26 11:08:29 -08:00
Matt Arsenault e6564f39c7 AMDGPU: Emit user sgpr count directives in text asm
We were emitting these in the object file but not printing them.
2022-01-26 13:51:12 -05:00
Konstantina aa418b9133 [AMDGPU][SIWholeQuadMode] Use the right VCC register to activate the correct lanes.
Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D118096
2022-01-26 08:54:39 -08:00
Stanislav Mekhanoshin 4e077c0a0b [AMDGPU] Remove feature register-banking
Since the RegBankReassign pass was removed, this feature is not
used for anything.

Differential Revision: https://reviews.llvm.org/D118195
2022-01-26 08:39:17 -08:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a consequence, move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h, so no build
breakage is expected).
2022-01-26 16:17:45 +01:00
Simon Pilgrim 99ae5c13f6 [X86] Add 'getSplitVectorSrc' helper to determine if subvectors all come from the same source
Helps determine if the subvector ops come from the same larger vector and match the lower/upper extractions
2022-01-26 15:17:21 +00:00
Nemanja Ivanovic 0c56bc92e4 [PowerPC] Fix eq/ne comparison of v2i64 pre-Power8
In commit 1674d9b6b2, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely, for eq we need both words of the doubleword to be equal, so it
is an AND. OTOH for ne, we need either word to be unequal, so it
is an OR.
2022-01-26 08:59:08 -06:00
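A scalar C++ sketch of the word-wise logic above (an assumption for illustration, not the PowerPC lowering): a 64-bit eq needs both 32-bit halves to compare equal (AND), while ne only needs one half to differ (OR).

    #include <cstdint>

    // Combine per-word comparison results into a doubleword result.
    bool eq64_from_words(uint32_t aHi, uint32_t aLo, uint32_t bHi, uint32_t bLo) {
      return (aHi == bHi) && (aLo == bLo); // AND of the per-word eq results
    }
    bool ne64_from_words(uint32_t aHi, uint32_t aLo, uint32_t bHi, uint32_t bLo) {
      return (aHi != bHi) || (aLo != bLo); // OR of the per-word ne results
    }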
Nikita Popov a5e324e3e2 [AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers
AMDGPUHSAMetadataStreamer currently assumes that pointer arguments
without align attribute have ABI alignment of the pointee type.
This is incompatible with opaque pointers, but also plain incorrect:
Pointer arguments without explicit alignment have alignment 1. It is
the responsibility of the frontend to add correct align annotations.

Differential Revision: https://reviews.llvm.org/D118229
2022-01-26 15:45:14 +01:00
Alban Bridonneau 2feddb37b4 Implement correct cost for SVE bitcasts
We have some bitcasts which we know will be simplified,
so their cost is zero.

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D118019
2022-01-26 14:25:44 +00:00
Simon Moll 5ceb0bc7ea [VE] Packed 32/64bit broadcast isel and tests
Packed-mode broadcast of f32/i32 requires the subregister to be
replicated to the full I64 register first. Add repl_i32 and repl_f32 to
facilitate this.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D117878
2022-01-26 14:16:06 +01:00
alex-t 5157f984ae [AMDGPU] Enable divergence-driven XNOR selection
Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence.
This relies on a further custom selection pass which converts to VALU if necessary and replaces it with V_NOT_B32 (V_XOR_B32)
on those targets which have no V_XNOR.
This change enables patterns which explicitly select the not (xor_one_use) to the appropriate form.
We assume that xor (not) is already turned into not (xor) by the combiner.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116270
2022-01-26 15:33:10 +03:00
Paul Walker 66bd7ebdf7 [SVE] Use DUPM to handle more splat immediate cases.
NOTE: Only considers i64 based vectors at this time because smaller
element types require extra isel operand parsing.

Differential Revision: https://reviews.llvm.org/D118040
2022-01-26 12:04:44 +00:00
Maciej Gabka c5263cd518 Restrict performPostLD1Combine to 64 and 128 bit vectors
When wider vectors are used, for example with fixed-width SVE,
there are no patterns to select AArch64ISD::LD1LANEpost
nodes, so we should do an early exit.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D117674
2022-01-26 09:57:44 +00:00
jacquesguan 267711e38b [RISCV] Fix support of vlen = 64.
In the Zve* extensions, vlen can be 64. This patch changes the lower bound of the vlen constraint to 64.

Differential Revision: https://reviews.llvm.org/D118217
2022-01-26 16:31:21 +08:00
Jim Lin da1cac7d19 [NFC] Remove duplicate include 2022-01-26 15:10:16 +08:00
Qiu Chaofan ad0345aed1 [PowerPC] Emit gnu_attribute according to float-abi metadata
According to GNU as documentation, PowerPC supports some .gnu_attribute
tags to represent the vector and float ABI type in the object file.
Some linkers like GNU ld respect the attribute and will prevent objects
with conflicting ABIs from being linked.

This patch emits gnu_attribute value in assembly printer according to
the float-abi metadata. More attributes for soft-fp, hard single/double
and even vector ABI need to be supported in the future.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D117193
2022-01-26 13:28:50 +08:00
Micah Weston f65651cc8a [AArch64] Fixes ADD/SUB opt bug and abstracts shared behavior in MIPeepholeOpt for ADD, SUB, and AND.
This fixes a bug where (SUBREG_TO_REG 0 (MOVi32imm <negative-number>) sub_32)
would generate invalid code since the top 32-bits were not zeroed when inspecting the
immediate value. A new test was added for this case.

Change to abstract shared behavior in MIPeepholeOpt. Both
visitAND and visitADDSUB attempt to split an RR instruction with an immediate
operand into two RI instructions with the immediate split.

The differing behavior lies in how the immediate is split into two pieces and
how the new instructions are built. The rest of the behavior (adding new VRegs,
checking for the MOVImm, constraining reg classes, removing old instructions)
is shared between the operations.

The new helper function splitTwoPartImm implements the shared behavior and
delegates differing behavior to two function objects passed by the caller.
One function object splits the immediate into two values and returns the
opcode to use if it is a valid split. The other function object builds
the new instructions.

I felt this abstraction would help since I believe it will reduce the
code repetition when adding new instructions of this pattern, such as
SUBS for this conditional optimization.

Tested it locally by running check all with compiler-rt, mlir, clang-tools-extra,
flang, llvm, and clang enabled.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118000
2022-01-26 04:22:27 +00:00
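A hypothetical, heavily simplified C++ sketch of the shared-helper shape described above (names and signatures are assumptions, not the actual MIPeepholeOpt API): the caller passes one callable that decides how to split the immediate and one that builds the replacement instructions.

    #include <cstdint>
    #include <functional>
    #include <optional>

    struct ImmSplit { uint64_t Imm0, Imm1; unsigned Opcode; };

    // Shared driver: try the operation-specific split, bail out if there is no
    // valid split, otherwise delegate building the two new instructions.
    bool splitTwoPartImmSketch(
        uint64_t Imm,
        const std::function<std::optional<ImmSplit>(uint64_t)> &SplitFn,
        const std::function<void(const ImmSplit &)> &BuildFn) {
      std::optional<ImmSplit> Split = SplitFn(Imm);
      if (!Split)
        return false;
      BuildFn(*Split); // VReg creation, class constraints, etc. omitted here
      return true;
    }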
Zakk Chen 510710d037 [RISCV][NFC] Add getVLOperand for RVV intrinsics.
Use the VLOperand information to get the VL.

Differential Revision: https://reviews.llvm.org/D118156
2022-01-25 17:37:58 -08:00
Zakk Chen 9273378b85 [RISCV] Add the passthru operand for RVV nomask load intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com>

Reviewers: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D117647
2022-01-25 17:31:36 -08:00
eopXD b089e4072a [RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x
EEW=64 for mulh and its variants requires the V extension.

Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117947
2022-01-25 09:55:05 -08:00
Sebastian Neubauer 4ed7c6eec9 [AMDGPU] Only match correct type for a16
Addresses are floats when a sampler is present and unsigned integers
when no sampler is present.

Therefore, only zext instructions, not sext instructions should match.

Also match integer constants that can be truncated.

Differential Revision: https://reviews.llvm.org/D118043
2022-01-25 14:59:16 +01:00
Danila Malyutin 153b1e3cba [AArch64] Add patterns for relaxed atomic ld/st into fp registers
Adds patterns to match integer loads/stores bitcasted to fp values

Fixes https://github.com/llvm/llvm-project/issues/52927

Differential Revision: https://reviews.llvm.org/D117573
2022-01-25 15:33:37 +03:00
Paul Walker d95cf1f6cf [SVE] Enable ISD::ABDS/U ISel for scalable vectors.
NOTE: This patch also includes tests that highlight those cases
where the existing DAG combine doesn't yet work well for SVE.

Differential Revision: https://reviews.llvm.org/D117873
2022-01-25 12:14:53 +00:00
Simon Pilgrim 157f9b68a3 [X86] combineVectorSignBitsTruncation - fix indentation. NFC. 2022-01-25 11:54:22 +00:00
Simon Tatham f302e0b5dd [AArch64] Exclude optional features from HasV8_0rOps.
The following SubtargetFeatures are removed from the definition of
HasV8_0rOps, on the grounds that they are optional in Armv8.4-A, and
therefore (by the definition of Armv8.0-R) also optional in v8.0-R:

 * performance monitoring: FeaturePerfMon
 * cryptography: FeatureSM4 and FeatureSHA3
 * half-precision FP: FeatureFullFP16, FeatureFP16FML
 * speculation control: FeatureSSBS, FeaturePredRes, FeatureSB,
   FeatureSpecRestrict

This isn't the full set of features that are listed as optional in the
spec. FeatureCCIDX and FeatureTRACEV8_4 are also optional. But LLVM
includes those in HasV8_3aOps and HasV8_4aOps respectively (I think on
the grounds that the system registers they enable are useful to be
able to access after a runtime check), and so for consistency, I've
left those in HasV8_0rOps too.

After this commit, HasV8_0rOps is a strict subset of HasV8_4aOps (but
missing features that are not in Armv8.0-R at all).

The definition of Cortex-R82 is correspondingly updated to add most of
the features that I've removed from base Armv8.0-R (with the exception
of the cryptography ones), since that particular implementation of
v8.0-R does have them.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118045
2022-01-25 10:54:59 +00:00
David Sherwood 13252160c3 [NFC] Move useSVEForFixedLengthVectors into AArch64Subtarget.h
Given how small the function is and how often it gets used, it
makes more sense for it to live in the header file.

Differential Revision: https://reviews.llvm.org/D117883
2022-01-25 09:49:04 +00:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Martin Storsjö 70cb8daed4 [X86] [CodeView] Add codeview mappings for registers ST0-ST7
These can end up needed after https://reviews.llvm.org/D116821.

Suggested by Alexandre Ganea.

Differential Revision: https://reviews.llvm.org/D118072
2022-01-25 10:09:06 +02:00
Craig Topper fd0a4bc76b [RISCV] Add missing space to 'clang-format on' directive. NFC
Without a space after the comment characters it seems to be ignored.
2022-01-24 17:00:37 -08:00
Simon Pilgrim 902184e6cc [X86] combinePredicateReduction - generalize allof(cmpeq(x,0)) handling to allof(cmpeq(x,y))
There are no further reasons to limit this to cmpeq-with-zero; the outstanding regressions with lowering to PTEST have now been addressed

Improves codegen for Issue #53379
2022-01-25 00:24:06 +00:00
Simon Pilgrim 11bb4a1111 [X86] combinePredicateReduction - split vXi16 allof(cmpeq()) to vXi8 allof(cmpeq())
vXi16 allof(cmp()) reduction patterns will have to pack the comparison results to vXi8 to use PMOVMSKB.

If we're reducing cmpeq(), then we can compare the vXi8 halves directly - similar to what we already do for vXi64 -> vXi32 for cases without PCMPEQQ.
2022-01-24 22:43:29 +00:00
Simon Pilgrim 8d298355ca [X86] combineSetCCMOVMSK - detect and(pcmpeq(),pcmpeq()) ptest pattern.
Handle cases where we've split an allof(cmpeq()) pattern to a legal vector type
2022-01-24 21:42:03 +00:00
Quinn Pham 6a028296fe [PowerPC] Emit warning when SP is clobbered by asm
This patch emits a warning when the stack pointer register (`R1`) is found in
the clobber list of an inline asm statement. Clobbering the stack pointer is
not supported.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D112073
2022-01-24 15:12:23 -06:00
Stanislav Mekhanoshin bb1fe36977 [AMDGPU] Make v8i16/v8f16 legal
Differential Revision: https://reviews.llvm.org/D117721
2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin c27f8fb968 [AMDGPU] Remove cndmask from readsExecAsData
Differential Revision: https://reviews.llvm.org/D117909
2022-01-24 11:24:47 -08:00
Sebastian Neubauer 80532ebb50 [AMDGPU][InstCombine] Remove zero image offset
Remove the offset parameter if it is zero.

Differential Revision: https://reviews.llvm.org/D117876
2022-01-24 18:06:33 +01:00
Simon Pilgrim 6997f4d07f [X86] combineSetCCMOVMSK - fold allof(cmpeq(x,y)) -> ptest(sub(x,y)) (PR53379)
As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets

This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern

We can probably extend this further, in particularly to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial

Fixes Issue #53379
2022-01-24 16:44:37 +00:00
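A scalar C++ model of the equivalence above (an assumption for illustration, not the X86 combine): all lanes of cmpeq(x,y) are true exactly when every lane of sub(x,y) is zero, which is the condition PTEST checks when given sub(x,y) as both operands.

    #include <cstddef>
    #include <cstdint>

    // "All-of equal" reduced to a single zero test over the lane-wise subtraction.
    bool allof_cmpeq(const uint32_t *x, const uint32_t *y, size_t lanes) {
      uint32_t accum = 0;
      for (size_t i = 0; i < lanes; ++i)
        accum |= x[i] - y[i]; // wrapping sub is zero iff the lanes are equal
      return accum == 0;      // PTEST-style: ZF set iff no bit is set anywhere
    }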
Craig Topper cd2a9ff397 [RISCV] Select int_riscv_vsll with shift of 1 to vadd.vv.
Add might be faster than shift. We can't do this earlier without
using a Freeze instruction.

This is the intrinsic version of D106689.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118013
2022-01-24 08:04:53 -08:00
Matt Arsenault 18aabae8e2 AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills
These have negative / out of bounds frame index values and would
assert when trying to set the BitVector. Fixed stack objects can't be
colored away so ignore them.
2022-01-24 09:45:41 -05:00
Matt Arsenault 99e8e17313 Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction"
This reverts commit a97e20a3a8.
2022-01-24 09:26:52 -05:00
Fraser Cormack d42678b453 [RISCV] Add side-effect-free vsetvli intrinsics
This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.

These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.

No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117910
2022-01-24 13:52:08 +00:00
serge-sans-paille 5f290c090a Move STLFunctionalExtras out of STLExtras
Using only that change in StringRef already decreases the number of
preprocessed lines from 7837621 to 7776151 for LLVMSupport.

Perhaps more interestingly, it shows that many files were relying on the
inclusion of StringRef.h to get the declarations from STLExtras.h. This
patch tries hard to patch the relevant parts of llvm-project impacted by this
hidden dependency removal.

Potential impact:
- "llvm/ADT/StringRef.h" no longer includes <memory>,
  "llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h"

Related Discourse thread:
https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
2022-01-24 14:13:21 +01:00
Sebastian Neubauer f1e36474b9 [AMDGPU][NFC] Fix debug prints
Print the instructions instead of pointers.
2022-01-24 13:55:00 +01:00
SForeKeeper 70f83f3084 [RISCV] add support for zbkx subextension in MC layer.
This patch adds support for the zbkx extension from the K extension (v1.0.0) in the MC layer.
Instructions with the same functionality and encoding are defined in the bitmanip extension.
It defines {Xperm8, Xperm4} as instruction aliases for xperm.* in the Zbp extension. When Zbkx is enabled while Zbp is not, xperm.h will not be available. When Zbkx and Zbp are both enabled, the instructions will be decoded in Zbp format.

[[ https://reviews.llvm.org/D94999 | D94999 ]] this is the patch that introduces xperm.* instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117889
2022-01-24 20:38:46 +08:00
Fraser Cormack af773a1818 [RISCV][VP] Lower VP_MERGE to RVV instructions
This patch adds lowering of the llvm.vp.merge.* intrinsic
(ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a
special pseudo form of vmerge which allows a tied merge operand,
allowing us to specify the tail elements as being equal to the "on
false" operand, using a tied-def constraint and a "tail undisturbed"
policy.

While this strategy allows us to often lower the intrinsic to just one
instruction, it may be less efficient in fixed-vector types as the
number of tail elements may extend far beyond the length of the fixed
vector. Another strategy could be to use a vmerge/vfmerge instruction
with an AVL equal to the length of the vector type, and manipulate the
condition operand such that mask elements greater than the operation's
EVL are false.

I've also observed inefficient codegen in which our 'VF' patterns don't
match raw floating-point SPLAT_VECTORs, which occur in scalable-vector
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117561
2022-01-24 11:05:05 +00:00
Fraser Cormack e7926e8d97 [RISCV] Match VF variants for masked VFRDIV/VFRSUB
This patch follows up on D117697 to help the simple binary operations
behave similarly in the presence of masks.

It also enables CGP sinking support for vp.fdiv and vp.fsub intrinsics,
now that VFRDIV and VFRSUB are consistently matched with a LHS splat for
masked and unmasked variants.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117783
2022-01-24 10:59:43 +00:00
Simon Pilgrim 577a6dc9a1 [X86] getVectorMaskingNode - fix indentation. NFC.
clang-format
2022-01-24 11:08:41 +00:00
Nikita Popov 0d1308a7b7 [AArch64][GlobalISel] Support returned argument with multiple registers
The call lowering code assumed that a returned argument could only
consist of one register. Pass an ArrayRef<Register> instead of
Register to make sure that all parts get assigned.

Fixes https://github.com/llvm/llvm-project/issues/53315.

Differential Revision: https://reviews.llvm.org/D117866
2022-01-24 10:55:28 +01:00
Chenbing.Zheng 9aaa74aeef [RISCV] Add patterns of SET[U]LT_VI for SETCC forms
This patch optimizes "li a0, 5
                     vmsgt[u].vx v10, v8, a0"
                 -> "vmsgt[u].vi v10, v8, 5"

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118014
2022-01-24 08:50:49 +00:00
Jim Lin f533011252 [Hexagon] Use llvm::Register instead of unsigned in HexagonConstExtenders.cpp. NFC.
Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D117851
2022-01-24 16:06:25 +08:00
jacquesguan ba16e3c31f [RISCV] Decouple Zve* extensions and the V extension.
According to the spec, there are some differences between V and Zve64d. For example, the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*, but the V extension does support these instructions. So we should decouple the Zve* extensions and the V extension.

Differential Revision: https://reviews.llvm.org/D117854
2022-01-24 14:55:21 +08:00
Kazu Hirata bf039a8620 [Target] Use range-based for loops (NFC) 2022-01-23 22:53:15 -08:00
Wu Xinlong e29d8fb169 [RISCV] Initially support the K-extension instructions on the LLVM MC layer
This commit implements support for the scalar cryptography extension in LLVM according to version v1.0.0 of the [K Ext specification](https://github.com/riscv/riscv-crypto/releases) (scalar crypto has already been ratified). Currently, we are implementing the MC (Machine Code) layer of this extension, and the majority of the work is done under the `llvm/lib/Target/RISCV` directory. There are also some test files in the `llvm/test/MC/RISCV` directory.

Remove the Zbk* subfeatures which conflict with the B extensions to reduce the size of the patch.
(Zbk* will be resubmitted after this patch has been merged.)

**Co-author:**@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98136
2022-01-24 14:45:35 +08:00
Jim Lin 3f24cdec25 [RISCV][NFC] Remove trailing whitespace in RISCVInstrInfoVSDPatterns.td and RISCVInstrInfoVVLPatterns.td 2022-01-24 10:49:43 +08:00
Simon Pilgrim 4762c077e7 [X86] LowerFunnelShift - always lower vXi8 fshl by constant amounts as unpack(y,x) << zext(z)
This can always be lowered as PMULLW+PSRLWI+PACKUSWB
2022-01-23 21:35:05 +00:00
Simon Pilgrim 32dc14f876 [X86] LowerFunnelShift - use supportedVectorShiftWithBaseAmnt to check for supported scalar shifts
Allows us to reuse the ISD shift opcode instead of a mixture of ISD/X86ISD variants
2022-01-23 21:13:58 +00:00
Craig Topper 413684313d [RISCV] Adjust the header comment in RISCVInstrInfoZb.td to better integrate Zbk* extensions.
The Zbk* extensions have some overlap with Zb so have been placed in this file.

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D117958
2022-01-23 11:42:52 -08:00
Ayke van Laethem 116ab78694 [AVR] Make use of the constant value 0 in R1
The register R1 is defined to have the constant value 0 in the avr-gcc
calling convention (which we follow). Unfortunately, we don't really
make use of it. This patch replaces `LDI 0` instructions with a copy
from R1.

This reduces code size: my AVR build of compiler-rt goes from 50660 to
50240 bytes of code size, which is a 0.8% reduction. Presumably it will
also improve execution speed, although I didn't measure this.

Differential Revision: https://reviews.llvm.org/D117425
2022-01-23 17:08:01 +01:00
Ayke van Laethem 153359180a [AVR] Remove regalloc workaround for LDDWRdPtrQ
Background: https://github.com/avr-rust/rust-legacy-fork/issues/126

In short, this workaround was introduced to fix a "ran out of registers
during regalloc" issue. The root cause has since been fixed in
https://reviews.llvm.org/D54218 so this workaround can be removed.

There is one test that changes a little bit, removing a single
instruction. I also compiled compiler-rt before and after this patch but
didn't see a difference. So presumably the impact is very low. Still,
it's nice to be able to remove such a workaround.

Differential Revision: https://reviews.llvm.org/D117831
2022-01-23 17:08:00 +01:00
eopXD 3cf15af2da [RISCV] Remove experimental prefix from rvv-related extensions.
Extensions affected: +v, +zve*, +zvl*

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117860
2022-01-22 20:18:40 -08:00
Phoebe Wang 37d1d02200 [X86][MS] Change the alignment of f80 to 16 bytes on Windows 32bits to match with ICC
MSVC currently doesn't support 80 bits long double. ICC supports it when
the option `/Qlong-double` is specified. Changing the alignment of f80
to 16 bytes so that we can be compatible with ICC's option.

Reviewed By: rnk, craig.topper

Differential Revision: https://reviews.llvm.org/D115942
2022-01-23 09:58:46 +08:00
Craig Topper d44b6be6ea [RISCV] Don't Custom legalize f16/f32/f64 bitcasts if those types aren't Legal. 2022-01-22 11:55:18 -08:00
Qiu Chaofan 00d68c3824 [PowerPC] Support parsing GNU attributes in MC
This patch is the first step to enable support for GNU attributes in LLVM
PowerPC, enabling it for PowerPC targets; otherwise, llvm-mc raises an error
when seeing the attribute section.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D115854
2022-01-22 23:29:34 +08:00
Qiu Chaofan 8dedf9b58b [PowerPC] Change CTR clobber estimation for 128-bit floating types
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D117459
2022-01-22 23:20:14 +08:00
David Green b27e5459d5 [DAG] Convert truncstore(extend(x)) back to store(x)
Pulled out of D106237, this folds truncstore(extend(x)) back to store(x)
if the original store was legal. This can come up due to the order we
fold nodes. A fold from X86 needs to be adjusted to prevent infinite
loops, to have it pick the operand of a trunc more directly.

Differential Revision: https://reviews.llvm.org/D117901
2022-01-22 13:20:36 +00:00
Micah Weston 93deac2e2b [AArch64] Optimize add/sub with immediate through MIPeepholeOpt
Fixes the build issue with D111034, whose goal was to optimize
add/sub with long immediates.

Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The change which fixed the build issue in D111034 was the use of new virtual
registers so that SSA form is maintained until deleting MI.

Differential Revision: https://reviews.llvm.org/D117429
2022-01-22 12:39:22 +00:00
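A small worked C++ check of the split condition above (an illustrative assumption, not the AArch64 code): for example 0x456789 == (0x456 << 12) + 0x789, so an ADD of #0x456789 can become ADD r, r, #0x456, lsl #12 followed by ADD r, r, #0x789.

    #include <cstdint>
    #include <optional>
    #include <utility>

    // Returns {imm0, imm1} when imm == (imm0 << 12) + imm1 with both pieces
    // being non-zero 12-bit values; otherwise the immediate does not qualify.
    std::optional<std::pair<uint64_t, uint64_t>> splitAddSubImm(uint64_t Imm) {
      uint64_t Imm0 = (Imm >> 12) & 0xfff;
      uint64_t Imm1 = Imm & 0xfff;
      if (Imm0 != 0 && Imm1 != 0 && Imm == (Imm0 << 12) + Imm1)
        return std::make_pair(Imm0, Imm1);
      return std::nullopt;
    }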
Alex Fan e796eaf2af [RISCV][RFC] add MC support for zbkc subextension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117874
2022-01-22 10:23:01 +08:00
Craig Topper 0379459fc5 [RISCV] Strengthen a SDTypeProfile. Fix formatting. 2022-01-21 13:01:53 -08:00
Craig Topper 48132bb1e4 [RISCV] Simplify interface to combineMUL_VLToVWMUL. NFC
Instead of passing both the SDNode* and 2 of the operands
in two different orders, just pass the SDNode * and a bool to
indicate which operand order to test.

While there rename to combineMUL_VLToVWMUL_VL.
2022-01-21 11:43:06 -08:00
Craig Topper 11754a4dbb [RISCV] Use RVBUnary in more places to simplify some tablegen declarations. NFCI 2022-01-21 10:55:35 -08:00
Kai Nacke d5ae039ed7 [SystemZ] Properly register machine passes.
Registering the passes enables use of -stop-before=/-stop-after
options.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D117823
2022-01-21 09:10:37 -05:00
serge-sans-paille 75e164f61d [llvm] Cleanup header dependencies in ADT and Support
The cleanup was manual, but assisted by "include-what-you-use". It consists of

1. Removing unused forward declaration. No impact expected.
2. Removing unused headers in .cpp files. No impact expected.
3. Removing unused headers in .h files. This removes implicit dependencies and
   is generally considered a good thing, but this may break downstream builds.
   I've updated llvm, clang, lld, lldb and mlir deps, and included a list of the
   modification in the second part of the commit.
4. Replacing header inclusion by forward declaration. This has the same impact
   as 3.

Notable changes:

- llvm/Support/TargetParser.h no longer includes llvm/Support/AArch64TargetParser.h nor llvm/Support/ARMTargetParser.h
- llvm/Support/TypeSize.h no longer includes llvm/Support/WithColor.h
- llvm/Support/YAMLTraits.h no longer includes llvm/Support/Regex.h
- llvm/ADT/SmallVector.h no longer includes llvm/Support/MemAlloc.h nor llvm/Support/ErrorHandling.h

You may need to add some of these headers in your compilation units, if needs be.

As a hint to the impact of the cleanup, running

clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Support/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l

before: 8000919 lines
after:  7917500 lines

Reduced dependencies also help incremental rebuilds and are more ccache
friendly, something not shown by the above metric :-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
2022-01-21 13:54:49 +01:00
Fraser Cormack 4d268dc94a [RISCV] Enable CGP to sink splat operands of VP intrinsics
This patch brings better splat-matching to our VP support, by sinking
splat operands of VP intrinsics back into the same block as the VP
operation. The list of VP intrinsics we are interested in matches that
of the regular instructions.

Some optimization is still lacking. For instance, our VL nodes aren't
recognized as commutative, so splats must be on the RHS. Because of
this, we limit our sinking of splats to just the RHS operand for now.
Improvement in this regard can come in another patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117703
2022-01-21 11:30:37 +00:00
Sebastian Neubauer ae2f9c8be8 [AMDGPU] Remove lz and nomip combine from codegen
These combines have been moved into the IR combiner in D116042.

Differential Revision: https://reviews.llvm.org/D116116
2022-01-21 12:09:08 +01:00
Sebastian Neubauer 603d18033c [AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Revision: https://reviews.llvm.org/D116042
2022-01-21 12:09:07 +01:00
Sebastian Neubauer 0530fdbbbb [AMDGPU] Fix LOD bias in A16 combine
As the codegen fix in D111754, the LOD bias needs to be converted to 16
bits. Fix this in the combine.

Differential Revision: https://reviews.llvm.org/D116038
2022-01-21 12:09:06 +01:00
Simon Moll 7950010e49 [VE][NFC] Factor out helper functions
Factor out some helper functions to cleanup VEISelLowering.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D117683
2022-01-21 09:15:59 +01:00
wangpc 8def89b5dc [RISCV] Set CostPerUse to 1 iff RVC is enabled
After D86836, we can define multiple cost values for
different cost models. So here we set CostPerUse to
1 iff RVC is enabled to avoid potential impact on RA.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117741
2022-01-21 14:44:26 +08:00
Zi Xuan Wu 82bb8a588d [CSKY] Add codegen support of GlobalTLSAddress lowering
There is static and dynamic TLS address lowering in the DAG stage, according to the TLS model.
It needs the PseudoTLSLA32 pseudo to get the address of the TLS-related entry, which resides in the constant pool.
2022-01-21 14:39:55 +08:00
Craig Topper 7b3d307288 [RISCV] Add isel patterns for grevi, shfli, and unshfli to brev8/zip/unzip instructions.
Zbkb makes some encodings of the general grevi, shfli, and
unshfli instructions legal, so we added separate instructions for
those encodings to improve the diagnostics for assembler and
disassembler. To be consistent we should always use these separate
instructions whenever those specific encodings of grevi/shfli/unshfli
occur. So this patch adds specific isel patterns to override the generic
isel patterns for these cases. Similar was done for rev8 and zext.h
for Zbb previously.
2022-01-20 20:43:52 -08:00
Wu Xinlong 7ee1c162cc [RISCV][RFC] add inst support of zbkb
This commit adds instruction support for `zbkb`, which is defined in the scalar cryptography extension version v1.0.0 (already ratified).

Most of the zbkb definitions reuse parts of the zbp and zbb definitions, so this patch just modifies some of the instruction aliases and predicates.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117640
2022-01-21 11:49:36 +08:00
Joao Moreira 82af95029e [X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in function prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end up being reachable through PLT entries, which will use an indirect branch. Because this cannot be determined at compilation time, the compiler currently emits ENDBRs for every non-local-linkage function.

Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].

This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code model is set to "kernel". In this scenario, the compiler will only emit ENDBRs for address-taken functions, ignoring non-address-taken functions that don't have local linkage.

A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.

The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.

[1] - https://lkml.org/lkml/2021/11/22/591

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D116070
2022-01-21 10:55:34 +08:00
Hsiangkai Wang ad06e65dc4 [RISCV] Fix the bug in the register allocator caused by reserved BP.
Originally, hasRVVFrameObject() will scan all the stack objects to check
whether if there is any scalable vector object on the stack or not.
However, it causes errors in the register allocator. In issue 53016, it
returns false before RA because there is no RVV stack objects. After RA,
it returns true because there are spilling slots for RVV values during RA.
The compiler will not reserve BP during register allocation but will then
generate BP accesses in the PEI pass, due to the inconsistent behavior of the function.

The function is changed to use hasStdExtV() as the return value. It is
not precise, but it can make the register allocation correct.

Refer to https://github.com/llvm/llvm-project/issues/53016.

Differential Revision: https://reviews.llvm.org/D117663
2022-01-21 01:23:01 +00:00
Craig Topper cfae2c65db [RISCV] Factor Zve32 support into RISCVSubtarget::getMaxELENForFixedLengthVectors.
This is needed to properly limit fractional LMULs for Zve32.

Add new RUN Zve32 RUN lines to the existing tests for the
-riscv-v-fixed-length-vector-elen-max command line option.
2022-01-20 16:31:12 -08:00
Craig Topper 5e88f527da [RISCV] Remove RISCVSubtarget::hasStdExtV() and hasStdExtZve*(). NFC
All code should use one of the cleaner named hasVInstructions*
functions. Fix the two uses that weren't and delete the methods
so no new uses can be created.
2022-01-20 15:05:09 -08:00
Craig Topper fa8bb22466 [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors.
RISCV only has a unary shuffle that requires placing indices in a
register. For interleaving two vectors this means we need at least
two vrgathers and a vmerge to do a shuffle of two vectors.

This patch teaches shuffle lowering to use a widening addu followed
by a widening vmaccu to implement the interleave. First we extract
the low half of both V1 and V2. Then we implement
(zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which
simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further
simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the
result back to the original type splitting the wide elements in half.

We can only do this if we have a type with wider elements available.
Because we're using extends we also have to be careful with fractional
lmuls. Floating point types are supported by bitcasting to/from integer.

The tests test a varied combination of LMULs split across VLEN>=128 and
VLEN>=512 tests. There are a few tests with shuffle indices commuted as well
as tests for undef indices. There's one test for a vXi64/vXf64 vector which
we can't optimize, but verifies we don't crash.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117743
2022-01-20 14:44:47 -08:00
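A scalar C++ model of the widening trick above on a little-endian layout (an assumption for illustration, not the RVV lowering): zext(V1[i]) + (zext(V2[i]) << 8) stored as a 16-bit element is exactly the byte pair V1[i], V2[i], so reinterpreting the wide vector as bytes yields the interleave.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
      uint8_t v1[4] = {1, 2, 3, 4};
      uint8_t v2[4] = {5, 6, 7, 8};
      uint16_t wide[4];
      uint8_t out[8];
      for (int i = 0; i < 4; ++i)            // zext(V1) + (zext(V2) << eltbits)
        wide[i] = (uint16_t)(v1[i] + (v2[i] << 8));
      std::memcpy(out, wide, sizeof(out));   // "bitcast" back to the narrow type
      for (uint8_t b : out)
        std::printf("%d ", b);               // prints: 1 5 2 6 3 7 4 8
      return 0;
    }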
Stanislav Mekhanoshin 41ebd19681 [AMDGPU] Do not ignore exec use where exec is read as data
Compares, v_cndmask_b32, and v_readfirstlane_b32 use EXEC
in a way which modifies the result. This implicit EXEC use
shall not be ignored for the purposes of instruction moves.

Differential Revision: https://reviews.llvm.org/D117814
2022-01-20 14:05:22 -08:00
Craig Topper dd7b69a61f [RISCV] Remove HasStdExtV and HasStdExtZve* Predicates from tablegen.
No instructions should be using these. Everything should use
HasVInstructions* Predicates. Remove them so that they can't be
used by accident.
2022-01-20 12:54:20 -08:00
Craig Topper 7a275dc354 [RISCV] Remove Zvlsseg extension.
This string no longer appears in the Vector Extension specification.
The segment load/store instructions are just part of the vector
instruction set.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D117724
2022-01-20 12:40:07 -08:00
Craig Topper 94e69fbb4f [RISCV] Add DAG combine to fold (fp_to_int_sat (ffloor X)) -> (select X == nan, 0, (fcvt X, rdn))
Similar for ceil, trunc, round, and roundeven. This allows us to use
static rounding modes to avoid a libcall.

This is similar to D116771, but for the saturating conversions.

This optimization is done for AArch64 as isel patterns.
RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven
so the operations don't stick around until isel to enable a pattern
match. Thus I've implemented a DAG combine.

I'm only handling saturating to i64 or i32. This could be extended
to other sizes in the future.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116864
2022-01-20 11:35:37 -08:00
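A scalar C++ model of the target semantics above (an assumption for illustration, not the DAG combine itself): the saturating conversion of a floored value maps NaN to 0 and clamps out-of-range values, which is what the select-plus-fcvt-with-RDN sequence provides.

    #include <cmath>
    #include <cstdint>

    // fp_to_int_sat(ffloor(x)) for i64: NaN -> 0, otherwise round toward
    // negative infinity and clamp to the i64 range.
    int64_t fptosi_sat_floor_i64(double x) {
      if (std::isnan(x))
        return 0;
      double f = std::floor(x);              // static rounding mode RDN
      if (f >= 9223372036854775808.0)        // >= 2^63
        return INT64_MAX;
      if (f < -9223372036854775808.0)        // < -2^63
        return INT64_MIN;
      return static_cast<int64_t>(f);        // in range: exact conversion
    }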
Jonas Paulsson 792853cb78 [SystemZ] Remove the ManipulatesSP flag from backend (NFC).
This flag was set in the presence of stacksave/stackrestore in order to force
a frame pointer.

This should however not be needed per the comment in MachineFrameInfo.h
stating that a a variable sized object "...is the sole condition which
prevents frame pointer elimination", and experiments have also shown that
there seems to be no effect whatsoever on code generation with ManipulatesSP.

Review: Ulrich Weigand
2022-01-20 13:00:51 -06:00
Matt Arsenault 064cea9c9a AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection
Avoids a test diff with SDAG.
2022-01-20 12:56:53 -05:00
Matt Arsenault 2e49e0cfde AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics
Emit an error if the return value is used on subtargets that do not
support them. Previously we were falling back to the DAG on selection
failure, where it would emit this error and then fail again.
2022-01-20 12:46:45 -05:00
Matt Arsenault be7e938e27 AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd
This code is not structured to handle the legacy buffer intrinsics and
was miscompiling them.
2022-01-20 12:12:06 -05:00
Matt Arsenault 8ff3c9e0be AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics
The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn't handle at all). We also are not diagnosing the
use of the return value on gfx908. These will be addressed separately.
2022-01-20 12:12:06 -05:00
Matt Arsenault 89c447e4e6 AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal
This was inheriting the mesa behavior, and as far as I know nobody is
using OpenCL kernels with amdpal. The isMesaKernel check was
irrelevant because this property needs to hold for all functions.
2022-01-20 12:12:05 -05:00
Random06457 ee198df2e1 [mips] Improve vr4300 mulmul bugfix pass
When compiling with dwarf info, the mfix4300 flag introduced in
https://reviews.llvm.org/D116238 can miss some occurrences of the vr4300
mulmul bug if a debug instruction happens to be between two `muls`
instructions. This change skips debug instructions in order to fix
the mulmul bug detection.

Fixes https://github.com/llvm/llvm-project/issues/53094

Differential Revision: https://reviews.llvm.org/D117615
2022-01-20 20:10:04 +03:00
Simon Pilgrim 866311e71c [X86] lowerToAddSubOrFMAddSub - lower 512-bit ADDSUB patterns to blend(fsub,fadd)
AVX512 doesn't provide a ADDSUB instruction, but if we've built this from a build vector of scalar fsub/fadd elements we can still lower to blend(fsub,fadd)
2022-01-20 15:16:05 +00:00
Simon Tatham a4ac40e92f [AArch64] Remove PRBAR0_ELn and PRLAR0_ELn sysregs.
The Armv8-R.64 architecture defines numbered MPU region registers with
indices 1-15, not 0-15. So there's no such register as PRBAR0_EL2 or
PRLAR0_EL1 (for example). The encodings that they would occupy are
used for the unnumbered PRBAR_ELn and PRLAR_ELn registers.

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D117755
2022-01-20 13:37:58 +00:00
Abinav Puthan Purayil d8b690409d [AMDGPU] Set MemoryVT for truncstores in tblgen.
GlobalISelEmitter was skipping these patterns when its predicates were
checked. This patch should allow us to select d16_hi stores in
GlobalISel.

Differential Revision: https://reviews.llvm.org/D117762
2022-01-20 19:05:12 +05:30
Simon Pilgrim 304cfc706a [X86] combineConcatVectorOps - remove superfluous Subtarget.hasAVX() check
This function only ever gets called by AVX targets, and we already assert for this at the top of the function
2022-01-20 12:56:09 +00:00
Simon Pilgrim c4f5fd76da [X86] combineConcatVectorOps - add handling for X86ISD::VSHL/VSRL/VSRA
These can be handled the same as the vector shift by immediate variants that are already handled.
2022-01-20 12:56:08 +00:00
Fraser Cormack ca36cc56ac [RISCV] Match RVV VF variants also through masked operations
This brings floating-point RVV vector/scalar support more in line with
the integer vector patterns, which can already match '.vx' instructions
with masked operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117697
2022-01-20 12:08:02 +00:00
Peter Waller d4a6bf4d1a Revert "[AArch64][SVE][VLS] Move extends into arguments of comparisons"
This reverts commit db04d3e30b, which
causes a buildbot failure.
2022-01-20 12:01:23 +00:00
Fraser Cormack 5a12024b95 [RISCV] Optimize lowering of floating-point -0.0
This idea has come up in several reviews -- D115978 and D105902 -- so I
can't take any credit for the idea. Instead of using a constant pool to
lower -0.0, we can emit a sequence of two instructions:

    fmv.[hwd].x freg, zero
    fsgnjn.[hsd] freg, freg, freg

This is only done when the floating-point type is legal.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117687
2022-01-20 11:46:28 +00:00
David Spickett 6732c43897 [llvm][AArch64] Accept armv8.8 "hbc" and "mops" in .arch_extension directive
Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D117693
2022-01-20 09:16:08 +00:00
Chenbing.Zheng 0be3da1fab [RISCV] Add intrinsic for Zbt extension
RV32: fsl, fsr, fsri
RV64: fsl, fsr, fsri, fslw, fsrw, fsriw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117468
2022-01-20 08:27:05 +00:00
eopXD 8eae99dfe5 [RISCV] Add the zve extension according to the v1.0 spec
`zve` is the new standard vector extension to specify varying degrees of
vector support for embedded processors. The `zve` extension is related
to the `zvl` extension and other updates that are added in v1.0.

According to https://github.com/riscv-non-isa/riscv-c-api-doc/pull/21,
Clang defines macro `__riscv_v_max_elen`,  `__riscv_v_max_elen_fp` for
`zve` and it can be used by applications that uses the vector extension.

Authored by: Zakk Chen <zakk.chen@sifive.com> @khchen
Co-Authored by: Eop Chen <eop.chen@sifive.com> @eopXD

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112408
2022-01-19 23:48:28 -08:00
Jim Lin 216ac31dd7 [M68k][NFC] Rename Bt(BT) to Btst(BTST)
It seems that the implementation of Bt was taken from x86.
In M68k, Bt (BT) should be renamed to Btst (BTST).

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D117534
2022-01-20 12:45:02 +08:00
Heejin Ahn eb675e972d [WebAssembly] Support Wasm EH + Wasm SjLj
D108960 added support for SjLj using Wasm EH instructions, which we call
Wasm SjLj going forward. (We call the old SjLj Emscripten SjLj) But it
did not support using Wasm EH and Wasm SjLj together. So far users of
Wasm EH had to use Wasm EH with Emscripten SjLj, which had a certain
limitation and it suffered from bigger code size increases as well.

This enables using Wasm EH and Wasm SjLj together.
1. This redirects `catchswitch` and `cleanupret` that unwind to caller
   to `catch.dispatch.longjmp` BB, which is a `catchswitch` BB that
   handles longjmps.
2. D108960 converted all longjmpable `call`s to `invokes` that unwind to
   `catch.dispatch.longjmp`. This CL checks if the `call` is embedded
   within another `catchpad`, and if so, makes it unwind to its nearest
   parent's unwind destination, rather than `catch.dispatch.longjmp`.
   This is necessary to preserve the scoping structure.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D117610
2022-01-19 20:13:54 -08:00
Luo, Yuanke 5dea7a865e Combine to vpdpbusd when operand is constant and small enough.
Differential Revision: https://reviews.llvm.org/D116363
2022-01-20 11:10:49 +08:00
Ben Shi 94173dc24c [AVR] Generate ELPM for loading byte/word from extended program memory
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D116493
2022-01-20 02:53:10 +00:00
Ben Shi c1dd607463 [AVR][MC] Generate section '.progmemX.data' for extended flash banks
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115987
2022-01-20 02:53:10 +00:00
Adrian Tong b6a7ae2c5d Optimize shift and accumulate pattern in AArch64.
AArch64 supports unsigned shift right and accumulate. In case we see an
unsigned shift right followed by an OR, we could turn them into a USRA
instruction, given that the operands of the OR have no common bits.

Differential Revision: https://reviews.llvm.org/D114405
2022-01-20 01:57:40 +00:00
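A scalar C++ sketch of why the fold above is valid (an assumption for illustration, not the AArch64 combine): when the accumulator and the shifted value share no set bits, OR and ADD give the same result, so the pair can become a single shift-right-and-accumulate.

    #include <cassert>
    #include <cstdint>

    // Models acc | (x >> shift) as an accumulate, valid only when the two
    // operands have no common bits.
    uint64_t usra_like(uint64_t acc, uint64_t x, unsigned shift) {
      uint64_t shifted = x >> shift;
      assert((acc & shifted) == 0 && "fold requires disjoint bits");
      return acc + shifted; // identical to acc | shifted under the assert
    }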
Mohammed Nurul Hoque 21c79be5d7 [RISCV] Add patterns to MIR sign-extension removal pass.
This patch adds a few instruction patterns that generate sign-extended values or propagate them, adding to the pass introduced in https://reviews.llvm.org/D116397

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117465
2022-01-19 17:33:58 -08:00
Luís Marques a767ae2c5c [RISCV] Fix incomplete asm statement parsing
For instructions without operands, the final `AsmToken::EndOfStatement`
wasn't being consumed. In the context of inline assembly, the resulting
empty statements would cause extraneous empty lines to be emitted. Fix
the issue by consuming the `EndOfStatement` token.

Differential Revision: https://reviews.llvm.org/D117565
2022-01-19 21:56:21 +00:00