Commit Graph

62287 Commits

Author SHA1 Message Date
David Sherwood 57ca65e21e [AArch64] Add instruction costs for FP_TO_UINT and FP_TO_SINT with half types
We were missing some instruction costs when converting vectors of
floating point half types into integers, so I've added those here.
I also manually generated assembly code for each FP->int case and
looked at the number of instructions generated, which meant
adjusting some of the existing costs too.

I've updated an existing test to reflect the new costs:

  Analysis/CostModel/AArch64/sve-fptoi.ll

Differential Revision: https://reviews.llvm.org/D99935
2021-04-21 09:39:45 +01:00
Zakk Chen ad0fe5db2f [RISCV][MC] Mask load should not have VMConstraint.
Add a test, dest register could be v0.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D100825
2021-04-21 15:21:37 +08:00
Serge Pavlov d20a2376d8 [RISCV] Introduce floating point control and state registers
New registers FRM, FFLAGS and FCSR was defined. They represent
corresponding system registers. The new registers are necessary to
properly order floating point instructions in non-default modes.

Differential Revision: https://reviews.llvm.org/D99083
2021-04-21 12:55:30 +07:00
Zi Xuan Wu ca31b43ae8 [NFC][CSKY] Resort the instruction description in td
Resort the instruction description in td to make it easy to upstream more instructions and add predicts later.
2021-04-21 12:36:07 +08:00
Sam Clegg 103956170b [WebAssembly] Update README. NFC.
This is just a cleanup of the very high level stuff.  I'm sure there is
more to update here but I'll leave that to others and/or a followup.

Differential Revision: https://reviews.llvm.org/D100888
2021-04-20 16:59:08 -07:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst a IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Sam Clegg d2de2d1724 [WebAssembly] Remove unused known_gcc_test_failures.txt. NFC
Differential Revision: https://reviews.llvm.org/D100887
2021-04-20 14:07:25 -07:00
Alexey Bataev 673e2f1b70 [COST][AARCH64] Improve cost of reverse shuffles for AArch64.
Introduced the cost of thre reverse shuffles for AArch64, currently just
copied the costs for PermuteSingleSrc.

Differential Revision: https://reviews.llvm.org/D100871
2021-04-20 13:47:56 -07:00
Jon Roelofs 167da6c9e8 [AArch64][GlobalISel] Clarify fallback debug print
... to only print when that fallback actually happens.
2021-04-20 12:41:14 -07:00
Thomas Lively 693d767c60 [WebAssembly] More codegen for f64x2.convert_low_i32x4_{s,u}
af7925b4dd added a custom DAG combine for recognizing fp-to-ints of
extract_subvectors that could be lowered to f64x2.convert_low_i32x4_{s,u}
instructions. This commit extends the combines to recognize equivalent
extract_subvectors of fp-to-ints as well.

Differential Revision: https://reviews.llvm.org/D100790
2021-04-20 12:37:13 -07:00
Simon Pilgrim 2a419a0b99 [X86][SSE] combineX86ShuffleChain - check if we're blending with zero into already zero elements
Add a SelectionDAG::MaskedElementsAreZero helper that wraps SelectionDAG::MaskedValueIsZero testing for entirely zero vector elements
2021-04-20 17:09:49 +01:00
Jay Foad ec8c61efdf [AMDGPU] Allow multiple uses of the same literal
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.

AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.

Differential Revision: https://reviews.llvm.org/D100770
2021-04-20 16:44:01 +01:00
Ahmed Bougacha a0573b6c10 [AArch64] Bump apple-latest CPU alias to apple-a14. 2021-04-20 08:41:04 -07:00
Ahmed Bougacha a8a3a43792 [AArch64] Add apple-m1 CPU, and default to it for macOS.
apple-m1 has the same level of ISA support as apple-a14,
so this is a straightforward mechanical change.  However, that
also means this inherits apple-a14's v8.5a+nobti quirkiness.

rdar://68287159
2021-04-20 08:41:04 -07:00
David Green 21a8b9d9e9 [ARM] Limit PerformExtractEltToVMOVRRD to when f64 is legal.
The generic SoftFloatVectorExtract.ll test was failing when run on arm
machines, as it tries to create a f64 under soft float. Limit the
transform to when f64 is legal.

Also add a missing override, as reported in D100244.
2021-04-20 16:24:36 +01:00
Matt Arsenault 1cb8a9d595 AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers 2021-04-20 11:13:29 -04:00
Bradley Smith b8b075d8d7 [AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions
Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types,
and lower them to the appropriate SVE instructions.

Additionally now that the MULH nodes are legal, integer divides can be
expanded into a more performant code sequence.

Differential Revision: https://reviews.llvm.org/D100487
2021-04-20 15:18:06 +01:00
David Green 48cef1fa8e [ARM] Create VMOVRRD from adjacent vector extracts
This adds a combine for extract(x, n); extract(x, n+1)  ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.

This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly.  Otherwise the machine verifier is unhappy.

Differential Revision: https://reviews.llvm.org/D100244
2021-04-20 15:15:43 +01:00
Cullen Rhodes f166d0db71 [AArch64][AsmParser] NFC: Remove unused ExtendOp struct
Left over from 2625a993f9 when extend and shift were merged.
2021-04-20 13:45:09 +00:00
Sebastian Neubauer 4897effb14 [AMDGPU] Add TransVALU to gfx10
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.

This doesn't seem to have any effect, but it should be more correct.

Differential Revision: https://reviews.llvm.org/D100123
2021-04-20 15:34:43 +02:00
Jay Foad 2aea830ec4 [AMDGPU] Use if instead of foreach in a few places. NFC. 2021-04-20 14:20:30 +01:00
Nemanja Ivanovic 03e7fefff8 [PowerPC] Canonicalize shuffles on big endian targets as well
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.

This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.

Differential revision: https://reviews.llvm.org/D100478
2021-04-20 07:29:47 -05:00
Jay Foad edea476142 [AMDGPU] Use simpler alternatives to !foldl. NFC. 2021-04-20 12:59:04 +01:00
hsmahesha 840c4e4e90 [AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D100773
2021-04-20 16:17:15 +05:30
Ben Shi 30e2c7be99 [RISCV] Refactor an optimization of addition with immediate
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100769
2021-04-20 18:04:25 +08:00
Joe Ellis c91cd4f3bb [AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts...
when the predicate used by last{a,b} specifies a known vector length.

For example:
  aarch64_sve_lasta(VL1, D) -> extractelement(D, #1)
  aarch64_sve_lastb(VL1, D) -> extractelement(D, #0)

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100476
2021-04-20 10:01:33 +00:00
Fraser Cormack b4a358a7ba [RISCV] Fix missing emergency slots for scalable stack offsets
This patch adds an additional emergency spill slot to RVV code. This is
required as RVV stack offsets may require an additional register to compute.

This patch includes an optimization by @HsiangKai <kai.wang@sifive.com>
to reduce the number of registers required for the computation of stack
offsets from 3 to 2. Otherwise we'd need two additional emergency spill
slots.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D100574
2021-04-20 09:59:41 +01:00
Qiu Chaofan 2432d80d3b [PowerPC] Use mtvsrdd to put callee-saved GPR into VSR
This patch exploits mtvsrdd instruction (available in ISA3.0+) to save
two callee-saved GPR registers into a single VSR, making it more
efficient.

Reviewed By: jsji, nemanjai

Differential Revision: https://reviews.llvm.org/D62565
2021-04-20 16:43:24 +08:00
Jay Foad b22721f01a [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used
Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.

Differential Revision: https://reviews.llvm.org/D100760
2021-04-20 09:17:52 +01:00
Qiu Chaofan b820339752 [PowerPC] Support f128 under VSX
This patch is the last one in backend to support fp128 type in
pre-POWER9 subtargets with VSX, removing temporary option and updating
remaining tests.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92374
2021-04-20 15:49:52 +08:00
Zi Xuan Wu 4bb60c285c [CSKY 6/n] Add support branch and symbol series instruction
This patch adds basic CSKY branch instructions and symbol address series instructions.
Those two kinds of instruction have relationship between each other, and it involves much work about Fixups.

For now, basic instructions are enabled except for disassembler support.
We would support to generate basic codegen asm firstly and delay disassembler work later.

Differential Revision: https://reviews.llvm.org/D95029
2021-04-20 15:36:49 +08:00
Zi Xuan Wu 4216389c26 [CSKY 5/n] Add support for all CSKY basic integer instructions except for branch series
This patch adds basic CSKY integer instructions except for branch series such as bsr, br.
It mainly includes basic ALU, load & store, compare and data move instructions.

Branch series instructions need handle complex symbol operand as following patch later.

Differential Revision: https://reviews.llvm.org/D94007
2021-04-20 15:36:49 +08:00
Zi Xuan Wu 8ba622bae1 [CSKY 4/n] Add basic CSKYAsmParser and CSKYInstPrinter
This basic parser will handle basic instructions with register or immediate operands.
With the addition of CSKYInstPrinter, we can now make use of lit tests.

Differential Revision: https://reviews.llvm.org/D93798
2021-04-20 15:36:49 +08:00
Zakk Chen d5fa71e9ec [RISCV] Handle PseudoVRELOAD and PseudoVSPILL in getInstSizeInBytes.
It's necessary to calculate correct instruction size because
PseudoVRELOAD and PseudoSPILL will be expanded into multiple
instructions.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100702
2021-04-19 22:30:03 -07:00
Jun Ma 5c6ac3b4a2 [AArch64][SVE] Combine add and index_vector
This patch tries to combine pattern add(index_vector(zero, step), dup(X)) into index_vector(X, step)

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100107
2021-04-20 11:38:37 +08:00
Min-Yih Hsu 7ac461f6f7 [M68k] Put M68kDesc as the direct library dependency for disassembler
M68kDisassembler should put M68kDesc as its direct library dependency
since it uses logics releated to code beads Otherwise the build will
fail when building LLVM libraries as shared objects (building LLVM
libraries statically won't have this problem though)
2021-04-19 15:56:24 -07:00
Ricky Taylor 2221185776 [M68k] Implement Disassembler
This is an implementation of a disassembler for M68k.

Differential Revision: https://reviews.llvm.org/D98540
2021-04-19 22:24:12 +01:00
Ricky Taylor 6de262827c [M68k] Change printing of absolute memory references
This also includes PC-relative addresses since they are still
referenced as absolute addresses in assembly and converted to
relative addresses by the assembler.

This changes, for example:
- `bra #-2` -> `bra $100`
- `jsr #16` -> `jsr $10`

Differential Revision: https://reviews.llvm.org/D100697
2021-04-19 22:24:12 +01:00
David Penry 78a871abf7 [ARM] Use ProcResGroup in Cortex-M7 scheduling model
Used to model structural hazards on FP issue, where some
instructions take up 2 issue slots and others one as well
as similar structural hazards on load issue, where some
instructions take up two load lanes and others one.

Differential Revision: https://reviews.llvm.org/D98977
2021-04-19 21:23:05 +01:00
Thomas Lively e657c84fa1 [WebAssembly] Use v128.const instead of splats for constants
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.

Differential Revision: https://reviews.llvm.org/D100716
2021-04-19 12:43:59 -07:00
Jinsong Ji d88d8c5b86 [PowerPC] Disable relative lookup table converter pass for AIX
XCOFF hasn't implemented lowerRelativeReference.
So we need to disable new pass introduced by https://reviews.llvm.org/D94355 for
AIX for now.

Reviewed By: gulfem

Differential Revision: https://reviews.llvm.org/D100584
2021-04-19 19:28:11 +00:00
madhur13490 6a4d9cb7e0 [AMDGPU] Remove error check for indirect calls and add missing queue-ptr
This patch removes -fixed-abi check for indirect calls
and also adds queue-ptr which is required for indirect calls to work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100633
2021-04-20 00:35:17 +05:30
Pavel Iliin 2ec16103c6 [AArch64] Peephole rule to remove redundant cmp after cset.
Comparisons to zero or one after cset instructions can be safely
removed in examples like:

cset w9, eq          cset w9, eq
cmp  w9, #1   --->   <removed>
b.ne    .L1          b.ne    .L1

cset w9, eq          cset w9, eq
cmp  w9, #0   --->   <removed>
b.ne    .L1          b.eq    .L1

Peephole optimization to detect suitable cases and get rid of that
comparisons added.

Differential Revision: https://reviews.llvm.org/D98564
2021-04-19 19:58:38 +01:00
Craig Topper 87afefcd22 [RISCV] Fix mistake in comment. NFC 2021-04-19 11:15:32 -07:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Jessica Paquette 65f257a215 [AArch64][GlobalISel] Implement custom legalization for s32 and s64 G_CTPOP
This is a partial port of AArch64TargetLowering::LowerCTPOP.

This custom lowering tries to uses NEON instructions to give a more efficient
CTPOP lowering when possible.

In the non-NEON/noimplicitfloat case, this should use the generic lowering
(see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after
implementing the widening code for s16/s8 though.

Differential Revision: https://reviews.llvm.org/D100399
2021-04-19 10:56:02 -07:00
Nick Desaulniers c440b97d89 [TargetLowering] move "o" and "X" constraint handling to base class
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D100416
2021-04-19 10:53:31 -07:00
Jessica Paquette 91bbb914e0 [AArch64][GlobalISel] Regbankselect + select @llvm.aarch64.neon.uaddlv
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.

This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.

For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.

Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.

Differential Revision: https://reviews.llvm.org/D100398
2021-04-19 10:47:49 -07:00
Jay Foad a02aa91313 [AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC. 2021-04-19 14:20:46 +01:00