Commit Graph

69712 Commits

Author SHA1 Message Date
Michael Maitland 184fbfd712 [RISCV][CodeGen] Chapter of vector instruction type corresponds with chapters in RISCV vector specification. NFC
The [vector spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc) is organized in chapters
based on instruction type. The comments in the tablegen marked the incorrect chapters. This change
updates the comments with the correct chapter numbers.

Differential Revision: https://reviews.llvm.org/D138311
2022-11-18 10:30:08 -08:00
Matt Arsenault fe56afc4d7 AMDGPU: Fix fcanonicalize constant folding not correctly handling -0.0 2022-11-18 10:03:29 -08:00
Philip Reames 18fda867f4 [RISCV] Optimize scalable frame offset calculation when VLEN is precisely known
When we have a precisely known VLEN, we can replace runtime usage of VLENB with compile time constants. This converts offsets involving both fixed and scalable components into fixed offsets. The result is that we avoid the csr read of vlenb, and can often fold the multiply as well.

Differential Revision: https://reviews.llvm.org/D137591
2022-11-18 09:56:55 -08:00
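To make the folding described in 18fda867f4 concrete, here is a minimal C++ sketch. It is not the code from the patch. `FrameOffset` and `foldOffset` are hypothetical names, and the sketch assumes that the scalable part of an offset is expressed in bytes per unit of vscale, with vscale = VLEN / 64. Treat both as assumptions rather than as the backend's actual representation.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a frame offset that has a fixed part and a part
// that scales with the vector register length.
struct FrameOffset {
  int64_t Fixed;    // bytes, independent of VLEN
  int64_t Scalable; // bytes per unit of vscale (assumption)
};

// Assumption: one unit of vscale corresponds to 64 bits of VLEN.
constexpr int64_t BitsPerBlock = 64;

// Returns a purely fixed byte offset when it can be computed at compile time,
// or std::nullopt when the scalable part still needs a run-time vlenb read.
std::optional<int64_t> foldOffset(FrameOffset Off,
                                  std::optional<unsigned> ExactVLEN) {
  if (Off.Scalable == 0)
    return Off.Fixed; // nothing scalable to resolve
  if (!ExactVLEN)
    return std::nullopt; // VLEN not precisely known
  int64_t VScale = *ExactVLEN / BitsPerBlock;
  return Off.Fixed + Off.Scalable * VScale;
}
```

Under these assumptions, an offset of 16 fixed bytes plus 8 scalable bytes folds to 16 + 8 * 2 = 32 bytes when VLEN is exactly 128, so neither the vlenb read nor the multiply is needed at run time.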
Michael Maitland 98e342dca2 [RISCV][llvm-mca] Use LMUL Instruments to provide more accurate reports on RISCV
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how long the instruction takes to execute and what dependencies this may cause.

On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition to the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.

* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change

A prior version of this patch was committed in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. The change with fixes
was committed in b88b8307bf but reverted once again in e8e92c8313 due to more
build errors.

This commit adds the prior changes and fixes the build error.

Differential Revision: https://reviews.llvm.org/D137440
2022-11-18 09:55:15 -08:00
Mirko Brkusanin e58b116843 [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11
Differential Revision: https://reviews.llvm.org/D133012
2022-11-18 18:19:27 +01:00
Petar Avramovic 0f3e72e86c AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection
When selectVOP3PMadMixModsImpl fails, it can still create new copy instr
via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, new copy instr
will remain dead but will not be automatically removed.
InstructionSelect does not check if instructions created during selection
are dead.
Such dead copy doesn't have register class on dst operand and causes crash.
Fix is to build copy when operands are being added to selected instruction.

Differential Revision: https://reviews.llvm.org/D138044
2022-11-18 18:02:26 +01:00
Jay Foad 38302c60ef [AMDGPU] Stop looking for implicit M0 uses on MOV instructions
Before D114230, indirect moves used regular MOV opcodes and were
identified by having an implicit use of M0. Since D114230 they use
dedicated opcodes instead, so remove some old code that checks for
implicit uses of M0. NFCI.

Differential Revision: https://reviews.llvm.org/D138308
2022-11-18 16:57:55 +00:00
Matt Arsenault 08ec15e44b AMDGPU/GlobalISel: Fix strictfp fmul 2022-11-18 08:53:49 -08:00
Dinar Temirbulatov 44e2c6a428 [AArch64][SVE] Use PTRUE instruction instead of WHILELO if the range is appropriate for predicator constant.
The lowering of get_active_lane_mask uses the WHILELO instruction,
but for a constant range suitable for PTRUE we can issue a PTRUE instruction
instead.

Differential Revision: https://reviews.llvm.org/D137547
2022-11-18 16:21:10 +00:00
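The idea behind 44e2c6a428 can be illustrated with a small C++ model. This is a sketch under stated assumptions, not the lowering code itself. `activeLaneMask`, `isFixedPTrueCount` and `canUsePTrue` are hypothetical names. The sketch assumes that only the fixed-count PTRUE patterns (VL1 to VL8, VL16, VL32, VL64, VL128, VL256) are considered, and that the constant range must fit within the smallest vector length the target guarantees.

```cpp
#include <cstdint>
#include <vector>

// Reference semantics of an active lane mask: lane I is active iff
// Base + I < N (overflow is ignored in this sketch).
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t N, unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = Base + I < N;
  return Mask;
}

// Hypothetical helper: true if Count matches one of the fixed element counts
// that an SVE PTRUE pattern can name.
bool isFixedPTrueCount(uint64_t Count) {
  return (Count >= 1 && Count <= 8) || Count == 16 || Count == 32 ||
         Count == 64 || Count == 128 || Count == 256;
}

// Hypothetical decision for constant Base and N: a PTRUE with a fixed count
// gives the same mask as WHILELO when the count has a matching pattern and
// every supported vector has at least that many lanes.
bool canUsePTrue(uint64_t Base, uint64_t N, unsigned MinLanes) {
  if (N <= Base)
    return false; // empty range, no lanes active
  uint64_t Count = N - Base;
  return isFixedPTrueCount(Count) && Count <= MinLanes;
}
```

For example, a constant range of 0 to 4 on a target with at least four lanes activates exactly the first four lanes, which is the mask a fixed four-element PTRUE pattern would produce.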
Krzysztof Parzyszek ea6693d4c8 [Hexagon] Add missing patterns for mulhs/mulhu 2022-11-18 08:13:57 -08:00
Alexander Timofeev 3ae96e9eb8 ARCRegisterInfo::eliminateFrameIndex updated to fix build error caused by 32bd75716c 2022-11-18 16:16:10 +01:00
Alexander Timofeev 32bd75716c PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register
liveness information. PEI should leverage the backward register scavenger.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137574
2022-11-18 15:57:34 +01:00
David Sherwood 2e02f007a2 [AArch64][SME2] Remove vector constraints from zip/uzp (2-vector) instruction classes
The zip/uzp (2-vector) instruction classes have the incorrect
register constraints and mark the destination as also being an
input. However, the instructions are fully destructive so I've
restructured the classes.

Differential Revision: https://reviews.llvm.org/D138288
2022-11-18 14:30:48 +00:00
Phoebe Wang d558255650 [X86] Use lock add/sub for cases that we only care about the EFLAGS
This fixes #36373, #36905 and part of #58685.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137711
2022-11-18 21:43:47 +08:00
Hassnaa Hamdi 79e8bd1add [AArch64][SME]: Generate streaming-compatible code for ISD::INSERT_VECTOR_ELT.
1- Enable custom lowering INSERT_VECTOR_ELT to generate code compatible
   to streaming mode.
2- Add testing file:
   insert-vector-elt.ll

Differential Revision: https://reviews.llvm.org/D138222
2022-11-18 12:20:16 +00:00
Hassnaa Hamdi d8306b8885 [AArch64][SME]: Use SVE mov instruction for FPR128 registers in streaming-compatible mode.
1- In streaming mode, use the SVE OR/mov instruction instead of NEON OR
   when copying a physical register (AArch64InstrInfo::copyPhysReg).
2- add testing file:
   register-mov.ll

Differential Revision: https://reviews.llvm.org/D138211
2022-11-18 11:18:30 +00:00
Dmitry Preobrazhensky 96155bf44b [AMDGPU][GFX11][NFC] Refactor VOPD operands handling (part 2)
Rename interface functions and operands to make code clearer.

Differential Revision: https://reviews.llvm.org/D138133
2022-11-18 14:15:05 +03:00
Valery Pykhtin a35ba2a256 [AMDGPU] Fix PreRARematStage::sinkTriviallyRematInsts region boundary update after sinking.
The first boundary of a region wasn't updated when a sunk instruction was added at the start of the region.

Reviewed By: vangthao

Differential Revision: https://reviews.llvm.org/D138256
2022-11-18 12:13:14 +01:00
wanglei bfa3551dd3 [LoongArch] Implement assembler branches pseudo instructions
These instructions always output the canonical mnemonic. The GNU tools
emit the canonical mnemonic for the branch pseudo instructions as well
(e.g. "bgt" will be recognised by the assembler but never printed by
objdump).

Reviewed By: xen0n

Differential Revision: https://reviews.llvm.org/D138100
2022-11-18 16:54:20 +08:00
Chen Zheng f034c98af0 [PowerPC] mark dead def for ctr be clobber.
TLS pseudo ADDIStlsgdHA will have such def. This dead def should
also prevent PPC from generating CTR loops.
2022-11-18 06:55:42 +00:00
Han-Kuan Chen 7e6dbfcd9d [RISCV] Make lowerVECTOR_SHUFFLEAsVSlidedown follow source until not EXTRACT_SUBVECTOR.
Currently, lowerVECTOR_SHUFFLEAsVSlidedown only checks whether the inputs are
EXTRACT_SUBVECTOR nodes and whether their sources are the same. This commit makes
the function follow the inputs and their sources until they are no longer
EXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D138025
2022-11-17 22:32:53 -08:00
Matt Arsenault fe5b9a6a11 AMDGPU/GlobalISel: Make strict fadd, fmul and fma legal 2022-11-17 20:50:04 -08:00
Matt Arsenault ae43420f39 AMDGPU/GlobalISel: Fix not selecting modifiers for f16 fma on gfx9
VOP3OpSel wasn't trying to match any modifiers. Just try to match the
basic case, like the DAG does.
2022-11-17 18:51:45 -08:00
Alexander Shaposhnikov 7059a6c32c [IR] Split out IR printing passes into IRPrinter
This diff splits out (from LLVMCore) IR printing passes into IRPrinter.
This structure is similar to what we already have for IRReader and
enables us to avoid circular dependencies between LLVMCore and Analysis
(this is a preparation for https://reviews.llvm.org/D137768).
The legacy interface is left unchanged, once the legacy pass manager
is removed (in the future) we will be able to clean it up further.
The bazel build configuration has been updated as well.

Test plan:
1/ Tested the following cmake configurations: static/dynamic linking * lld/gold * clang/gcc
2/ bazel build --config=generic_clang @llvm-project//...

Differential revision: https://reviews.llvm.org/D138081
2022-11-18 01:47:56 +00:00
Krzysztof Parzyszek a98fc08396 [Hexagon] Add instruction definitions for Hexagon v71, v71t, and v73
This includes instruction formats, definitions, encodings, scheduling
classes, and builtins/intrinsics.

New and improved version of 766536989b, so much so that even clang
builds with it.
2022-11-17 15:51:38 -08:00
Fangrui Song 99f730c645 Revert "[Hexagon] Add instruction definitions for Hexagon v71, v71t, and v73"
This reverts commit 766536989b.

The commit caused:

clang/include/clang/Basic/BuiltinsHexagonDep.def:1896:69: error: use of undeclared identifier 'HVXV73'
TARGET_BUILTIN(__builtin_HEXAGON_V6_vadd_sf_bf, "V32iV16iV16i", "", HVXV73)

when building `clang`.
2022-11-17 23:14:32 +00:00
Krzysztof Parzyszek 766536989b [Hexagon] Add instruction definitions for Hexagon v71, v71t, and v73
This includes instruction formats, definitions, encodings, scheduling
classes, and builtins/intrinsics.
2022-11-17 14:15:47 -08:00
Krzysztof Parzyszek 534b26aa07 [Hexagon] Improve inserting/extracting to/from scalar predicates
Fixes https://github.com/llvm/llvm-project/issues/59042.
2022-11-17 13:03:45 -08:00
Krzysztof Parzyszek a2a89eb019 [Hexagon] Fix lowering loads/stores of scalar vNi1
Don't treat them as i1, all predicate bits need to be loaded or stored.
2022-11-17 12:48:01 -08:00
Krzysztof Parzyszek 8407c9916d [Hexagon] Use BUILD_PAIR instead of HexagonISD::COMBINE in lowering 2022-11-17 12:31:48 -08:00
Sami Tolvanen a542d5422a [X86][KCFI] Add support for memory operand unfolding
When the Linux kernel is compiled without -mretpoline, KCFI fails
ungracefully because it doesn't handle indirect calls with a memory
target operand. Since the KCFI check will need to load the target
address into a register for validating the type hash anyway, simply
unfold memory operands in indirect calls that need a KCFI check.

Fixes #59017
2022-11-17 19:00:48 +00:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

A subsequent patch will start using the actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but no slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
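The interface change in bcaf31ec3f, from a boolean 'Fast' to an unsigned speed, can be sketched as follows. This is an illustration only. `allowsMisalignedAccess` and `vectorizedIsNoSlower` are hypothetical names, and the speed values are made up for the example. Nothing here reflects the numbers any real target reports.

```cpp
// Hypothetical target hook in the new style: the return value says whether the
// misaligned access is allowed, and *Fast receives a target-defined speed
// (0 = slow, larger = faster) where it previously received true or false.
bool allowsMisalignedAccess(unsigned SizeInBits, unsigned AlignInBytes,
                            unsigned *Fast = nullptr) {
  bool Allowed = SizeInBits <= 128; // made-up rule for this sketch
  if (Fast) {
    if (!Allowed)
      *Fast = 0;
    else if (AlignInBytes >= 4)
      *Fast = SizeInBits; // made-up: well-aligned wide accesses are fastest
    else
      *Fast = SizeInBits <= 32 ? 2 : 1; // made-up: wide and unaligned is slower
  }
  return Allowed;
}

// Hypothetical vectorizer-side check: accept the vectorized access only if it
// is allowed and reported as no slower than the scalar access it replaces.
bool vectorizedIsNoSlower(unsigned ScalarBits, unsigned VectorBits,
                          unsigned AlignInBytes) {
  unsigned ScalarSpeed = 0, VectorSpeed = 0;
  if (!allowsMisalignedAccess(VectorBits, AlignInBytes, &VectorSpeed))
    return false;
  allowsMisalignedAccess(ScalarBits, AlignInBytes, &ScalarSpeed);
  return VectorSpeed >= ScalarSpeed;
}
```

A caller that only needs the old behaviour can keep testing the reported speed against zero, which matches the direct translation to 0 and 1 described in the commit message.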
Jay Foad 49762162ea [AMDGPU] Remove isLiteralConstant and isLiteralConstantLike
isLiteralConstant and isLiteralConstantLike were similar to
!isInlineConstant with slight differences like handling isReg operands.

To avoid a profusion of similar functions with undocumented differences,
this patch removes all the isLiteralConstant* variants. Callers are responsible
for handling the isReg case.

Differential Revision: https://reviews.llvm.org/D125759
2022-11-17 16:45:48 +00:00
Bradley Smith ac82907a1c [AArch64][SVE] Ensure redundant PTEST are removed with an 'invalid' PTRUE
When a PTRUE of non-element size is encountered, the PTEST optimization
logic bails out since it cannot handle that type of PTRUE. Instead, it
should be treated as a generic predicate to allow later optimizations to trigger.

Differential Revision: https://reviews.llvm.org/D138116
2022-11-17 15:42:17 +00:00
Anton Sidorenko b6c790736e [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns
This patch adds transformation of fmul+fadd/fsub chains to fused multiply
instructions:
  * fmul+fadd->fmadd
  * fmul+fsub->fmsub/fnmsub

We also will try to combine these instructions if the fmul has more than one use
and cannot be deleted. However, removing the dependence between fmul and fadd can
still be profitable, and we rely on machine combiner approximations of scheduling.

Differential Revision: https://reviews.llvm.org/D136764
2022-11-17 13:24:04 +03:00
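The arithmetic behind the patterns in b6c790736e can be written out with `std::fma` from the C++ standard library. This is a sketch of what each fused form computes, not the combiner code. The function names mirror the RISC-V mnemonics but are defined here purely for illustration.

```cpp
#include <cmath>

// fmul + fadd: (A * B) + C with a single rounding.
double fmadd(double A, double B, double C) { return std::fma(A, B, C); }

// fmul + fsub, product on the left: (A * B) - C.
double fmsub(double A, double B, double C) { return std::fma(A, B, -C); }

// fmul + fsub, product on the right: C - (A * B), i.e. -(A * B) + C.
double fnmsub(double A, double B, double C) { return std::fma(-A, B, C); }
```

Because the fused forms round once instead of twice, their results can differ in the last bit from the separate multiply followed by an add or subtract.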
Yashwant Singh 2652db4d68 Handling ADD|SUB U64 decomposed Pseudos not getting lowered to SDWA form
This patch fixes some of the V_ADD/SUB_U64_PSEUDO not getting converted to their sdwa form.
We still get the patterns below in the generated code:
v_and_b32_e32 v0, 0xff, v0
v_add_co_u32_e32 v0, vcc, v1, v0
v_addc_co_u32_e64 v1, s[0:1], 0, 0, vcc

and,
v_and_b32_e32 v2, 0xff, v2
v_add_co_u32_e32 v0, vcc, v0, v2
v_addc_co_u32_e32 v1, vcc, 0, v1, vcc

1st and 2nd instructions of both above examples should have been folded into sdwa add with BYTE_0 src operand.

The reason is that the pseudo instruction is broken down into a VOP3 instruction pair of V_ADD_CO_U32_e64 and V_ADDC_U32_e64.
The sdwa pass attempts to lower them to their VOP2 form before converting them into sdwa instructions. However, V_ADDC_U32_e64
cannot be shrunk to its VOP2 form if it has a non-reg src1 operand.
This change attempts to fix that problem by only shrinking the V_ADD_CO_U32_e64 instruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136663
2022-11-17 10:01:40 +05:30
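The decomposition that 2652db4d68 refers to, a 64-bit add split into a low add with carry-out and a high add with carry-in, can be modelled in a few lines of C++. This is a sketch of the arithmetic only. The helper names are hypothetical and merely play the roles of V_ADD_CO_U32 and V_ADDC_U32.

```cpp
#include <cstdint>

struct AddResult {
  uint32_t Value;
  bool Carry;
};

// Low half: 32-bit add that also produces a carry-out
// (the role played by V_ADD_CO_U32).
AddResult addCarryOut(uint32_t A, uint32_t B) {
  uint32_t Sum = A + B; // wraps modulo 2^32
  return {Sum, Sum < A}; // wrapped iff the sum is smaller than an operand
}

// High half: 32-bit add that consumes the carry (the role of V_ADDC_U32).
uint32_t addWithCarryIn(uint32_t A, uint32_t B, bool CarryIn) {
  return A + B + (CarryIn ? 1u : 0u);
}

// 64-bit add assembled from the two 32-bit halves.
uint64_t add64ViaHalves(uint64_t A, uint64_t B) {
  AddResult Lo = addCarryOut(static_cast<uint32_t>(A), static_cast<uint32_t>(B));
  uint32_t Hi = addWithCarryIn(static_cast<uint32_t>(A >> 32),
                               static_cast<uint32_t>(B >> 32), Lo.Carry);
  return (static_cast<uint64_t>(Hi) << 32) | Lo.Value;
}
```

In the two assembly snippets from the commit message, one operand of the low add is a value masked with 0xff, which is why that add is a candidate for an sdwa form with a BYTE_0 source operand.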
Craig Topper 7e15ea102f [RISCV] Add a DAG combine to pre-promote (i1 (truncate (i32 (srl X, Y)))) with Zbs on RV64.
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.

This is similar to what we do for (i32 (and (srl X, Y), 1)).
2022-11-16 19:07:33 -08:00
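The pattern targeted by 7e15ea102f is a single-bit extract. The C++ sketch below models it under the assumption that the shift amount is in range for the 32-bit type. `bextRV64` and `truncSrlI32` are hypothetical names, and `bextRV64` is only a software model of what a Zbs BEXT computes on RV64.

```cpp
#include <cstdint>

// Software model of a single-bit extract on a 64-bit register:
// bit (Y mod 64) of X, returned as 0 or 1.
uint64_t bextRV64(uint64_t X, uint64_t Y) { return (X >> (Y & 63)) & 1; }

// The source pattern: shift a 32-bit value right and keep only bit 0,
// which is what truncating the result to i1 does. Y is assumed to be < 32.
bool truncSrlI32(uint32_t X, uint32_t Y) { return ((X >> Y) & 1) != 0; }
```

For every shift amount below 32, the bit selected from the 32-bit value is the same bit in its 64-bit extension, so promoting the shift before type legalization leaves room for a single-bit extract instead of a 32-bit word shift.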
Koakuma fd0aeaa83a [SPARC] Don't emit deprecated FP branches when targeting v9
Don't emit deprecated v8-style FP compares & branches when targeting v9
processors.

For now, always use %fcc0, because currently the allocator requires allocatable
registers to also be spillable, which isn't the case with v9 FCC registers.

The work to enable allocation over the entire FCC register file will be done in
a future patch.

Fixes bug #17834

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135515
2022-11-16 20:56:17 -05:00
Koakuma 586d5f91e6 [SPARC] Improve integer branch handling for v9 targets
Do not emit deprecated v8-style branches when targeting a v9 processor.

As a side effect, this also fixes the emission of useless ba's when doing
conditional branches on 64-bit integer values.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D130006
2022-11-16 20:51:20 -05:00
gonglingqin 825547247a [LoongArch] Eliminate extra un-accounted-for successors
Specifically:
```
*** Bad machine code: MBB has unexpected successors which are not branch targets, fallthrough, EHPads, or inlineasm_br targets. ***
- function:    atomicrmw_umax_i8_acquire
- basic block: %bb.3  (0x1b90bd8)

*** Bad machine code: Non-terminator instruction after the first terminator ***
- function:    atomicrmw_umax_i8_acquire
- basic block: %bb.3  (0x1b90bd8)
- instruction: DBAR 1792
```

Differential Revision: https://reviews.llvm.org/D137884
2022-11-17 09:44:59 +08:00
wanglei 7da2d69da6 [LoongArch] Transfer MI flags when expand PseudoCALL
When expanding a PseudoCALL, the corresponding flags (e.g. nomerge)
need to be passed to the new instruction.

This patch also adds test for the nomerge attribute.

The `nomerge` attribute was added during `LowerCall`, but was lost
during expand PseudoCALL. Now add it back.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D137888
2022-11-17 09:25:10 +08:00
Craig Topper 5c9b03faef [RISCV] Remove duplicate setOperationAction. NFC 2022-11-16 16:54:27 -08:00
Matt Arsenault 3830e4e58c AMDGPU: Create poison values instead of undef
These placeholders don't care about the finer points on
the difference between the two.
2022-11-16 14:47:24 -08:00
Krzysztof Parzyszek 1aa7bd09a9 [Hexagon] Rearrange bits in TSFlags, NFC 2022-11-16 11:02:07 -08:00
Simon Pilgrim becf7b2259 [X86] Remove unnecessary GFNI AFFINE reg-reg overrides from AlderlakeP model
Now matches the default SchedWriteVecIMul values used for the instruction.

NOTE: The folded variant overrides are still there as the latency differs by 1cy
2022-11-16 17:46:29 +00:00
Sander de Smalen 6f48e68d39 [SME] Store buffer to the correct pointer when setting up lazy-save.
This fixes a bug in 'allocateLazySaveBuffer' that led to the
buffer pointer being stored to the wrong address.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D137734
2022-11-16 16:37:33 +00:00
Nicholas Guy 41a3f92596 [AArch64][CodeGen] Add AArch64 support for complex deinterleaving
Differential Revision: https://reviews.llvm.org/D129066
2022-11-16 14:00:54 +00:00
Dmitry Preobrazhensky e468b1b740 [AMDGPU][GFX11] Refactor VOPD operands handling
Differential Revision: https://reviews.llvm.org/D137952
2022-11-16 16:29:12 +03:00
David Green 71609871dd [AArch64][MachineCombiner] Use MIMetadata to copy pcsections metadata to reassociated instructions.
D134260/D138107 exposed that the MachineCombiner was not copying
pcsections metadata where it should. This patch switches the MIBuild
methods to use MIMetadata that can copy the debug loc and pcsections at
the same time.

Differential Revision: https://reviews.llvm.org/D138112
2022-11-16 13:22:48 +00:00
David Green 5f7f484ee5 [AArch64] Add GPR rr instructions to isAssociativeAndCommutative
This adds some more scalar instructions that are both associative and
commutative to isAssociativeAndCommutative, allowing the machine
combiner to reassociate them to reduce critical path length.

Differential Revision: https://reviews.llvm.org/D134260
2022-11-16 12:39:13 +00:00